Establishment of an eHAP1 human haploid cell line hybrid reference genome assembled from short and long reads.

Genomics

McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.

Published: May 2020

Haploid cell lines are a valuable research tool with broad applicability for genetic assays. As such, the fully haploid human cell line eHAP1 has been used in a wide array of studies. However, the absence of a corresponding reference genome sequence for this cell line has limited its use in experiments that depend on an available sequence, such as capture-clone methodologies. We generated ~15× coverage of Nanopore long reads from ten GridION flowcells and used these data to assemble a de novo draft genome with minimap and miniasm, which we subsequently polished with Racon. This assembly was further polished with Pilon and ntEdit using previously generated, low-coverage Illumina short reads. The result was a hybrid eHAP1 assembly with >90% complete BUSCO scores. We further assessed the eHAP1 long-read data for structural variants using Sniffles and identified a variety of rearrangements, including a previously established Philadelphia translocation. Finally, we demonstrate how some of these variants overlap open chromatin regions, potentially impacting regulatory regions. By integrating both long and short reads, we generated a high-quality reference assembly for eHAP1 cells. The union of long and short reads demonstrates the utility of combining sequencing platforms to generate a high-quality reference genome de novo solely from low-coverage data. We expect the resulting eHAP1 genome assembly to provide a useful resource that enables novel experimental applications in this important model cell line.
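
The final analysis step, checking which structural variants overlap open chromatin regions, can be illustrated with a minimal interval-overlap sketch. This is not the authors' pipeline: the `Interval` type, the `overlapping_variants` function, and all coordinates are hypothetical, and the sketch assumes that both variants and open chromatin peaks have already been reduced to half-open `(chrom, start, end)` intervals. Translocation breakpoints would need separate handling.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Two half-open intervals overlap if they share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def overlapping_variants(variants: List[Interval],
                         peaks: List[Interval]) -> List[Interval]:
    """Return the variants that overlap at least one open chromatin peak."""
    # Index peaks by chromosome so each variant is compared only
    # against peaks on its own chromosome.
    peaks_by_chrom = {}
    for peak in peaks:
        peaks_by_chrom.setdefault(peak.chrom, []).append(peak)

    hits = []
    for variant in variants:
        candidates = peaks_by_chrom.get(variant.chrom, [])
        if any(overlaps(variant, peak) for peak in candidates):
            hits.append(variant)
    return hits


# Hypothetical coordinates, for illustration only.
variants = [
    Interval("chr9", 1000, 2000),   # overlaps the chr9 peak
    Interval("chr22", 500, 600),    # only touches the chr22 peak: no overlap
    Interval("chr1", 10, 20),       # no peaks on chr1
]
peaks = [
    Interval("chr9", 1900, 2500),
    Interval("chr22", 600, 700),
    Interval("chr2", 10, 20),
]

print(overlapping_variants(variants, peaks))
```

The linear scan per chromosome keeps the sketch short; for genome-scale variant and peak sets, a sorted sweep or an interval tree would be a more reasonable choice.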

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10298834
DOI: http://dx.doi.org/10.1016/j.ygeno.2020.01.009

Similar Publications

GDBr: genomic signature interpretation tool for DNA double-strand break repair mechanisms.

Nucleic Acids Res

January 2025

Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea.

Large genetic variants can be generated via homologous recombination (HR), such as polymerase theta-mediated end joining (TMEJ) or single-strand annealing (SSA). Given that these HR-based mechanisms leave specific genomic signatures, we developed GDBr, a genomic signature interpretation tool for DNA double-strand break repair mechanisms using high-quality genome assemblies. We applied GDBr to a draft human pangenome reference.

Aims: This study investigated the association between maternal age and early and late gestational diabetes mellitus (GDM).

Methods: In total, 72,270 pregnant women were included in this prospective birth cohort study. Associations between maternal age and early GDM (diagnosed at <24 gestational weeks) and late GDM (diagnosed at ≥24 gestational weeks) were evaluated using a multinomial logistic regression model with possible confounding factors.

Article Synopsis
  • Dermatofibrosarcoma protuberans (DFSP) is a rare type of low-grade cancer that can be difficult to diagnose.
  • It often looks similar to benign skin lesions, which can lead to confusion in diagnosis.
  • Prompt recognition and differentiation from other skin conditions are important for effective treatment.

A pseudogene is a non-functional copy of a protein-coding gene. Processed pseudogenes, a major pseudogene class, are created by the reverse transcription of mRNA and the subsequent integration of the resulting cDNA into the genome. They represent a significant challenge in genome analysis because of their high sequence similarity to the parent genes and their frequent absence from the reference genome. This homology can lead to errors in variant identification, as sequences derived from processed pseudogenes can be incorrectly assigned to parental genes, complicating correct variant calling.

FMR1 (Fragile X messenger ribonucleoprotein 1), located on the X chromosome, encodes the multi-functional FMR1 protein (FMRP), which is critical to brain development and function. Trinucleotide CGG repeat expansions at this locus cause a range of neurological disorders, collectively referred to as Fragile X-related conditions. The most well-known of these is Fragile X syndrome, a neurodevelopmental disorder associated with syndromic facial features, autism, intellectual disabilities, and seizures.
