Establishment of an eHAP1 human haploid cell line hybrid reference genome assembled from short and long reads.

William D Law, René L Warren, Andrew S McCallion

Genomics

McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Molecular and Comparative Pathobiology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.

Published: May 2020

Haploid cell lines are a valuable research tool with broad applicability for genetic assays. As such, the fully haploid human cell line eHAP1 has been used in a wide array of studies. However, the absence of a corresponding reference genome sequence for this cell line has limited its use in experiments that depend on available sequence, such as capture-clone methodologies. We generated ~15× coverage Nanopore long reads from ten GridION flowcells and used these data to assemble a de novo draft genome with minimap and miniasm, which we then polished with Racon. This assembly was further polished with Pilon and ntEdit using previously generated, low-coverage Illumina short reads. This resulted in a hybrid eHAP1 assembly with >90% complete BUSCO scores. We further assessed the eHAP1 long-read data for structural variants using Sniffles and identified a variety of rearrangements, including a previously established Philadelphia translocation. Finally, we demonstrate that some of these variants overlap open chromatin regions, potentially impacting regulatory regions. By integrating both long and short reads, we generated a high-quality reference assembly for eHAP1 cells. The union of long and short reads demonstrates the utility of combining sequencing platforms to generate a high-quality reference genome de novo solely from low-coverage data. We expect the resulting eHAP1 genome assembly to provide a useful resource to enable novel experimental applications in this important model cell line.
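
The last analysis step in the abstract, checking whether structural variants overlap open chromatin regions, comes down to an interval-overlap test. The sketch below is illustrative only and is not the authors' method. It assumes variants and open chromatin peaks are given as BED-style (chromosome, start, end) tuples with 0-based, half-open coordinates. The function name `variants_in_open_chromatin` and all coordinates are hypothetical.

```python
from collections import defaultdict


def variants_in_open_chromatin(variants, peaks):
    """Return the variants that overlap at least one open chromatin peak.

    Both inputs are lists of (chrom, start, end) tuples using 0-based,
    half-open coordinates (BED style). This is an assumed input format,
    not one taken from the paper.
    """
    # Group peaks by chromosome so each variant is only compared
    # against peaks on its own chromosome.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in variants:
        for peak_start, peak_end in peaks_by_chrom.get(chrom, []):
            # Two half-open intervals overlap when each one starts
            # before the other one ends.
            if peak_start < end and start < peak_end:
                hits.append((chrom, start, end))
                break  # one overlapping peak is enough
    return hits


# Hypothetical coordinates, for illustration only.
example_variants = [("chr9", 100, 200), ("chr22", 500, 600), ("chr1", 10, 20)]
example_peaks = [("chr9", 150, 300), ("chr22", 600, 700), ("chr1", 0, 5)]

print(variants_in_open_chromatin(example_variants, example_peaks))
```

The linear scan per chromosome keeps the example short. For genome-scale call sets, a sorted or tree-based interval index would be the more usual choice.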

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10298834
DOI: http://dx.doi.org/10.1016/j.ygeno.2020.01.009

Similar Publications

GDBr: genomic signature interpretation tool for DNA double-strand break repair mechanisms.

Nucleic Acids Res

January 2025

Department of Convergent Bioscience and Informatics, College of Bioscience and Biotechnology, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon 34134, Republic of Korea.

Hyunwoo Ryu, Hyunho Han, Chuna Kim, Jun Kim

Large genetic variants can be generated via homologous recombination (HR), such as polymerase theta-mediated end joining (TMEJ) or single-strand annealing (SSA). Given that these HR-based mechanisms leave specific genomic signatures, we developed GDBr, a genomic signature interpretation tool for DNA double-strand break repair mechanisms using high-quality genome assemblies. We applied GDBr to a draft human pangenome reference.

Advanced maternal age is a risk factor for both early and late gestational diabetes mellitus: The Japan Environment and Children's Study.

J Diabetes Investig

January 2025

Department of Obstetrics and Gynecology, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan.

Kazuma Tagami, Noriyuki Iwama, Hirotaka Hamada, Hasumi Tomita, Rie Kudo

Aims: This study investigated the association between maternal age and early and late gestational diabetes mellitus (GDM).

Methods: In total, 72,270 pregnant women were included in this prospective birth cohort study. Associations between maternal age and early GDM (diagnosed at <24 gestational weeks) and late GDM (diagnosed at ≥24 gestational weeks) were evaluated using a multinomial logistic regression model with possible confounding factors.

Congenital Dermatofibrosarcoma Protuberans-An Update on the Ongoing Diagnostic Challenges.

Cancers (Basel)

January 2025

Unit of Dermatology, Department of Medicine, University of Padova, 35122 Padua, Italy.

Fortunato Cassalia, Andrea Danese, Enrico Cocchi, Silvia Vaienti, Anna Bolzon

Article Synopsis

Dermatofibrosarcoma protuberans (DFSP) is a rare type of low-grade cancer that can be difficult to diagnose.
It often looks similar to benign skin lesions, which can lead to confusion in diagnosis.
Prompt recognition and differentiation from other skin conditions are important for effective treatment.

Quantitative Analysis of Pseudogene-Associated Errors During Germline Variant Calling.

Int J Mol Sci

January 2025

Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, 125315 Moscow, Russia.

Artem Podvalnyi, Arina Kopernik, Mariia Sayganova, Mary Woroncow, Gauhar Zobkova

A pseudogene is a non-functional copy of a protein-coding gene. Processed pseudogenes, a major pseudogene class, are created by the reverse transcription of mRNA and subsequent integration of the resulting cDNA into the genome. They represent a significant challenge in genome analysis because of their high sequence similarity to their parent genes and their frequent absence from the reference genome. This homology can lead to errors in variant identification, as sequences derived from processed pseudogenes can be incorrectly assigned to the parent genes, complicating correct variant calling.

Beyond the Synapse: FMR1 and FMRP Molecular Mechanisms in the Nucleus.

Int J Mol Sci

December 2024

Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA.

Nicole Hansen, Anna Dischler, Caroline Dias

FMR1 (Fragile X messenger ribonucleoprotein 1), located on the X chromosome, encodes the multi-functional FMR1 protein (FMRP), which is critical to brain development and function. Trinucleotide CGG repeat expansions at this locus cause a range of neurological disorders, collectively referred to as Fragile X-related conditions. The best known of these is Fragile X syndrome, a neurodevelopmental disorder associated with syndromic facial features, autism, intellectual disabilities, and seizures.
