Efficient hybrid de novo assembly of human genomes with WENGAN.

Nat Biotechnol

Inria Grenoble Rhône-Alpes, Montbonnot, France.

Published: April 2021

AI Article Synopsis

  • Accurate assembly of large, repeat-rich human genomes is difficult with long, error-prone reads alone, so most long-read assemblies also use accurate short reads to polish the consensus sequence.
  • The article introduces WENGAN, a hybrid assembly algorithm that combines data from several sequencing technologies (ONT PromethION, PacBio Sequel, Illumina and MGI) to produce high-quality assemblies at low computational cost (187-1,200 CPU hours).
  • WENGAN was used to assemble four human genomes de novo, reaching contig NG50 values of 17.24-80.64 Mb, contig NGA50 values of 11.8-59.59 Mb and BUSCO completeness of 94.6-95.2%. The CHM13 assembly (contig NG50: 80.64 Mb) surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).

Article Abstract

Generating accurate genome assemblies of large, repeat-rich human genomes has proved difficult using only long, error-prone reads, and most human genomes assembled from long reads add accurate short reads to polish the consensus sequence. Here we report an algorithm for hybrid assembly, WENGAN, that provides very high quality at low computational cost. We demonstrate de novo assembly of four human genomes using a combination of sequencing data generated on ONT PromethION, PacBio Sequel, Illumina and MGI technology. WENGAN implements efficient algorithms to improve assembly contiguity as well as consensus quality. The resulting genome assemblies have high contiguity (contig NG50: 17.24-80.64 Mb), few assembly errors (contig NGA50: 11.8-59.59 Mb), good consensus quality (QV: 27.84-42.88) and high gene completeness (BUSCO complete: 94.6-95.2%), while consuming low computational resources (CPU hours: 187-1,200). In particular, the WENGAN assembly of the haploid CHM13 sample achieved a contig NG50 of 80.64 Mb (NGA50: 59.59 Mb), which surpasses the contiguity of the current human reference genome (GRCh38 contig NG50: 57.88 Mb).
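The contiguity and quality figures quoted in the abstract can be easier to read with their definitions in hand. The following is a minimal sketch, not code from WENGAN or the paper: it assumes the conventional definition of NG50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the estimated genome size) and assumes that QV is the usual Phred-scaled consensus quality. The function names and the toy contig lengths are hypothetical.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 for a list of contig lengths.

    Assumed (conventional) definition: walk the contigs from longest to
    shortest and return the length of the contig at which the running
    total first reaches half of the estimated genome size.
    Returns 0 if the contigs cover less than half of the genome.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability.

    Assumes the standard Phred relation QV = -10 * log10(p).
    """
    return 10 ** (-qv / 10)


# Toy example with made-up contig lengths (arbitrary units).
toy_contigs = [80, 60, 40, 20, 10]
toy_genome_size = 300
print(ng50(toy_contigs, toy_genome_size))  # 40

# Error rates implied by the QV range reported in the abstract,
# under the Phred assumption stated above.
for qv in (27.84, 42.88):
    print(qv, qv_to_error_rate(qv))
```

Unlike N50, which is computed against the total assembly length, NG50 is computed against the genome size, which is what makes values from different assemblies of the same genome comparable. NGA50 follows the same idea, but is computed on contigs that have been split at misassembly breakpoints after alignment to a reference, so it is never larger than NG50.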

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041623
DOI: http://dx.doi.org/10.1038/s41587-020-00747-w

Publication Analysis

Top Keywords

human genomes: 16
contig ng50: 12
novo assembly: 8
assembly human: 8
genome assemblies: 8
low computational: 8
consensus quality: 8
assembly: 6
human: 5
efficient hybrid: 4

Similar Publications

Draft Genome of Naganishia uzbekistanensis from a Clinical Pulmonary Infection.

Mycopathologia

January 2025

Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, 100730, China.

This study presents the first high-quality assembled genome of Naganishia uzbekistanensis, derived from a clinical isolate CY11558 obtained from a patient with a postoperative pulmonary infection. This work provides an improved reference assembly for downstream research and diagnosis of infections caused by this species.

Multiplexed spatial mapping of chromatin features, transcriptome and proteins in tissues.

Nat Methods

January 2025

Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

The phenotypic and functional states of cells are modulated by a complex interactive molecular hierarchy of multiple omics layers, involving the genome, epigenome, transcriptome, proteome and metabolome. Spatial omics approaches have enabled the study of these layers in tissue context but are often limited to one or two modalities, offering an incomplete view of cellular identity. Here we present spatial-Mux-seq, a multimodal spatial technology that allows simultaneous profiling of five different modalities: two histone modifications, chromatin accessibility, whole transcriptome and a panel of proteins at tissue scale and cellular level in a spatially resolved manner.

A genome-wide atlas of human cell morphology.

Nat Methods

January 2025

Broad Institute of MIT and Harvard, Cambridge, MA, USA.

A key challenge of the modern genomics era is developing empirical data-driven representations of gene function. Here we present the first unbiased morphology-based genome-wide perturbation atlas in human cells, containing three genome-wide genotype-phenotype maps comprising CRISPR-Cas9-based knockouts of >20,000 genes in >30 million cells. Our optical pooled cell profiling platform (PERISCOPE) combines a destainable high-dimensional phenotyping panel (based on Cell Painting) with optical sequencing of molecular barcodes and a scalable open-source analysis pipeline to facilitate massively parallel screening of pooled perturbation libraries.

Crohn's disease (CD) is a chronic inflammatory bowel disease with an unknown etiology. Ubiquitination plays a significant role in the pathogenesis of CD. This study aimed to explore the functional roles of ubiquitination-related genes in CD.

MAI-TargetFisher: A proteome-wide drug target prediction method synergetically enhanced by artificial intelligence and physical modeling.

Acta Pharmacol Sin

January 2025

Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.

Computational target identification plays a pivotal role in the drug development process. With the significant advancements of deep learning methods for protein structure prediction, the structural coverage of human proteome has increased substantially. This progress inspired the development of the first genome-wide small molecule targets scanning method.
