Hundreds of thousands of human whole genome sequencing (WGS) datasets will be generated over the next few years. These data are more valuable in aggregate: joint analysis of genomes from many sources increases sample size and statistical power. A central challenge for joint analysis is that different WGS data processing pipelines cause substantial differences in variant calling in combined datasets, necessitating computationally expensive reprocessing. This approach is no longer tenable given the scale of current studies and data volumes. Here, we define WGS data processing standards that allow different groups to produce functionally equivalent (FE) results, yet still innovate on data processing pipelines. We present initial FE pipelines developed at five genome centers and show that they yield similar variant calling results and produce significantly less variability than sequencing replicates. This work alleviates a key technical bottleneck for genome aggregation and helps lay the foundation for community-wide human genetics studies.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6168605
DOI: http://dx.doi.org/10.1038/s41467-018-06159-4
