Identification of copy number variants in whole-genome data using Reference Coverage Profiles.

Gustavo Glusman, Alissa Severson, Varsha Dhankani, Max Robinson, Terry Farrah, Denise E Mauldin, Anna B Stittrich, Seth A Ament, Jared C Roach, Mary E Brunkow, Dale L Bodian, Joseph G Vockley, Ilya Shmulevich, John E Niederhuber, Leroy Hood

Front Genet

Institute for Systems Biology, Seattle, WA, USA.

Published: March 2015

The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
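The normalize-then-segment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bin layout, the median-based scaling, the four copy-number states, the Gaussian emission width `sigma`, and the transition probability `stay_prob` are all assumptions chosen for illustration, and every function name is hypothetical. The sketch divides per-bin sample coverage by a reference profile, rescales so the typical ratio is 1 (diploid), and runs a small Viterbi decoder over the ratios.

```python
import math
import statistics

# Assumed copy-number states and their expected coverage ratio relative to diploid.
STATES = [0, 1, 2, 3]
EXPECTED_RATIO = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5}


def normalize_to_profile(sample_cov, reference_cov):
    """Divide sample coverage by the reference profile, bin by bin, then
    rescale so the median ratio is 1.0. Bins with zero reference coverage
    are treated as uninformative and set to 1.0 (an assumption)."""
    raw = [s / r for s, r in zip(sample_cov, reference_cov) if r > 0]
    scale = statistics.median(raw)
    if scale <= 0:
        raise ValueError("median coverage ratio must be positive")
    return [(s / r) / scale if r > 0 else 1.0
            for s, r in zip(sample_cov, reference_cov)]


def log_emission(ratio, state, sigma=0.15):
    """Gaussian log-likelihood up to a constant shared by all states."""
    return -0.5 * ((ratio - EXPECTED_RATIO[state]) / sigma) ** 2


def viterbi(ratios, stay_prob=0.99):
    """Most likely copy-number state per bin under a simple HMM."""
    if not ratios:
        return []
    n = len(STATES)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n - 1))

    def trans(i, j):
        return log_stay if i == j else log_switch

    score = [log_emission(ratios[0], s) for s in STATES]
    back = []
    for x in ratios[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(STATES):
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j) + log_emission(x, s))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    idx = max(range(n), key=lambda i: score[i])
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [STATES[i] for i in path]


def segments(calls):
    """Collapse per-bin calls into (start_bin, end_bin_exclusive, copy_number)
    runs, reporting only non-diploid runs."""
    out, start = [], 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != 2:
                out.append((start, i, calls[start]))
            start = i
    return out


# Toy data: a fluctuating reference profile and a sample sequenced at 0.8x
# the reference depth, with a simulated hemizygous deletion in bins 8-12.
reference = [40 + 5 * ((i * 7) % 5 - 2) for i in range(20)]
sample = [0.8 * r * (0.5 if 8 <= i < 13 else 1.0) for i, r in enumerate(reference)]

ratios = normalize_to_profile(sample, reference)
print(segments(viterbi(ratios)))
```

Dividing by the profile removes the position-specific fluctuations shared by sample and reference, so the segmentation step only has to distinguish ratio levels near 0, 0.5, 1, and 1.5. Real data would need noise-aware emission parameters and per-technology profiles, as the abstract notes.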

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330915
DOI: http://dx.doi.org/10.3389/fgene.2015.00045

Similar Publications

Anchorage Accurately Assembles Anchor-Flanked Synthetic Long Reads.

Leibniz Int Proc Inform

August 2024

Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA; Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA.

Xiaofei Carl Zang, Xiang Li, Kyle Metcalfe, Tuval Ben-Yehezkel, Ryan Kelley

Modern sequencing technologies allow for the addition of short-sequence tags, known as anchors, to both ends of a captured molecule. Anchors are useful in assembling the full-length sequence of a captured molecule as they can be used to accurately determine the endpoints. One representative of such anchor-enabled technology is LoopSeq Solo, a synthetic long read (SLR) sequencing protocol.

Isobaric Tagging and Data Independent Acquisition as Complementary Strategies for Proteome Profiling on an Orbitrap Astral Mass Spectrometer.

bioRxiv

December 2024

Xinyue Liu, Shane L Dawson, Steven P Gygi, Joao A Paulo

Comprehensive global proteome profiling that is amenable to high throughput processing will broaden our understanding of complex biological systems. Here, we evaluated two leading mass spectrometry techniques, Data Independent Acquisition (DIA) and Tandem Mass Tagging (TMT), for extensive protein abundance profiling. DIA provides label-free quantification with a broad dynamic range, while TMT enables multiplexed analysis using isobaric tags for efficient cross-sample comparisons.

Long-read sequencing of hundreds of diverse brains provides insight into the impact of structural variation on gene expression and DNA methylation.

bioRxiv

December 2024

Kimberley J Billingsley, Melissa Meredith, Kensuke Daida, Pilar Alvarez Jerez, Shloka Negi

Structural variants (SVs) drive gene expression in the human brain and are causative of many neurological conditions. However, most existing genetic studies have been based on short-read sequencing methods, which capture fewer than half of the SVs present in any one individual. Long-read sequencing (LRS) enhances our ability to detect disease-associated and functionally relevant structural variants (SVs); however, its application in large-scale genomic studies has been limited by challenges in sample preparation and high costs.

Expanding the Landscape of Aging via Orbitrap Astral Mass Spectrometry and Tandem Mass Tag (TMT) Integration.

bioRxiv

December 2024

Gregory R Keele, Yue Dou, Seth P Kodikara, Erin D Jeffery, Dina Bai

Aging results in a progressive decline in physiological function due to the deterioration of essential biological processes, such as transcription and RNA splicing, ultimately increasing mortality risk. Although proteomics is emerging as a powerful tool for elucidating the molecular mechanisms of aging, existing studies are constrained by limited proteome coverage and only observe a narrow range of lifespan. To overcome these limitations, we integrated the Orbitrap Astral Mass Spectrometer with the multiplex tandem mass tag (TMT) technology to profile the proteomes of three brain tissues (cortex, hippocampus, striatum) and kidney in the C57BL/6JN mouse model, achieving quantification of 8,954 to 9,376 proteins per tissue (cumulatively 12,749 across all tissues).

Transforming precision medicine: The potential of the clinical artificial intelligent single-cell framework.

Clin Transl Med

January 2025

Department of Cell Biology, Histology and Embryology, Gottfried Schatz Research Center, Medical University of Graz, Graz, Austria.

Christian Baumgartner, Dagmar Brislinger

The editorial, "Clinical and translational mode of single-cell measurements: An artificial intelligent single-cell," introduces the innovative clinical artificial intelligence single-cell (caiSC) system, which merges AI with single-cell informatics to advance real-time diagnostics, disease monitoring, and treatment prediction. By combining clinical data and multimodal molecular inputs, caiSC facilitates personalized medicine, promising enhanced diagnostic precision and tailored therapeutic approaches. Despite its potential, caiSC lacks comprehensive data coverage across cell types and diseases, presenting challenges in data quality and model robustness.
