The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high-quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations that are only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP, followed by hidden Markov model (HMM) segmentation, enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation.
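The normalize-then-segment idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bin layout, the copy-number states, the Gaussian emission model, and the values of `SIGMA` and `STAY_PROB` are all assumptions chosen for illustration, and the function names are hypothetical. The sketch divides per-bin coverage by a reference profile, rescales by the median ratio so that diploid bins sit near 1.0, and then runs a small Viterbi decoder over copy-number states.

```python
import math
import statistics

# All parameters below are illustrative assumptions, not values from the paper.
COPY_STATES = (0, 1, 2, 3)  # candidate copy numbers per bin
SIGMA = 0.15                # assumed spread of the normalized ratio
STAY_PROB = 0.99            # assumed probability of staying in the same state


def normalize_to_rcp(coverage, rcp):
    """Divide per-bin coverage by the reference profile, then rescale so
    that the median ratio is 1.0 (the expected value for diploid bins)."""
    raw = [c / r for c, r in zip(coverage, rcp)]
    scale = statistics.median(raw)
    return [x / scale for x in raw]


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi_copy_number(ratios):
    """Most likely copy-number path, assuming a ratio of cn / 2 per state."""
    n = len(COPY_STATES)
    stay = math.log(STAY_PROB)
    switch = math.log((1 - STAY_PROB) / (n - 1))
    # Uniform start prior (a constant, so it is omitted from the scores).
    score = [log_gauss(ratios[0], cn / 2, SIGMA) for cn in COPY_STATES]
    back = []
    for x in ratios[1:]:
        new_score, pointers = [], []
        for j, cn in enumerate(COPY_STATES):
            # Best predecessor state for landing in state j at this bin.
            best_i = max(range(n),
                         key=lambda i: score[i] + (stay if i == j else switch))
            best = score[best_i] + (stay if best_i == j else switch)
            new_score.append(best + log_gauss(x, cn / 2, SIGMA))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [COPY_STATES[s] for s in path]


if __name__ == "__main__":
    # Toy data: 20 bins with profile fluctuations and a hemizygous
    # deletion (copy number 1) spanning bins 8 to 11.
    rcp = [1.0, 1.2, 0.8, 1.1, 0.9] * 4
    truth = [2] * 8 + [1] * 4 + [2] * 8
    coverage = [40 * r * cn / 2 for r, cn in zip(rcp, truth)]
    print(viterbi_copy_number(normalize_to_rcp(coverage, rcp)))
```

Dividing by the profile first removes the position-specific fluctuations, so a single emission model per copy-number state can be shared across bins. A production tool would need to estimate the noise and transition parameters from data rather than fix them as done here.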
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330915 | PMC |
| http://dx.doi.org/10.3389/fgene.2015.00045 | DOI Listing |