Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires its own variant detection analysis, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. 
DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence variants that likely arose during DNA amplification. The workflow remains flexible such that it may be customised to variants of the data production protocol used, and supports reproducible analysis through detailed logging and reporting of results. DeepSNVMiner is available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/DeepSNVMiner.
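
The read-group idea described in the abstract can be illustrated with a small sketch. This is not DeepSNVMiner's implementation; it is a toy example under stated assumptions: each read is taken to begin with a fixed-length molecular tag (the hypothetical `TAG_LENGTH`), reads are assumed to be already aligned to the start of a short reference, and the function names and thresholds (`min_reads`, `min_fraction`) are invented for illustration. It groups reads by tag, then reports a variant for a group only when nearly all reads in that group agree on it, which is one simple way to separate a change present in the input molecule from an isolated amplification or sequencing error.

```python
from collections import Counter, defaultdict

TAG_LENGTH = 4  # hypothetical tag length, chosen only for this example


def group_reads_by_tag(reads, tag_length=TAG_LENGTH):
    """Group reads by their leading tag; each group stands for one input molecule."""
    groups = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        groups[tag].append(insert)
    return dict(groups)


def group_variants(inserts, reference, min_reads=3, min_fraction=0.9):
    """Return (position, alt_base) pairs supported by nearly all reads in one group."""
    if len(inserts) < min_reads:
        return []  # too few reads to trust a per-molecule consensus
    variants = []
    for pos, ref_base in enumerate(reference):
        bases = Counter(seq[pos] for seq in inserts if pos < len(seq))
        if not bases:
            continue
        base, count = bases.most_common(1)[0]
        if base != ref_base and count / sum(bases.values()) >= min_fraction:
            variants.append((pos, base))
    return variants


reference = "ACGTACGT"
reads = [
    "AAAA" + "ACGTACGT",  # molecule 1, matches the reference
    "AAAA" + "ACGTACGT",
    "AAAA" + "ACGAACGT",  # isolated mismatch in one read of the group
    "CCCC" + "ACGTATGT",  # molecule 2, every read carries T at position 5
    "CCCC" + "ACGTATGT",
    "CCCC" + "ACGTATGT",
]

groups = group_reads_by_tag(reads)
calls = {tag: group_variants(inserts, reference) for tag, inserts in groups.items()}
print(calls)
```

In this toy input the mismatch seen in a single read of the `AAAA` group is not reported, while the change shared by all reads of the `CCCC` group is. A real workflow would operate on aligned reads and handle tag errors, indels and base qualities, none of which this sketch attempts.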

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888318
DOI: http://dx.doi.org/10.7717/peerj.2074

