Background: High-throughput sequencing (HTS) has become the gold-standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions because of contamination from normal cells or tumor heterogeneity, which poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing the sensitivity and detection performance of HTS approaches in such cases is paramount but time-consuming and expensive, because it requires specialized experimental protocols and a sufficient quantity of samples for processing and analysis. To overcome these limitations, we propose a new computational approach for generating artificial datasets suited to this task: it simulates ultra-deep targeted sequencing data with low-fraction variants, and we demonstrate the effectiveness of these datasets in benchmarking low-fraction variant calling.
Results: Our approach generates artificial raw reads that mimic real data without relying on pre-existing data. It does so by using NEAT, a fine-grained read simulator that generates artificial datasets using models learned from multiple different datasets. It then incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To prove the suitability of the resulting artificial datasets for benchmarking low-fraction variant calling, we used them as ground truth to evaluate the performance of widely used variant calling algorithms. They allowed us to define tuned parameter values for major variant callers, considerably improving their detection of very low-fraction variants.
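To illustrate why ultra-deep coverage matters for low-fraction variants, the following minimal sketch models the number of variant-supporting reads at one site as a binomial draw. It is not the NEAT-based procedure described above; the function names, the per-read Bernoulli model, and the `min_alt_reads` threshold are assumptions made purely for illustration.

```python
import math
import random


def simulate_alt_reads(depth, vaf, rng):
    """Draw how many of `depth` reads carry the variant allele.

    Assumption: each read independently carries the variant with
    probability `vaf` (no sequencing error, no strand or mapping bias).
    """
    return sum(1 for _ in range(depth) if rng.random() < vaf)


def detection_probability(depth, vaf, min_alt_reads):
    """Exact binomial probability of seeing at least `min_alt_reads` variant reads."""
    p_below = sum(
        math.comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    for depth in (100, 1000, 10000):
        alt = simulate_alt_reads(depth, 0.005, rng)
        p = detection_probability(depth, 0.005, min_alt_reads=5)
        print(f"depth={depth} alt_reads={alt} P(>=5 alt reads)={p:.3f}")
```

Under this simplified model, the chance of collecting enough supporting reads for a variant at a fixed fraction rises with depth, which is the intuition behind simulating ultra-deep targeted data.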
Conclusions: Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, which facilitates rapid prototyping and benchmarking of algorithms for this type of dataset, and the pressing need to advance low-fraction variant calling techniques.
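Benchmarking a caller against a simulated ground truth can be sketched as follows. This is an illustrative example rather than the authors' evaluation code: the variant key `(chrom, pos, ref, alt)`, the function names, and the fraction bins are hypothetical, and real comparisons usually also require variant normalization.

```python
def benchmark(truth, calls):
    """Compare called variants against ground truth.

    truth: dict mapping (chrom, pos, ref, alt) -> simulated variant fraction
    calls: set of (chrom, pos, ref, alt) keys reported by a caller
    """
    truth_keys = set(truth)
    tp = len(truth_keys & calls)   # true variants that were called
    fp = len(calls - truth_keys)   # calls absent from the ground truth
    fn = len(truth_keys - calls)   # true variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


def recall_by_fraction_bin(truth, calls, edges):
    """Recall within half-open fraction bins [low, high); None for empty bins."""
    results = []
    for low, high in zip(edges, edges[1:]):
        in_bin = [key for key, frac in truth.items() if low <= frac < high]
        found = sum(1 for key in in_bin if key in calls)
        results.append((low, high, found / len(in_bin) if in_bin else None))
    return results


if __name__ == "__main__":
    # Toy data, invented for this example only.
    truth = {
        ("chr1", 100, "A", "T"): 0.01,
        ("chr1", 200, "G", "C"): 0.005,
        ("chr2", 300, "C", "A"): 0.05,
    }
    calls = {("chr1", 100, "A", "T"), ("chr2", 300, "C", "A"), ("chr3", 5, "T", "G")}
    print(benchmark(truth, calls))
    print(recall_by_fraction_bin(truth, calls, [0.0, 0.01, 0.1]))
```

Reporting recall per fraction bin, rather than a single overall figure, makes it easier to see at which variant fractions a caller or a parameter setting starts to miss variants.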
Source:
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11077792
DOI: http://dx.doi.org/10.1186/s12859-024-05793-8
Anim Genet
February 2025
College of Animal Science and Technology, Southwest University, Chongqing, China.
Goats typically have double coats, with the outermost coarse hairs providing protection against mechanical and radiation damage. While much attention has been paid to cashmere due to its status as a high-end textile material, there is limited information available on coarse hair. This study aimed to identify genomic variants, such as single nucleotide polymorphisms (SNPs) and insertions/deletions (indels), associated with coarse hair diameter using a genome-wide association study (GWAS).
Methods
January 2025
Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada.
This paper proposes a detailed process for structural variant (SV) calling that uses both genome assemblies and long reads and permits a data-driven assessment of multiple SV callers. The process is implemented as a software pipeline named Structural Variant - Jaccard Index Measure, or SV-JIM, using the Snakemake [20] workflow management system. Like most state-of-the-art SV callers, SV-JIM detects variations between pairs of genomes, but it streamlines the numerous SV calling stages into a single process for user convenience. It then evaluates the multiple SV sets produced using the Jaccard index measure to identify those with the highest consistency among the included SV callers.
Mol Biol Rep
January 2025
Department of Zoology, The University of Burdwan, Bardhaman, West Bengal, 713104, India.
Background: This study aimed to develop and validate a targeted next-generation sequencing (NGS) panel along with a data analysis algorithm capable of detecting single-nucleotide variants (SNVs) and copy number variations (CNVs) within the beta-globin gene cluster. The aim was to reduce the turnaround time in conventional genotyping methods and provide a rapid and comprehensive solution for prenatal diagnosis, carrier screening, and genotyping of β-thalassemia patients.
Methods And Results: We devised a targeted NGS panel spanning an 80.
STAR Protoc
January 2025
Division of Hematology, Brigham and Women's Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Ludwig Center at Harvard, Harvard Medical School, Boston, MA, USA.
Single-cell RNA sequencing (scRNA-seq) enables detailed characterization of cell states but often lacks insights into tissue clonal structures. Here, we present a protocol to probe cell states and clonal information simultaneously by enriching mitochondrial DNA (mtDNA) variants from 3'-barcoded full-length cDNA. We describe steps for input library preparation, mtDNA enrichment, PCR product cleanup, and paired-end sequencing.
BMC Bioinformatics
January 2025
Auburn University, Auburn, AL, 36849, USA.
Background: Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10+ kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining them into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling.