SVsearcher: A more accurate structural variation detection method in long read data.

Comput Biol Med

Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China; Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China; Laboratory of Computational Genomics, Li Ka Shing Institute of Health Science, The Chinese University of Hong Kong, Hong Kong, China. Electronic address:

Published: May 2023

Structural variations (SVs) are genomic rearrangements (such as deletions, insertions, and inversions) larger than 50 bp. They play important roles in genetic diseases and evolutionary mechanisms. Advances in long-read sequencing (i.e., PacBio and Oxford Nanopore (ONT) long-read sequencing) make it possible to call SVs accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and call many false SVs in repetitive regions and in regions with multi-allelic SVs. These errors are caused by messy alignments of ONT reads, which result from their high error rate. Hence, we propose a novel method, SVsearcher, to address these issues. We run SVsearcher and other callers on three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high-coverage (50×) datasets and by more than 25% for low-coverage (10×) datasets. More importantly, SVsearcher identifies 81.7%-91.8% of multi-allelic SVs, whereas existing methods identify only 13.2% (Sniffles) to 54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
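The F1 score cited in the abstract is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard formula, not SVsearcher's own evaluation code; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from true-positive, false-positive and false-negative counts.

    Generic textbook definition; the counts would normally come from
    comparing called SVs against a truth set (an assumption here).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # fraction of calls that are true
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # fraction of true SVs that are called
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only
print(round(f1_score(tp=80, fp=20, fn=20), 3))  # 0.8
```

Because F1 combines both error types, a caller that misses many true SVs (low recall) or reports many false SVs (low precision) scores poorly on it, which is why it suits the comparison described above.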


Source
http://dx.doi.org/10.1016/j.compbiomed.2023.106843

Similar Publications

The COVID-19 pandemic has underscored the importance of virus surveillance in public health, and wastewater-based epidemiology (WBE) has emerged as a non-invasive, cost-effective method for monitoring SARS-CoV-2 and its variants at the community level. Unfortunately, current variant surveillance methods depend heavily on updated genomic databases with data derived from clinical samples, which can become less sensitive and representative as clinical testing and sequencing efforts decline. In this paper, we introduce HERCULES (High-throughput Epidemiological Reconstruction and Clustering for Uncovering Lineages from Environmental SARS-CoV-2), an unsupervised method that uses long-read sequencing of a single 1 kb fragment of the Spike gene.


Resolving the molecular basis of a Mendelian condition remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion and structural variant calling and diploid de novo genome assembly. This permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility and full-length transcript information in a single long-read sequencing run.


Chromosome-level genome assembly and characterization of Kaixuan 016: A high-oleic peanut variety with improved agronomic traits developed through gamma-radiation-assisted breeding.

Genomics

January 2025

Shennong Laboratory/ Henan Academy of Crop Molecular Breeding, Henan Academy of Agricultural Sciences/Henan Provincial Key Laboratory for Oil Crops Improvement, Zhengzhou 450002, China. Electronic address:

High-oleic peanuts are increasingly valued in agricultural production and consumer markets. Nevertheless, limited genomic information hinders the integration of genetic analyses and modern breeding strategies. This study details a chromosome-level genome assembly of Kaixuan 016, a high-oleic peanut variety developed through gamma-radiation-assisted breeding, exhibiting enhanced agronomic traits.


Objective: This study aims to improve genetic diagnosis in childhood onset epilepsy with neurodevelopmental problems by utilizing RNA sequencing of fibroblasts to identify pathogenic variants that may be missed by exome sequencing and copy number variation analysis.

Methods: We enrolled 41 individuals with childhood onset epilepsy and neurodevelopmental problems who previously had inconclusive genetic testing. Fibroblast samples were cultured and analyzed by RNA sequencing to detect aberrant expression, aberrant splicing, and monoallelic expression with the Detection of RNA Outlier Pipeline (DROP).


Telomemore enables single-cell analysis of cell cycle and chromatin condensation.

Nucleic Acids Res

January 2025

Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå University, Biomedicinbyggnaden 6K och 6L, Umeå universitetssjukhus, 901 87, Umeå, Sweden.

Single-cell RNA-seq methods can be used to delineate cell types and states at unprecedented resolution but do little to explain why certain genes are expressed. Single-cell ATAC-seq and multiome (ATAC + RNA) have emerged to give a complementary view of the cell state. It is however unclear what additional information can be extracted from ATAC-seq data besides transcription factor binding sites.
