An integrative probabilistic model for identification of structural variation in sequencing data.

Genome Biol

Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA.

Published: September 2012

Paired-end sequencing is a common approach for identifying structural variation (SV) in genomes. Discrepancies between the observed and expected alignments of read pairs indicate potential SVs. Most SV detection algorithms use only one of the possible signals and ignore reads with multiple alignments, which reduces their sensitivity for detecting SVs, especially in repetitive regions. We introduce GASVPro, an algorithm that combines both paired-read and read-depth signals into a probabilistic model that can also analyze multiple alignments of reads. GASVPro outperforms existing methods with a 50-90% improvement in specificity on deletions and a 50% improvement on inversions.
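The two signals named in the abstract can be illustrated with a minimal sketch. This is not GASVPro's model, which the abstract does not specify; it is a simplified illustration under stated assumptions. The names `ReadPair`, `classify_pair`, and `depth_log_likelihood_ratio`, the fragment-length bounds `lmin` and `lmax`, and the Poisson coverage model with halved depth under a heterozygous deletion are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ReadPair:
    left_pos: int        # leftmost aligned base of the pair
    right_pos: int       # rightmost aligned base of the pair
    left_forward: bool   # True if the left read maps to the + strand
    right_forward: bool  # True if the right read maps to the + strand


def classify_pair(pair, lmin, lmax):
    """Label a pair by comparing its alignment with the expected one.

    Assumption: a concordant pair has convergent orientation (+/-) and a
    mapped span within [lmin, lmax], the expected fragment-length range.
    """
    span = pair.right_pos - pair.left_pos + 1
    if pair.left_forward == pair.right_forward:
        return "inversion"      # both reads on the same strand
    if not pair.left_forward:
        return "other"          # divergent (-/+) orientation
    if span > lmax:
        return "deletion"       # reads map further apart than expected
    if span < lmin:
        return "insertion"      # reads map closer together than expected
    return "concordant"


def depth_log_likelihood_ratio(observed, expected):
    """Poisson log-likelihood ratio: heterozygous deletion vs. no deletion.

    Assumption: read counts in a region are Poisson with mean `expected`
    when no deletion is present and `expected / 2` when one copy is lost.
    log P(k | lam/2) - log P(k | lam) simplifies to k*log(1/2) + lam/2.
    """
    return observed * math.log(0.5) + expected / 2.0
```

In this sketch, a candidate deletion would be supported both by pairs labeled "deletion" that span it and by a positive depth ratio inside it. Handling reads with multiple alignments, which the abstract highlights, is not attempted here.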


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3439973
DOI: http://dx.doi.org/10.1186/gb-2012-13-3-r22
