Background: Deep sequencing is a powerful tool for assessing viral genetic diversity. Such experiments harness the high coverage afforded by next-generation sequencing protocols by treating sequencing reads as a population sample. Distinguishing true single nucleotide variants (SNVs) from sequencing errors remains challenging, however. Current protocols are characterised by high false-positive rates, and their results require time-consuming manual checking.

Results: By statistical modelling, we show that if multiple variant sites are considered at once, SNVs can be called reliably from high-coverage viral deep sequencing data at frequencies lower than the error rate of the sequencing technology, and that SNV calling accuracy increases as true sequence diversity within a read length increases. We demonstrate these findings on two control data sets, showing that SNV detection is more reliable on a high-diversity human immunodeficiency virus sample than on a moderate-diversity hepatitis C virus sample. Finally, we show that in situations where probabilistic clustering retains false-positive SNVs (for instance, due to insufficient sample diversity or systematic errors), applying a strand bias test based on a beta-binomial model of the forward read distribution can improve precision at negligible cost to true-positive recall.
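
The beta-binomial strand bias test mentioned above can be made concrete with a short sketch. The Python snippet below is an illustration under stated assumptions, not the paper's implementation: it uses scipy's betabinom distribution to compute a two-sided p-value for the number of forward reads supporting a variant. The strand_bias_pvalue name and the alpha/beta defaults are invented placeholders; in practice the shape parameters would be estimated from the overall forward/reverse read balance.

```python
# A minimal sketch (not the paper's code) of a beta-binomial strand-bias
# test: given k forward reads out of n reads supporting a variant, compute
# a two-sided p-value under a beta-binomial null distribution.
# The alpha/beta defaults are illustrative placeholders, not fitted values.
from scipy.stats import betabinom

def strand_bias_pvalue(k_forward, n_total, alpha=10.0, beta=10.0):
    """Two-sided beta-binomial test for strand bias at one variant site.

    k_forward   -- forward reads supporting the variant
    n_total     -- total reads supporting the variant
    alpha, beta -- beta-binomial shape parameters (assumed, not fitted here)
    """
    dist = betabinom(n_total, alpha, beta)
    p_obs = dist.pmf(k_forward)
    # Two-sided p-value: total probability of outcomes at least as unlikely
    # as the observed forward-read count (the small tolerance guards against
    # floating-point ties).
    return sum(dist.pmf(k) for k in range(n_total + 1)
               if dist.pmf(k) <= p_obs * (1.0 + 1e-9))

# Example: 2 forward reads out of 40 is strongly strand-biased and would be
# flagged as a likely artefact; 19 of 40 is balanced and would be retained.
print(strand_bias_pvalue(2, 40))   # small p-value -> suspect call
print(strand_bias_pvalue(19, 40))  # large p-value -> no evidence of bias
```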

Conclusions: By combining probabilistic clustering (implemented in the program ShoRAH) with a statistical test of strand bias, SNVs may be called from deeply sequenced viral populations with high accuracy.
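
As a rough illustration of the combined workflow described in the Conclusions, the sketch below filters a hypothetical list of candidate SNVs, such as might emerge from a probabilistic clustering step, using the strand_bias_pvalue function defined in the sketch above. The record fields, positions, and the 0.05 cutoff are invented for illustration and do not correspond to ShoRAH's actual output format or to any threshold recommended by the paper.

```python
# Hypothetical post-filter: field names, values, and the 0.05 threshold are
# illustrative only. Reuses strand_bias_pvalue() from the sketch above.
CANDIDATE_SNVS = [
    {"pos": 1201, "ref": "A", "alt": "G", "fwd": 18, "total": 37},  # balanced
    {"pos": 2344, "ref": "C", "alt": "T", "fwd": 1,  "total": 42},  # biased
]

def filter_by_strand_bias(candidates, threshold=0.05):
    """Keep candidate SNVs whose forward/reverse read support shows no
    significant strand bias under the beta-binomial test."""
    return [c for c in candidates
            if strand_bias_pvalue(c["fwd"], c["total"]) >= threshold]

print(filter_by_strand_bias(CANDIDATE_SNVS))  # the biased call at 2344 is dropped
```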

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3848937
DOI: http://dx.doi.org/10.1186/1471-2164-14-501

Publication Analysis

Top Keywords

probabilistic clustering (12); strand bias (12); single nucleotide (8); viral populations (8); combining probabilistic (8); statistical test (8); test strand (8); deep sequencing (8); high coverage (8); false positive (8)

