Background: Uncertainty affects molecular biology experiments and data for several reasons. Heterogeneity of gene or protein expression within a single tumor tissue is an example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow large-scale profiling of tissue biopsies to investigate the protein patterns that characterize specific disease states. TMA studies sample the same patient multiple times, and therefore measure the same protein target multiple times, to account for possible biological heterogeneity. The aim of this paper is to provide and validate a classification model that takes into account the uncertainty associated with measuring replicate samples.

Results: We propose an extension of the well-known Naïve Bayes classifier that accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model can be learned efficiently from the training dataset and uses a closed-form classification equation, so it adds no computational cost relative to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.

Conclusion: The proposed Hierarchical Naïve Bayes classifier can be conveniently applied to problems where within-sample heterogeneity must be taken into account, such as TMA experiments and other biological contexts where several measurements (replicates) are available for the same biological sample. The new approach outperforms the standard Naïve Bayes model, particularly when within-sample heterogeneity differs between classes.
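
The abstract does not give the model equations, so the following is only a minimal sketch of how a hierarchical Naïve Bayes classifier with a closed-form classification rule might look. It rests on assumptions that are not taken from the paper: replicates are Gaussian around a sample-specific mean, that mean is itself Gaussian around a class mean, every sample has the same number of replicates, and parameters are estimated by simple moment matching. The function names (`log_marginal`, `fit`, `predict`, `simulate`) and all simulation settings are hypothetical.

```python
import numpy as np

def log_marginal(X, M, sigma2, tau2):
    """Closed-form log-density of replicates with the sample mean integrated out.

    X has shape (samples, replicates, features). Assumed model (not from the paper):
    x_ik ~ N(mu_i, sigma2) and mu_i ~ N(M, tau2), independently per feature.
    Returns an array of shape (samples, features).
    """
    n = X.shape[1]
    xbar = X.mean(axis=1)
    ss = ((X - xbar[:, None, :]) ** 2).sum(axis=1)  # within-sample sum of squares
    total = sigma2 + n * tau2
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * (n - 1) * np.log(sigma2)
            - 0.5 * np.log(total)
            - 0.5 * ss / sigma2
            - 0.5 * n * (xbar - M) ** 2 / total)

def fit(X, y):
    """Moment-matching estimates per class (an assumption, not the paper's procedure)."""
    n_rep = X.shape[1]
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        means = Xc.mean(axis=1)
        # Pooled within-sample variance, floored to keep the logs finite.
        sigma2 = np.maximum(Xc.var(axis=1, ddof=1).mean(axis=0), 1e-6)
        # Between-sample variance of the hidden means, floored at a small value.
        tau2 = np.maximum(means.var(axis=0, ddof=1) - sigma2 / n_rep, 1e-6)
        params[c] = {"prior": np.mean(y == c), "M": means.mean(axis=0),
                     "sigma2": sigma2, "tau2": tau2}
    return params

def predict(params, X):
    """Pick the class with the highest log-prior plus summed per-feature log-marginal."""
    classes = sorted(params)
    scores = np.empty((X.shape[0], len(classes)))
    for k, c in enumerate(classes):
        p = params[c]
        ll = log_marginal(X, p["M"], p["sigma2"], p["tau2"])
        scores[:, k] = np.log(p["prior"]) + ll.sum(axis=1)
    return np.array(classes)[scores.argmax(axis=1)]

# Hypothetical simulation: two classes that differ in mean and in within-sample spread.
rng = np.random.default_rng(0)

def simulate(n_samples, M, tau, sigma, n_rep=4, n_feat=3):
    mu = rng.normal(M, tau, size=(n_samples, 1, n_feat))  # one hidden mean per sample
    return mu + rng.normal(0.0, sigma, size=(n_samples, n_rep, n_feat))

X_train = np.concatenate([simulate(100, 0.0, 0.5, 0.5), simulate(100, 2.0, 0.5, 1.5)])
y_train = np.repeat([0, 1], 100)
X_test = np.concatenate([simulate(100, 0.0, 0.5, 0.5), simulate(100, 2.0, 0.5, 1.5)])
y_test = np.repeat([0, 1], 100)

model = fit(X_train, y_train)
accuracy = np.mean(predict(model, X_test) == y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Under these assumptions the score for a new sample needs only its replicate mean and within-sample sum of squares, which is why the cost stays comparable to a standard Gaussian Naïve Bayes evaluation. Because the within-sample variance enters the likelihood separately for each class, classes that differ in replicate spread can be told apart even when their means are close.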

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1698579
DOI: http://dx.doi.org/10.1186/1471-2105-7-514
