Background: Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we used an iterative machine learning algorithm to create populations of expression profiles from publicly available microarray datasets of cancer samples (breast, lymphoma and renal) linked to clinical information. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against that of profiles correlated with relapse-free status. The prediction error of profiles identified with supervised univariate feature selection algorithms was compared to that of profiles selected randomly from a) all genes on the microarray platform and b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms.
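
As an illustration of how an ROC curve can be turned into a single prediction-error figure, the sketch below computes the area under the ROC curve (AUC) from class labels and classifier scores using the rank-sum identity. The function names `roc_auc` and `prediction_error`, and the choice of 1 - AUC as the error summary, are assumptions made for this example; the abstract does not state how the study derived its error measure from the curves.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def prediction_error(labels, scores):
    """Hypothetical error summary: 1 - AUC (an assumption, not the paper's definition)."""
    return 1.0 - roc_auc(labels, scores)
```

An AUC of 1.0 means every positive sample scores above every negative one, while 0.5 corresponds to chance-level ranking.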

Results: Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Using relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection; however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across datasets to samples profiled on the same microarray platform.
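
The comparison between supervised and random feature selection can be explored on simulated data with a sketch like the one below. Everything in it is an assumption made for illustration: the Gaussian simulation, the t-like univariate filter, the nearest-centroid decision rule, and all names and parameter values (`simulate`, `compare`, 200 genes, 20 informative genes, a mean shift of 1.5). It is not the iterative algorithm used in the study.

```python
import random
import statistics


def simulate(n_samples, n_genes, n_informative, shift, rng):
    """Gaussian noise; the first n_informative genes are shifted in class 1."""
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate labels so the classes are balanced
        X.append([rng.gauss(shift * label if g < n_informative else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(label)
    return X, y


def t_like_score(X, y, g):
    """Univariate filter: absolute mean difference over its standard error."""
    a = [row[g] for row, lab in zip(X, y) if lab == 1]
    b = [row[g] for row, lab in zip(X, y) if lab == 0]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / (se + 1e-12)


def centroid_scores(X_train, y_train, X_test, genes):
    """Nearest-centroid score on the chosen genes; higher = closer to class 1."""
    def centroid(label):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        return [statistics.mean([r[g] for r in rows]) for g in genes]
    c0, c1 = centroid(0), centroid(1)
    scores = []
    for r in X_test:
        d0 = sum((r[g] - m) ** 2 for g, m in zip(genes, c0))
        d1 = sum((r[g] - m) ** 2 for g, m in zip(genes, c1))
        scores.append(d0 - d1)
    return scores


def auc(y, scores):
    """AUC as the fraction of correctly ranked positive/negative pairs."""
    pos = [s for lab, s in zip(y, scores) if lab == 1]
    neg = [s for lab, s in zip(y, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare(k, seed=0):
    """Return (supervised AUC, random AUC) for profiles of k genes."""
    rng = random.Random(seed)
    X_train, y_train = simulate(40, 200, 20, 1.5, rng)
    X_test, y_test = simulate(40, 200, 20, 1.5, rng)
    ranked = sorted(range(200), key=lambda g: t_like_score(X_train, y_train, g),
                    reverse=True)
    supervised = ranked[:k]              # top-k genes by the univariate filter
    randomly = rng.sample(range(200), k)  # k genes drawn at random
    return (auc(y_test, centroid_scores(X_train, y_train, X_test, supervised)),
            auc(y_test, centroid_scores(X_train, y_train, X_test, randomly)))


if __name__ == "__main__":
    for k in (2, 10, 50):
        sup, rnd = compare(k)
        print(f"k={k:3d}  supervised AUC={sup:.2f}  random AUC={rnd:.2f}")
```

Running the script for several values of `k` gives a quick way to check, on this toy setup, whether the gap between the two selection strategies narrows as profiles grow larger.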

Conclusion: Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine learning approaches. These findings are relevant to the use of molecular profiling for the identification of candidate biomarker panels.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2211325
DOI: http://dx.doi.org/10.1186/1471-2105-8-415

Publication Analysis

Top Keywords

expression profiles: 20
prediction error: 16
expression data: 12
profiles: 10
expression: 9
candidate biomarker: 8
gene expression: 8
machine learning: 8
error profiles: 8
profiles correlated: 8

Similar Publications

Background: Bioinformatics analysis of hepatocellular carcinoma (HCC) expression profiles can aid in understanding its molecular mechanisms and identifying new targets for diagnosis and treatment.

Aim: In this study, we analyzed expression profile datasets and miRNA expression profiles related to HCC from the GEO using R software to detect differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs).

Methods And Results: Common DEGs were identified, and a PPI network was constructed using the STRING database and Cytoscape software to identify hub genes.

Ethnopharmacological Importance: Zhili decoction (ZLD) is a traditional Chinese medicine prescription for ulcerative colitis (UC). However, the mechanism by which ZLD exerts its therapeutic effects in the context of UC remains unclear.

Aim Of Study: The aim of this study was to investigate the effects of ZLD on the gut microbiota and related fecal metabolite levels using a mouse model of UC.

Introduction: The COVID-19 pandemic has become a global health crisis, eliciting varying severity in infected individuals. This study aimed to explore the immune profiles between moderate and severe COVID-19 patients experiencing a cytokine storm and their association with mortality. This study highlights the role of PD-1/PD-L1 and the TIGIT/CD226/CD155/CD112 pathways in COVID-19 patients.

Cutaneous melanoma is the deadliest form of skin cancer. Despite advancements in treatment, many patients still face poor outcomes. A deeper understanding of the mechanisms involved in melanoma pathogenesis is crucial for improving diagnosis and therapy.

A nucleotide sequence can be translated in three reading frames from 5' to 3', producing distinct protein products. Many examples of RNA translation in two reading frames (dual coding) have been identified so far. We report simultaneous translation of mRNA transcripts derived from locus in all three reading frames that results in the synthesis of long proteins.
