A feature selection approach for identification of signature genes from SAGE data.

BMC Bioinformatics

Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil.

Published: May 2007

Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Because the probabilistic models of microarray and SAGE technologies differ in important ways, techniques suited to selecting specific genes from SAGE measurements are needed.

Results: A new framework is proposed to select specific genes that distinguish different biological states based on the analysis of SAGE data. The framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are then used to identify, among all gene groups selected by the strong genes method, those that distinguish the states most reliably. A score that combines the credibility and bolstered error values is proposed to rank the candidate gene groups. Results obtained using SAGE data from gliomas are presented and corroborate the methodology.
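
The bolstered error mentioned above can be illustrated with a minimal sketch. It assumes a single gene, a simple threshold rule, and Gaussian bolstering kernels with a fixed, hand-picked width; the function names, the threshold, the kernel width, and the expression values are all hypothetical and are not taken from the paper, whose classifier and kernel choices may differ.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bolstered_error_1d(values, labels, threshold, sigma):
    """Bolstered resubstitution error of the rule 'predict 1 if x > threshold'.

    Each training point is replaced by a Gaussian kernel with standard
    deviation sigma (an assumed, fixed width). A point contributes the
    kernel mass that falls on the wrong side of the threshold.
    """
    total = 0.0
    for x, y in zip(values, labels):
        mass_below = normal_cdf((threshold - x) / sigma)
        # A class-1 point errs where the rule predicts 0 (below the
        # threshold); a class-0 point errs above the threshold.
        total += mass_below if y == 1 else 1.0 - mass_below
    return total / len(values)


# Hypothetical normalized expression values of one gene in two states.
expression = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
state = [0, 0, 0, 1, 1, 1]

error = bolstered_error_1d(expression, state, threshold=4.0, sigma=1.0)
print(f"bolstered error: {error:.4f}")
```

In this toy setting the plain resubstitution error would be exactly zero, while the bolstered estimate stays slightly positive because part of each kernel crosses the threshold. That sensitivity to how close the points lie to the decision boundary is what makes the estimate useful for comparing genes on small training sets.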

Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis, and the methodology described in the paper incorporates this information. The introduced method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE, and it may be adapted for other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for building classifiers.
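
As a rough illustration of how a counting model yields extra statistical information, the sketch below computes an approximate credibility interval for a tag's abundance. It assumes a simple binomial sampling model with a uniform Beta(1, 1) prior, which is not necessarily the probabilistic model used in the paper; the function names, library size, and tag counts are hypothetical.

```python
import random


def credibility_interval(tag_count, library_size, level=0.95,
                         draws=20000, seed=0):
    """Approximate equal-tailed credibility interval for a tag's abundance.

    Assumption: tag_count ~ Binomial(library_size, p) with a uniform
    Beta(1, 1) prior, so the posterior of p is
    Beta(tag_count + 1, library_size - tag_count + 1).
    The quantiles are estimated from sorted posterior draws.
    """
    rng = random.Random(seed)
    a = tag_count + 1
    b = library_size - tag_count + 1
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


def intervals_overlap(first, second):
    """Return True when two (lower, upper) intervals share any point."""
    return first[0] <= second[1] and second[0] <= first[1]


# Hypothetical counts of one tag in two libraries of 50,000 tags each.
state_a = credibility_interval(10, 50000)
state_b = credibility_interval(60, 50000)
print(state_a, state_b, intervals_overlap(state_a, state_b))
```

Under these assumptions, non-overlapping intervals suggest that the difference between the two states is unlikely to be a sampling artifact, which is the kind of reliability check that raw expression ratios alone do not provide.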

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1891113
DOI: http://dx.doi.org/10.1186/1471-2105-8-169

Publication Analysis

Top Keywords

signature genes: 12
sage data: 12
biological states: 12
genes: 10
sage: 8
genes sage: 8
gene expression: 8
expression profiling: 8
identify signature: 8
select specific: 8

Similar Publications

Signatures of H3K4me3 modification predict cancer immunotherapy response and identify a new immune checkpoint-SLAMF9.

Respir Res

January 2025

Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.

H3 lysine 4 trimethylation (H3K4me3) modification and related regulators extensively regulate various crucial transcriptional processes in health and disease. However, the regulatory relationship between H3K4me3 modification and anti-tumor immunity has not been fully elucidated. We identified 72 independent prognostic genes of lung adenocarcinoma (LUAD) whose transcriptional expression was closely correlated with 27 known H3K4me3 regulators.

Preeclampsia (PE) is a major pregnancy-specific cardiovascular complication posing latent life-threatening risks to mothers and neonates. The contribution of immune dysregulation to PE is not fully understood, highlighting the need to explore molecular markers and their relationship with immune infiltration to potentially inform therapeutic strategies. We used bioinformatics tools to analyze gene expression data from the Gene Expression Omnibus (GEO) database using the GEOquery package in R.

Recent advances in single-cell RNA-Sequencing (scRNA-Seq) technologies have revolutionized our ability to gather molecular insights into different phenotypes at the level of individual cells. The analysis of the resulting data poses significant challenges, and proper statistical methods are required to analyze and extract information from scRNA-Seq datasets. Sample classification based on gene expression data has proven effective and valuable for precision medicine applications.

Background: Natural killer (NK) cells are important contributors to antitumor immunity in clear-cell renal cell carcinoma (ccRCC). However, their phenotype, function, and association with clinical outcomes in ccRCC remain poorly understood.

Materials And Methods: We analyzed single-cell RNA sequencing data from 13 primary tumors, 1 localized tumor extension, and 1 metastasis from ccRCC patients at different clinical stages.

Background: Primary pulmonary lymphoepithelial carcinoma (pLEC) is a subtype of non-small cell lung cancer (NSCLC) characterized by Epstein-Barr virus (EBV) infection. However, the molecular pathogenesis of pLEC remains poorly understood.

Methods: In this study, we explored pLEC using whole-exome sequencing (WES) and RNA-whole-transcriptome sequencing (RNA-seq) technologies.
