Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true in gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to its essential objectives, such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.

Results: In this paper we describe an improved hybrid system for gene selection based on a recently proposed genetic ensemble (GE) system. To enhance the generalization of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy that fuses the goodness information for each gene provided by multiple filtering algorithms. This fused information is then used in the initialization and mutation operations of the genetic ensemble system.
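The abstract does not say how the filter scores are mapped or fused, so the following is only a hypothetical sketch of the general idea. It assumes that each filter's scores are rank-normalized and averaged into one "goodness" value per gene, and that this value is used as a sampling weight when seeding chromosomes (gene subsets) and when choosing replacement genes during mutation. The function names (`fuse_goodness`, `init_individual`, `mutate`) and the example scores are invented for illustration and are not the authors' MF-GE implementation.

```python
import random
from typing import List, Sequence

# HYPOTHETICAL sketch: the abstract does not describe MF-GE's actual mapping;
# rank normalization + averaging is an assumption made for illustration.


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map one filter's scores (higher = better) to (0, 1] by rank."""
    n = len(scores)
    # Ascending order; ties keep their original index order (stable sort).
    order = sorted(range(n), key=lambda g: scores[g])
    normalized = [0.0] * n
    for rank, gene in enumerate(order):
        normalized[gene] = (rank + 1) / n  # worst gene -> 1/n, best -> 1.0
    return normalized


def fuse_goodness(filter_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the rank-normalized scores of several filters for each gene."""
    per_filter = [rank_normalize(s) for s in filter_scores]
    n_genes = len(per_filter[0])
    return [sum(f[g] for f in per_filter) / len(per_filter) for g in range(n_genes)]


def weighted_sample(candidates, weights, k, rng):
    """Draw k distinct candidates, each draw proportional to its weight."""
    pool, w = list(candidates), list(weights)
    chosen = []
    for _ in range(k):
        idx = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(idx))
        w.pop(idx)
    return chosen


def init_individual(goodness, subset_size, rng):
    """Seed a chromosome (a gene subset) biased toward high-goodness genes."""
    return sorted(weighted_sample(range(len(goodness)), goodness, subset_size, rng))


def mutate(individual, goodness, rng, rate=0.1):
    """With probability `rate`, swap each selected gene for an unselected one
    drawn in proportion to its fused goodness."""
    result = list(individual)
    for i in range(len(result)):
        if rng.random() < rate:
            outside = [g for g in range(len(goodness)) if g not in result]
            if outside:
                weights = [goodness[g] for g in outside]
                result[i] = weighted_sample(outside, weights, 1, rng)[0]
    return sorted(result)


# Usage with made-up scores from three filters for six genes.
rng = random.Random(0)
scores = [
    [0.9, 0.1, 0.5, 0.3, 0.8, 0.2],
    [0.7, 0.2, 0.6, 0.1, 0.9, 0.3],
    [0.8, 0.3, 0.4, 0.2, 0.7, 0.1],
]
goodness = fuse_goodness(scores)
chromosome = init_individual(goodness, subset_size=3, rng=rng)
mutated = mutate(chromosome, goodness, rng, rate=0.5)
print(goodness, chromosome, mutated)
```

Biasing both initialization and mutation toward genes that several filters agree on is one plausible reading of how fused goodness information could feed a genetic algorithm; the actual system may weight, threshold, or map the filter scores differently.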

Conclusion: We used four benchmark microarray datasets (covering both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system improves sample classification accuracy, generates more compact gene subsets, and converges to the selection results more quickly. The MF-GE system is flexible, since various combinations of filters and classifiers can be incorporated depending on the data characteristics and user preferences.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3009522
DOI: http://dx.doi.org/10.1186/1471-2105-11-S1-S5