Optimality driven nearest centroid classification from genomic data.

PLoS One

Department of Statistics, Texas A&M University, College Station, Texas, United States of America.

Published: October 2007

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
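The two ideas in the abstract, choosing a feature subset by how it performs as a whole and shrinking the centroid estimates, can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The selection criterion used here (training-set accuracy) and the shrinkage rule (a linear pull toward the overall mean, controlled by a hypothetical `shrink` parameter) are assumptions chosen for simplicity, and the function names `fit_centroids`, `predict`, and `greedy_select` are illustrative only.

```python
import numpy as np

def fit_centroids(X, y, shrink=0.0):
    """Per-class mean vectors, optionally pulled toward the overall mean.

    shrink=0.0 gives the plain sample means; shrink=1.0 collapses every
    centroid onto the overall mean. (Assumed rule, not the paper's estimator.)
    """
    classes = np.unique(y)
    overall = X.mean(axis=0)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    centroids = (1.0 - shrink) * centroids + shrink * overall
    return classes, centroids

def predict(X, classes, centroids, features):
    """Assign each row to the class whose centroid is nearest on `features`."""
    Xf = X[:, features]
    Cf = centroids[:, features]
    # Squared Euclidean distance from every sample to every centroid.
    dist = ((Xf[:, None, :] - Cf[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def greedy_select(X, y, n_features, shrink=0.0):
    """Forward selection: add the feature that most improves the subset.

    The subset is scored jointly by training accuracy (an assumed stand-in
    for the optimality criterion described in the abstract).
    """
    classes, centroids = fit_centroids(X, y, shrink)
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features and remaining:
        best_j, best_acc = None, -1.0
        for j in remaining:
            pred = predict(X, classes, centroids, selected + [j])
            acc = float(np.mean(pred == y))
            if acc > best_acc:
                best_j, best_acc = j, acc
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

Because each candidate feature is scored together with the features already chosen, the procedure evaluates subsets rather than individual features, which is the contrast with univariate scoring that the abstract draws. In practice a held-out or cross-validated score would be a more reliable criterion than training accuracy.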

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1991588 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0001002 (PLOS)
