Bayesian clinical classification from high-dimensional data: Signatures versus variability.

Akram Shalabi, Masato Inoue, Johnathan Watkins, Emanuele De Rinaldis, Anthony C. C. Coolen

Stat Methods Med Res

Institute for Mathematical and Molecular Biomedicine, King's College London, London, UK.

Published: February 2018

When data exhibit imbalance between a large number d of covariates and a small number n of samples, clinical outcome prediction is impaired by overfitting and prohibitive computational demands. Here we study two simple Bayesian prediction protocols that can be applied to data of any dimension and any number of outcome classes. Calculating the Bayesian integrals and optimal hyperparameters analytically leaves only a small number of numerical integrations, and CPU demands scale as O(nd). We compare their performance on synthetic and genomic data with that of the mclustDA method of Fraley and Raftery. For small d they perform as well as or better than mclustDA. For d = 10,000 or more, mclustDA breaks down computationally, while the Bayesian methods remain efficient. This allows us to explore phenomena typical of classification in high-dimensional spaces, such as overfitting and the reduced discriminative effectiveness of signatures compared to intra-class variability.
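The abstract does not spell out either protocol. As a rough illustration of why evaluating Bayesian integrals analytically keeps the cost linear in n and d, here is a minimal Python sketch of a generic conjugate-Gaussian classifier. It is not the authors' "signature" or "variability" model. The diagonal covariance, the plug-in pooled per-feature variances, the prior centred at the pooled mean, and the fixed shrinkage hyperparameter `lam` (used here instead of the optimized hyperparameters the abstract describes) are all assumptions. The function names `fit` and `predict` are hypothetical.

```python
import numpy as np

def fit(X, y, lam=1.0):
    """Hypothetical conjugate-Gaussian classifier (illustrative assumption,
    not the paper's protocol). One pass over the data: O(n*d)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, d = X.shape
    # Assumption: plug-in pooled mean and per-feature variance (diagonal model).
    m = X.mean(axis=0)
    var = X.var(axis=0) + 1e-9
    stats = {}
    for c in classes:
        Xc = X[y == c]
        nc = Xc.shape[0]
        # Prior mu_c ~ N(m, var/lam); likelihood x ~ N(mu_c, var).
        # Posterior mean of mu_c in closed form, no numerical integration.
        post_mean = (lam * m + Xc.sum(axis=0)) / (nc + lam)
        # Posterior predictive variance: var + var/(nc + lam).
        pred_var = var * (1.0 + 1.0 / (nc + lam))
        # Laplace-smoothed class prior.
        log_prior = np.log((nc + 1.0) / (n + len(classes)))
        stats[c] = (post_mean, pred_var, log_prior)
    return stats

def predict(stats, X):
    """Assign each row to the class with the highest log posterior: O(C*d) per sample."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    classes = list(stats)
    scores = np.empty((X.shape[0], len(classes)))
    for k, c in enumerate(classes):
        mean, v, lp = stats[c]
        # Log of the Gaussian posterior predictive density, summed over features.
        scores[:, k] = lp - 0.5 * np.sum(np.log(2 * np.pi * v) + (X - mean) ** 2 / v, axis=1)
    return np.array(classes)[scores.argmax(axis=1)]
```

For example, `predict(fit([[0.0], [0.1], [5.0], [5.1]], [0, 0, 1, 1]), [[0.05], [5.05]])` assigns the two query points to classes 0 and 1. Every posterior quantity is a closed-form function of per-class sums, so training touches each of the n × d entries a constant number of times. This is the kind of scaling the abstract contrasts with mclustDA at d = 10,000 or more.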

DOI: http://dx.doi.org/10.1177/0962280216628901

Similar Publications

An effective feature selection approach based on hybrid Grey Wolf Optimizer and Genetic Algorithm for hyperspectral image.

Sci Rep

January 2025

School of Information Engineering, China University of Geosciences, Beijing, 100083, China.

Yiqun Shang, Minrui Zheng, Jiayang Li, Xinqi Zheng

Feature selection (FS) is a critical step in hyperspectral image (HSI) classification, essential for reducing data dimensionality while preserving classification accuracy. However, FS for HSIs remains an NP-hard challenge, as existing swarm intelligence and evolutionary algorithms (SIEAs) often suffer from limited exploration capabilities or susceptibility to local optima, particularly in high-dimensional scenarios. To address these challenges, we propose GWOGA, a novel hybrid algorithm that combines Grey Wolf Optimizer (GWO) and Genetic Algorithm (GA), aiming to achieve an effective balance between exploration and exploitation.

Surrogate-assisted global and distributed local collaborative optimization algorithm for expensive constrained optimization problems.

Sci Rep

January 2025

Jiangxi Tellhow Power Technology Co., Ltd, Nanchang, 330031, China.

Xiangyong Liu, Zan Yang, Jiansheng Liu, Junxing Xiong, Jihui Huang

This paper presents a surrogate-assisted global and distributed local collaborative optimization (SGDLCO) algorithm for expensive constrained optimization problems where two surrogate optimization phases are executed collaboratively at each generation. As the complexity of optimization problems and the cost of solutions increase in practical applications, how to efficiently solve expensive constrained optimization problems with limited computational resources has become an important area of research. Traditional optimization algorithms often struggle to balance the efficiency of global and local searches, especially when dealing with high-dimensional and complex constraint conditions.

Utility of word embeddings from large language models in medical diagnosis.

J Am Med Inform Assoc

January 2025

Kennewick, WA 99338, United States.

Shahram Yazdani, Ronald Claude Henry, Avery Byrne, Isaac Claude Henry

Objective: This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean").

Materials and Methods: Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, rheumatic fever, and tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space.

Host tissue factors predict immune surveillance and therapeutic outcomes in gastric cancer.

Cancer Immunol Res

January 2025

Memorial Sloan Kettering Cancer Center, New York, NY, United States.

Miseker Abate, Emily Stroobant, Teng Fei, Ya-Hui Lin, Shoji Shimada

The immune composition of solid tumors is typically inferred from biomarkers, such as histologic and molecular classifications, somatic mutational burden, and PD-L1 expression. However, the extent to which these biomarkers predict the immune landscape in gastric adenocarcinoma, an aggressive cancer often linked to chronic inflammation, remains poorly understood. We leveraged high-dimensional spectral cytometry to generate a comprehensive single-cell immune landscape of tumors, normal tissue, and lymph nodes from patients in the Western Hemisphere with gastric adenocarcinoma.

Efficient Explainable Models for Alzheimer's Disease Classification with Feature Selection and Data Balancing Approach Using Ensemble Learning.

Diagnostics (Basel)

December 2024

Directorate of Research and Innovation, SPDC, Datta Meghe Institute of Higher Education and Research, Wardha 442001, India.

Yogita Dubey, Aditya Bhongade, Prachi Palsodkar, Punit Fulzele

Alzheimer's disease (AD) is a progressive neurodegenerative disorder and is the most common cause of dementia. Early diagnosis of Alzheimer's disease is critical for better management and treatment outcomes, but it remains a challenging task due to the complex nature of the disease. Clinical data, including a range of cognitive, functional, and demographic variables, play a crucial role in Alzheimer's disease classification.