Many human diseases result from a complex interplay of behavioral, clinical, and molecular factors. Integrating low-dimensional behavioral and clinical features with high-dimensional molecular profiles can significantly improve disease outcome prediction and diagnosis. However, although some biomarkers are crucial, many carry little or no information about the outcome. Enhancing prediction accuracy and understanding disease mechanisms therefore requires integrating the relevant features, identifying key biomarkers, separating signal from noise, and modeling complex associations. To address these challenges, we introduce the High-dimensional Feature Importance Test (HdFIT) framework for machine learning models. HdFIT includes a feature screening step for dimension reduction, leverages machine learning to model complex associations between biomarkers and disease outcomes, and robustly evaluates each feature's impact. Extensive Monte Carlo experiments and a real microbiome study demonstrate HdFIT's efficacy, especially when integrated with advanced models such as deep neural networks. The framework shows significant improvements in identifying crucial features and enhancing prediction accuracy, even in high-dimensional settings.
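The screen-then-model-then-test pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the HdFIT paper: the screening statistic (absolute marginal Pearson correlation), the stand-in model (a linear fit by gradient descent rather than a deep neural network), the permutation-based importance test, the synthetic data, and all function names (`screen`, `fit_linear`, `permutation_test`) are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def screen(X, y, k):
    """Screening step (assumed): keep the k columns with the largest |correlation|."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(sorted(range(p), key=lambda j: -scores[j])[:k])

def predict(w, row):
    return sum(wj * xj for wj, xj in zip(w, row))

def fit_linear(X, y, lr=0.05, epochs=300):
    """Stand-in model: least squares by full-batch gradient descent, no intercept."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        resid = [predict(w, row) - t for row, t in zip(X, y)]
        for j in range(d):
            w[j] -= lr * 2.0 / n * sum(r * row[j] for r, row in zip(resid, X))
    return w

def mse(w, X, y):
    return sum((predict(w, row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_test(w, X, y, j, n_perm=99, rng=None):
    """Importance of column j: mean rise in held-out loss when j is shuffled.

    The p-value is the share of shuffles that do not increase the loss
    (with the usual +1 correction); small values suggest the feature matters.
    """
    rng = rng or random.Random(0)
    base = mse(w, X, y)
    col = [row[j] for row in X]
    losses = []
    for _ in range(n_perm):
        shuffled = col[:]
        rng.shuffle(shuffled)
        Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, shuffled)]
        losses.append(mse(w, Xp, y))
    importance = sum(losses) / n_perm - base
    p_value = (1 + sum(l <= base for l in losses)) / (n_perm + 1)
    return importance, p_value

# Synthetic data (assumed): 50 features, only features 0 and 1 carry signal.
rng = random.Random(42)
n, p = 300, 50
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

kept = screen(X_tr, y_tr, k=5)  # dimension reduction on the training split

def sub(rows):
    """Restrict rows to the screened columns."""
    return [[row[j] for j in kept] for row in rows]

w = fit_linear(sub(X_tr), y_tr)
for pos, j in enumerate(kept):
    imp, pv = permutation_test(w, sub(X_te), y_te, pos, rng=rng)
    print(f"feature {j}: importance={imp:.3f}, p={pv:.3f}")
```

Screening and fitting use the training split while the importance test uses held-out rows, so that features selected by chance are less likely to look important. In a setting closer to the one the abstract describes, the linear fit would be replaced by a more flexible learner.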


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735761
DOI: http://dx.doi.org/10.1093/bib/bbae709

Publication Analysis

Top Keywords

feature test (8)
improve disease (8)
disease outcome (8)
outcome prediction (8)
behavioral clinical (8)
prediction accuracy (8)
complex associations (8)
machine learning (8)
deep learning (4)
learning feature (4)

Similar Publications

Introduction: Unsupervised feature learning methods inspired by natural language processing (NLP) models are capable of constructing patient-specific features from longitudinal Electronic Health Records (EHR).

Design: We applied document embedding algorithms to real-world paediatric intensive care (PICU) EHR data to extract patient-specific features from 1853 patients' PICU journeys using 647 unique lab tests and medication events. We evaluated the clinical utility of the patient features via a K-means clustering analysis.


Virtual biopsy for non-invasive identification of follicular lymphoma histologic transformation using radiomics-based imaging biomarker from PET/CT.

BMC Med

January 2025

Department of Nuclear Medicine, West China Hospital, Sichuan University, No. 37 Guoxue Alley, Chengdu City, Sichuan, 610041, China.

Background: This study aimed to construct a radiomics-based imaging biomarker for the non-invasive identification of transformed follicular lymphoma (t-FL) using PET/CT images.

Methods: A total of 784 follicular lymphoma (FL), diffuse large B-cell lymphoma, and t-FL patients from 5 independent medical centers were included. The unsupervised EMFusion method was applied to fuse PET and CT images.


Background: Early detection and diagnosis of cancer are vital to improving outcomes for patients. Artificial intelligence (AI) models have shown promise in the early detection and diagnosis of cancer, but there is limited evidence on methods that fully exploit the longitudinal data stored within electronic health records (EHRs). This review aims to summarise methods currently utilised for prediction of cancer from longitudinal data and provides recommendations on how such models should be developed.


Predicting isocitrate dehydrogenase (IDH) mutation status and epilepsy occurrence is important for glioma patients. Although machine learning models have been constructed for both tasks, the correlation between them has not been explored. Our study aimed to exploit this correlation to improve the performance of both the IDH mutation status identification model and the epilepsy diagnosis model in patients with grade II-IV glioma.


This study presents a comprehensive workflow to detect low seismic amplitude gas fields in hydrocarbon exploration projects, focusing on the West Delta Deep Marine (WDDM) concession, offshore Egypt. The workflow integrates seismic spectral decomposition and machine learning algorithms to identify subtle anomalies, including low seismic amplitude gas sand and background amplitude water sand. Spectral decomposition helps delineate the fairway boundaries and structural features, while Amplitude Versus Offset (AVO) analysis is used to validate gas sand anomalies.

