Cancer is the second leading cause of disease-related death worldwide, and machine learning-based identification of novel biomarkers is crucial for improving early detection and treatment of various cancers. A key challenge in applying machine learning to high-dimensional data is deriving important features in an interpretable manner to provide meaningful insights into the underlying biological mechanisms. We developed a class-based directional feature importance (CLIFI) metric for decision tree methods and demonstrated its use on The Cancer Genome Atlas proteomics data. The CLIFI metric was incorporated into four algorithms: Random Forest (RF), LAtent VAriable Stochastic Ensemble of Trees (LAVASET), Gradient Boosted Decision Trees (GBDTs), and a new extension incorporating the LAVA step into GBDTs (LAVABOOST).
Background: Excess fibrotic remodeling causes cardiac dysfunction in ischemic heart disease, driven by MAP (mitogen-activated protein) kinase-dependent TGF-β1 (transforming growth factor-β1) activation by coagulation signaling of myeloid cells. How coagulation-inflammatory circuits can be specifically targeted to achieve beneficial macrophage reprogramming after myocardial infarction (MI) is not completely understood.
Methods: Mice with permanent ligation of the left anterior descending artery were used to model nonreperfused MI and analyzed by single-cell RNA sequencing, protein expression changes, confocal microscopy, and longitudinal monitoring of recovery.
Enzymes are indispensable in many biological processes, and with biomedical literature growing exponentially, effective literature review becomes increasingly challenging. Natural language processing methods offer solutions to streamline this process. This study aims to develop an annotated enzyme corpus for training and evaluating enzyme named entity recognition (NER) models.
Context: The role of glucagon-like peptide-1 (GLP-1) in type 2 diabetes (T2D) and obesity is not fully understood.
Objective: We investigate the association of cardiometabolic, diet, and lifestyle parameters with fasting and postprandial GLP-1 in people at risk of, or living with, T2D.
Methods: We analyzed cross-sectional data from the two Innovative Medicines Initiative (IMI) Diabetes Research on Patient Stratification (DIRECT) cohorts: cohort 1 (n = 2127), individuals at risk of diabetes, and cohort 2 (n = 789), individuals with new-onset T2D.
Motivation: Random forests (RFs) can deal with a large number of variables, achieve reasonable prediction scores, and yield highly interpretable feature importance values. As such, RFs are appropriate models for feature selection and further dimension reduction. However, RFs are often not appropriate for correlated datasets because they select individual features for splitting.