Proteomic prediction of diverse incident diseases: a machine learning-guided biomarker discovery study using data from a prospective cohort study.

Lancet Digit Health

MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Institute of Metabolic Science, Cambridge, UK; Computational Medicine, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany; Precision Healthcare University Research Institute, Queen Mary University of London, London, UK.

Published: July 2024

Background: Broad-capture proteomic technologies have the potential to improve disease prediction, enabling targeted prevention and management, but studies have so far been limited to very few selected diseases and have not evaluated predictive performance across multiple conditions. We aimed to evaluate the potential of serum proteins to improve risk prediction over and above health-derived information and polygenic risk scores across a diverse set of 24 outcomes.

Methods: We designed multiple case-cohorts nested in the EPIC-Norfolk prospective study, from participants with available serum samples and genome-wide genotype data, with more than 32 974 person-years of follow-up. Participants were middle-aged individuals (aged 40-79 years at baseline) of European ancestry who were recruited from the general population of Norfolk, England, between March, 1993 and December, 1997. We selected participants who developed one of ten less common diseases within 10 years of follow-up; we also subsampled a randomly drawn control subcohort, which also served to investigate 14 more common outcomes (n>70), including all-cause premature mortality (death before the age of 75 years; case numbers 71-437; controls 608-1556). Individuals were excluded from the current study owing to failed genotyping or proteomic quality control, relatedness, or missing information on age, sex, BMI, or smoking status. We used a machine learning framework to derive sparse predictive protein models for the onset of the 23 individual diseases and all-cause premature mortality, and to derive a single common sparse multimorbidity signature that was predictive across multiple diseases from 2923 serum proteins.
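As a rough illustration of what "sparse" means here, the sketch below ranks candidate proteins by how well each one alone orders participants by time to disease onset, then keeps the top few. This is a simple univariate concordance filter, not the machine learning framework used in the study, which the abstract does not specify. The function names, the protein names, and the toy data are all hypothetical.

```python
from itertools import combinations


def c_index(times, events, scores):
    """Harrell-style concordance for right-censored data (ties in time skipped)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the participant with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def top_k_proteins(times, events, protein_levels, k=5):
    """Keep the k proteins with the strongest univariate concordance.

    protein_levels maps a (hypothetical) protein name to one level per participant.
    """
    ranked = []
    for name, levels in protein_levels.items():
        c = c_index(times, events, levels)
        # High or low levels may both mark risk, so score the better direction
        ranked.append((max(c, 1.0 - c), name))
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked[:k]]


# Toy data: five participants, all with an observed event (illustrative only)
times = [1, 2, 3, 4, 5]
events = [1, 1, 1, 1, 1]
levels = {
    "PROT_A": [5, 4, 3, 2, 1],
    "PROT_B": [1, 2, 3, 4, 5],
    "PROT_C": [3, 1, 4, 2, 5],
}
print(top_k_proteins(times, events, levels, k=2))
```

A univariate filter ignores correlation between proteins, so a real sparse model would typically select features jointly and evaluate them under cross-validation.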

Findings: Participants who developed one of ten less common diseases within 10 years of follow-up included 482 women and 507 men, with a mean age at baseline of 64·56 years (8·08). The random subcohort included 990 women and 769 men, with a mean age of 58·79 years (9·31). As few as five proteins alone outperformed polygenic risk scores for 17 of 23 outcomes (median difference in concordance index [C-index] 0·13 [0·10-0·17]) and improved predictive performance when added over basic patient-derived information models for seven outcomes, achieving a median C-index of 0·82 (IQR 0·77-0·82). This included diseases with poor prognosis such as lung cancer (C-index 0·85 [± cross-validation error 0·83-0·87]), for which we identified unreported biomarkers such as C-X-C motif chemokine ligand 17. A sparse multimorbidity signature of ten proteins improved prediction across seven outcomes over patient-derived information models, achieving performances (median C-index 0·81 [IQR 0·80-0·82]) similar to those of disease-specific signatures.
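For readers unfamiliar with the metric, the C-index reported above is the fraction of comparable participant pairs in which the person who developed disease earlier was also assigned the higher predicted risk (0·5 is chance, 1·0 is perfect ranking). The following is a minimal sketch of Harrell's estimator, assuming right-censored follow-up and skipping pairs with tied times. The function name and the toy data are illustrative and are not taken from the study.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C-index.

    times  : follow-up time per participant
    events : 1 if disease onset was observed, 0 if censored
    scores : predicted risk, where higher means earlier expected onset
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        # a is the participant with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four hypothetical participants; the third is censored at time 6
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 0.8
```

In this toy example five pairs are comparable and four are ranked correctly, giving 0.8. Established survival-analysis libraries provide tested implementations that also handle ties and case-cohort weighting, which this sketch does not.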

Interpretation: We show the value of broad-capture proteomic biomarker discovery studies across multiple diseases of diverse causes, pointing to those that might benefit the most from proteomic approaches, and the potential to derive common sparse biomarker panels for prediction of multiple diseases at once. This framework could enable follow-up studies to explore the generalisability of proteomic models and to benchmark these against clinical assays, which are required to understand the translational potential of these findings.

Funding: Medical Research Council, Health Data Research UK, UK Research and Innovation-National Institute for Health and Care Research, Cancer Research UK, and Wellcome Trust.

Source
http://dx.doi.org/10.1016/S2589-7500(24)00087-6

Similar Publications

Interferon γ-induced protein 10 kDa (IP-10) or C-X-C motif chemokine 10 (CXCL10) is produced and secreted from specific leukocytes such as neutrophils, eosinophils, and monocytes, which play key roles in the immune response to Plasmodium infections. This systematic review aimed to collate and critically appraise the current evidence on IP-10 levels in malaria patients. It provided insights into its role in malaria pathogenesis and potential as a biomarker for Plasmodium infections and disease severity.

To investigate the risk of uveitis among such patients. A retrospective cohort study used the TriNetX database and recruited pediatric autoimmune patients diagnosed between January 1, 2004 and December 31, 2022. The non-autoimmune cohort consisted of randomly selected control patients matched by sex, age, and index year.

Toward ensuring that everyone enjoys healthy lifestyles and well-being at all ages, progress has been made in increasing access to clean water and sanitation facilities and in reducing the spread of epidemics and diseases. The synthesis of nanoparticles (NPs) using microalgae is a new nanobiotechnology that uses the biomolecular corona of microalgae as a capping and reducing agent for NP creation. This investigation explores the capacity of a distinct indigenous microalgal strain to synthesize silver nanoparticles (AgNPs), as well as its effectiveness against multi-drug resistant (MDR) bacteria and its ability to degrade the azo dye methyl red in wastewater.

Factors affecting fatigue progression in multiple sclerosis patients.

Sci Rep

December 2024

Nehme and Therese Tohme Multiple Sclerosis Center, American University of Beirut Medical Center, Riad El-Solh, PO Box 11-0236, 1107 2020, Beirut, Lebanon.

Fatigue is one of the most prevalent and disabling symptoms among patients with MS, but there is limited research investigating the longitudinal determinants of fatigue progression. This study aims to identify the sociodemographic, behavioral and clinical characteristics, and therapeutic regimens that are correlated with worsening fatigue over time in patients diagnosed with MS. This is a retrospective chart review of 483 patients.

Recently, RNA velocity has driven a paradigmatic change in single-cell RNA sequencing (scRNA-seq) studies, allowing the reconstruction and prediction of directed trajectories in cell differentiation and state transitions. Most existing methods of dynamic modeling use ordinary differential equations (ODE) for individual genes without applying multivariate approaches. However, this modeling strategy inadequately captures the intrinsically stochastic nature of transcriptional dynamics governed by a cell-specific latent time across multiple genes, potentially leading to erroneous results.
