The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available ( http://milton.public.cgr.astrazeneca.com ).

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11390475PMC
http://dx.doi.org/10.1038/s41588-024-01898-1DOI Listing

Publication Analysis

Top Keywords

disease prediction
4
prediction multi-omics
4
multi-omics biomarkers
4
biomarkers empowers
4
empowers case-control
4
case-control genetic
4
genetic discoveries
4
discoveries biobank
4
biobank emergence
4
emergence biobank-level
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!