The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1741 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,489 UK Biobank participants.
Background: Despite the growing interest in the use of human genomic data for drug target identification and validation, the extent to which the spectrum of human disease has been addressed by genome-wide association studies (GWAS), or by drug development, and the degree to which these efforts overlap remain unclear.
Methods: In this study we harmonize and integrate different data sources to create a sample space of all the human drug targets and diseases and identify points of convergence or divergence of GWAS and drug development efforts.
Results: We show that only 612 of 11,158 diseases listed in Human Disease Ontology have an approved drug treatment in at least one region of the world.
Background: Electronic health records (EHRs) have the potential to be used to produce detailed disease burden estimates. In this study we created disease estimates using national EHR for three high burden conditions, compared estimates between linked and unlinked datasets and produced stratified estimates by age, sex, ethnicity, socio-economic deprivation and geographical region.
Methods: EHRs containing primary care (Clinical Practice Research Datalink), secondary care (Hospital Episode Statistics) and mortality records (Office for National Statistics) were used.
For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Here, in 41,931 individuals from the United Kingdom Biobank Pharma Proteomics Project, we integrated measurements of ~3,000 plasma proteins with clinical information to derive sparse prediction models for the 10-year incidence of 218 common and rare diseases (81-6,038 cases). We then compared prediction models developed using proteomic data with models developed using either basic clinical information alone or clinical information combined with data from 37 clinical assays.
Early evidence that patients with (multiple) pre-existing diseases are at highest risk for severe COVID-19 has been instrumental in the pandemic to allocate critical care resources and later vaccination schemes. However, systematic studies exploring the breadth of medical diagnoses, including common but non-fatal diseases, are scarce, but may help to understand severe COVID-19 among patients at supposedly low risk. Here, we systematically harmonized >12 million primary care and hospitalisation health records from ~500,000 UK Biobank participants into 1448 collated disease terms to systematically identify diseases predisposing to severe COVID-19 (requiring hospitalisation or death) and its post-acute sequelae, Long COVID.
Background: Early evidence that patients with (multiple) pre-existing diseases are at highest risk for severe COVID-19 has been instrumental in the pandemic to allocate critical care resources and later vaccination schemes. However, systematic studies exploring the breadth of medical diagnoses are scarce but may help to understand severe COVID-19 among patients at supposedly low risk.
Methods: We systematically harmonized >12 million primary care and hospitalisation health records from ~500,000 UK Biobank participants into 1448 collated disease terms to systematically identify diseases predisposing to severe COVID-19 (requiring hospitalisation or death) and its post-acute sequelae, Long COVID.
Background: Our study examined whether prevalent and incident comorbidities are increased in idiopathic pulmonary fibrosis (IPF) patients when compared to matched chronic obstructive pulmonary disease (COPD) patients and control subjects without IPF or COPD.
Methods: Patients with IPF and age-, gender- and smoking-matched COPD patients, diagnosed between 01/01/1997 and 01/01/2019, were identified from the Clinical Practice Research Datalink GOLD database multiple registrations cohort at the first date an ICD-10 or Read code mentioned IPF/COPD. A control cohort comprised age-, gender- and pack-year-smoking-matched subjects without IPF or COPD.
Objective: To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms.
Materials And Methods: We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control.
The COVID-19 pandemic exposed a global deficiency of systematic, data-driven guidance to identify high-risk individuals. Here, we illustrate the utility of routinely recorded medical history to predict the risk for 1883 diseases across clinical specialties and support the rapid response to emerging health threats such as COVID-19. We developed a neural network to learn from health records of 502,460 UK Biobank participants.
Background: An electronic health record (EHR) holds detailed longitudinal information about a patient's health status and general clinical history, a large portion of which is stored as unstructured, free text. Existing approaches to model a patient's trajectory focus mostly on structured data and a subset of single-domain outcomes. This study aims to evaluate the effectiveness of Foresight, a generative transformer for temporal modelling of patient data, integrating both free text and structured formats, to predict a diverse array of future medical outcomes, such as disorders, substances (eg, to do with medicines, allergies, or poisonings), procedures, and findings (eg, relating to observations, judgements, or assessments).
Background: The occurrence of a range of health outcomes following myocardial infarction (MI) is unknown. Therefore, this study aimed to determine the long-term risk of major health outcomes following MI and generate sociodemographic stratified risk charts in order to inform care recommendations in the post-MI period and underpin shared decision making.
Methods And Findings: This nationwide cohort study includes all individuals aged ≥18 years admitted to one of 229 National Health Service (NHS) Trusts in England between 1 January 2008 and 31 January 2017 (final follow-up 27 March 2017).
Background/objectives: When studying the effect of weight change between two time points on a health outcome using observational data, two main problems arise: (i) 'when is time zero?', that is, the baseline date or the first follow-up (when the weight change can be measured)?, and (ii) 'which confounders should we account for?' Different methods have previously been used in the literature that carry different sources of bias and hence produce different results.
Methods: We utilised the target trial emulation framework and considered weight change as a hypothetical intervention. First, we used a simplified example from a hypothetical randomised trial where no modelling is required.
Objective: To clarify the performance of polygenic risk scores in population screening, individual risk prediction, and population risk stratification.
Design: Secondary analysis of data in the Polygenic Score Catalog.
Setting: Polygenic Score Catalog, April 2022.
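Screening performance of a polygenic score is commonly summarised by the detection rate (DR: the proportion of future cases whose score is at or above a cut-off) together with the false-positive rate (FPR: the proportion of unaffected individuals at or above the same cut-off). The sketch below computes both from two lists of scores. It is an illustration under stated assumptions: the function name, the ">= threshold means screen-positive" rule, and the toy scores are hypothetical and are not taken from the study or the Polygenic Score Catalog.

```python
from typing import Sequence, Tuple


def detection_and_false_positive_rate(
    case_scores: Sequence[float],
    control_scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (detection rate, false-positive rate) for one score cut-off.

    Hypothetical helper: anyone with score >= threshold is treated as
    screen-positive.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    # Cases correctly flagged by the cut-off.
    detected = sum(1 for s in case_scores if s >= threshold)
    # Unaffected individuals wrongly flagged by the same cut-off.
    false_pos = sum(1 for s in control_scores if s >= threshold)
    return detected / len(case_scores), false_pos / len(control_scores)


if __name__ == "__main__":
    # Toy, made-up scores for illustration only.
    cases = [1.2, 0.8, 2.1, -0.3]
    controls = [0.1, -0.5, 1.5, 0.0, -1.2]
    dr, fpr = detection_and_false_positive_rate(cases, controls, threshold=1.0)
    print(f"DR={dr:.2f}, FPR={fpr:.2f}")  # DR=0.50, FPR=0.20
```

Reporting DR at a fixed FPR, rather than a single area under the curve, makes explicit how many unaffected people would be flagged for each case detected.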
Raynaud's phenomenon (RP) is a common vasospastic disorder that causes severe pain and ulcers, but despite its high reported heritability, no causal genes have been robustly identified. We conducted a genome-wide association study including 5,147 RP cases and 439,294 controls, based on diagnoses from electronic health records, and identified three unreported genomic regions associated with the risk of RP (p < 5 × 10⁻⁸). We prioritized ADRA2A (rs7090046, odds ratio (OR) per allele: 1.
Fish exposed to water supersaturated with dissolved gas experience gas embolism similar to decompression sickness (DCS), known as gas bubble disease (GBD) in fish. GBD has been postulated as an alternative to traditional mammalian models of DCS. Gas embolism can cause mechanical and biochemical damage, generating pathophysiological responses.
The pressor response induced by a voluntary hypoxic apnea is mediated largely by increased sympathetic outflow. The neural control of blood pressure is altered in recovery from acute heat exposure, but its effect on the pressor response to a voluntary hypoxic apnea has never been explored. Therefore, we tested the hypothesis that prior heat exposure would attenuate the pressor response induced by a voluntary hypoxic apnea.
Background: Machine learning has been used to analyse heart failure subtypes, but not across large, distinct, population-based datasets, across the whole spectrum of causes and presentations, or with clinical and non-clinical validation by different machine learning methods. Using our published framework, we aimed to discover heart failure subtypes and validate them upon population representative data.
Methods: In this external, prognostic, and genetic validation study we analysed individuals aged 30 years or older with incident heart failure from two population-based databases in the UK (Clinical Practice Research Datalink [CPRD] and The Health Improvement Network [THIN]) from 1998 to 2018.
Aims: Most adults presenting in primary care with chest pain symptoms will not receive a diagnosis ('unattributed' chest pain) but are at increased risk of cardiovascular events. We aimed to assess, within patients with unattributed chest pain, risk factors for cardiovascular events and whether those at greatest risk of cardiovascular disease can be identified by an existing general population risk prediction model or by development of a new model.
Methods And Results: The study used UK primary care electronic health records from the Clinical Practice Research Datalink linked to admitted hospitalizations.
Background: Low-frequency variants play an important role in breast cancer (BC) susceptibility. Gene-based methods can increase power by combining multiple variants in the same gene and help identify target genes.
Methods: We evaluated the potential of gene-based aggregation in the Breast Cancer Association Consortium cohorts including 83,471 cases and 59,199 controls.
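Gene-based aggregation can take several forms. One of the simplest is a burden score, which collapses all qualifying low-frequency variants in a gene into a single per-individual count that can then be tested for association with case status. The sketch below shows only the collapsing step and is not the consortium's pipeline. It assumes a hypothetical data layout: minor-allele dosages (0/1/2) per variant, a variant-to-gene map, per-variant minor allele frequencies, and an illustrative MAF cut-off of 1%.

```python
from collections import defaultdict
from typing import Dict, List


def gene_burden_scores(
    genotypes: Dict[str, List[int]],
    variant_gene: Dict[str, str],
    variant_maf: Dict[str, float],
    maf_cutoff: float = 0.01,
) -> Dict[str, List[int]]:
    """Sum minor-allele dosages of low-frequency variants per gene.

    Hypothetical helper: genotypes maps variant ID -> dosages for the same
    ordered list of individuals; the MAF cut-off is an assumption.
    """
    if not genotypes:
        return {}
    n = len(next(iter(genotypes.values())))
    burden: Dict[str, List[int]] = defaultdict(lambda: [0] * n)
    for variant, dosages in genotypes.items():
        if variant_maf[variant] >= maf_cutoff:
            continue  # keep only low-frequency variants
        gene = variant_gene[variant]
        for i, dosage in enumerate(dosages):
            burden[gene][i] += dosage  # accumulate per individual
    return dict(burden)


if __name__ == "__main__":
    # Toy data: three individuals, two rare variants in GENE_A, one common in GENE_B.
    genos = {"v1": [0, 1, 0], "v2": [1, 0, 0], "v3": [2, 2, 1]}
    genes = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
    mafs = {"v1": 0.002, "v2": 0.005, "v3": 0.30}
    print(gene_burden_scores(genos, genes, mafs))  # {'GENE_A': [1, 1, 0]}
```

Pooling variants this way gains power when most rare variants in a gene push risk in the same direction. Variance-component tests such as SKAT are the usual alternative when effects may be mixed.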
Big data is central to new developments in global clinical science aiming to improve the lives of patients. Technological advances have led to the routine use of structured electronic healthcare records with the potential to address key gaps in clinical evidence. The COVID-19 pandemic has demonstrated the potential of big data and related analytics, but also important pitfalls.
Background: Globally, there is a paucity of multimorbidity and comorbidity data, especially for minority ethnic groups and younger people. We estimated the frequency of common disease combinations and identified non-random disease associations for all ages in a multiethnic population.
Methods: In this population-based study, we examined multimorbidity and comorbidity patterns stratified by ethnicity or race, sex, and age for 308 health conditions using electronic health records from individuals included on the Clinical Practice Research Datalink linked with the Hospital Episode Statistics admitted patient care dataset in England.
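One common way to flag a non-random disease association is to compare how often two conditions co-occur with the count expected if they were independent: O/E = observed / (N × p_A × p_B), where N is the number of individuals and p_A, p_B are the prevalences of each condition. The excerpt does not state the study's exact method, so the sketch below is an assumed, simplified illustration. It omits the ethnicity, sex and age stratification and any significance testing, and the patient records are made up.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def observed_expected_ratios(
    patients: List[Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Observed/expected co-occurrence ratio for every observed condition pair.

    Expected count assumes independence: N * p_A * p_B.
    """
    n = len(patients)
    prevalence: Dict[str, int] = {}
    pair_counts: Dict[Tuple[str, str], int] = {}
    for conditions in patients:
        for c in conditions:
            prevalence[c] = prevalence.get(c, 0) + 1
        # Sort so each unordered pair gets one canonical key.
        for a, b in combinations(sorted(conditions), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    ratios: Dict[Tuple[str, str], float] = {}
    for (a, b), observed in pair_counts.items():
        expected = n * (prevalence[a] / n) * (prevalence[b] / n)
        ratios[(a, b)] = observed / expected
    return ratios


if __name__ == "__main__":
    toy = [{"asthma", "eczema"}, {"asthma", "eczema"}, {"asthma"}, {"diabetes"}]
    print(observed_expected_ratios(toy))  # asthma-eczema pair: 2 / 1.5
```

A ratio above 1 means the pair co-occurs more often than chance would predict. In practice it would be paired with a confidence interval or test before being called non-random.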