Objectives: The All of Us Research Program harnesses advances in technology, science, and engagement for precision medicine research. We describe informatics innovations which support that goal and return value to the participant cohort and community.
Materials And Methods: Research data from the All of Us Research Program are available to authorized users on the All of Us Researcher Workbench.
Over 200 million SARS-CoV-2 patients have developed or will develop persistent symptoms (long COVID). Given this pressing research priority, the National COVID Cohort Collaborative (N3C) developed a machine learning model using only electronic health record data to identify potential patients with long COVID. We hypothesized that additional data from health surveys, mobile devices, and genotypes could improve prediction ability.
Recently, large scale genomic projects such as All of Us and the UK Biobank have introduced a new research paradigm where data are stored centrally in cloud-based Trusted Research Environments (TREs). To characterize the advantages and drawbacks of different TRE attributes in facilitating cross-cohort analysis, we conduct a Genome-Wide Association Study of standard lipid measures using two approaches: meta-analysis and pooled analysis. Comparison of full summary data from both approaches with an external study shows strong correlation of known loci with lipid levels (R ~ 83-97%).
The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers.
Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms such that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devised and trained an ML-based phenotype to identify patients highly probable to have Long COVID.
Objective: The All of Us Research Program makes individual-level data available to researchers while protecting the participants' privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels.
Methods: At the time of the study, the resource consisted of 329 084 participants.
The National Institutes of Health's (NIH) All of Us Research Program aims to enroll at least one million US participants from diverse backgrounds; collect electronic health record (EHR) data, survey data, physical measurements, biospecimens for genomics and other assays, and digital health data; and create a researcher database and tools to enable precision medicine research [1]. Since inception, digital health technologies (DHT) have been envisioned as essential to achieving the goals of the program [2]. A "bring your own device" (BYOD) study for collecting Fitbit data from participants' devices was developed with integration of additional DHTs planned in the future [3].
The All of Us Research Program seeks to engage at least one million diverse participants to advance precision medicine and improve human health. We describe here the cloud-based Researcher Workbench that uses a data passport model to democratize access to analytical tools and participant information including survey, physical measurement, and electronic health record (EHR) data. We also present validation study findings for several common complex diseases to demonstrate use of this novel platform in 315,000 participants, 78% of whom are from groups historically underrepresented in biomedical research, including 49% self-reporting non-White races.
The field of artificial intelligence (AI) in medical imaging is undergoing explosive growth, and Radiology is a prime target for innovation. The American College of Radiology Data Science Institute has identified more than 240 specific use cases where AI could be used to improve clinical practice. In this context, thousands of potential methods are developed by research labs and industry innovators.
Resistant hypertension is defined as high blood pressure that remains above treatment goals in spite of the concurrent use of three antihypertensive agents from different classes. Despite the important health consequences of resistant hypertension, few studies of resistant hypertension have been conducted. To perform a genome-wide association study for resistant hypertension, we defined and identified cases of resistant hypertension and hypertensives with treated, controlled hypertension among >47,500 adults residing in the US linked to electronic health records (EHRs) and genotyped as part of the electronic MEdical Records & GEnomics (eMERGE) Network.
Background: As biobanks play an increasing role in the genomic research that will lead to precision medicine, input from diverse and large populations of patients in a variety of health care settings will be important in order to successfully carry out such studies. One important topic is participants' views towards consent and data sharing, especially since the 2011 Advance Notice of Proposed Rulemaking (ANPRM) and, subsequently, the 2015 Notice of Proposed Rulemaking (NPRM) were issued by the Department of Health and Human Services (HHS) and Office of Science and Technology Policy (OSTP). These notices required that participants consent to research uses of their de-identified tissue samples and most clinical data, and allowed such consent to be obtained in a one-time, open-ended or "broad" fashion.
Objective: Cohort selection is challenging for large-scale electronic health record (EHR) analyses, as International Classification of Diseases 9th edition (ICD-9) diagnostic codes are notoriously unreliable disease predictors. Our objective was to develop, evaluate, and validate an automated algorithm for determining an Autism Spectrum Disorder (ASD) patient cohort from EHR. We demonstrate its utility via the largest investigation to date of the co-occurrence patterns of medical comorbidities in ASD.
Objective: Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.
Objectives: We describe the development, implementation, and evaluation of a model to pre-emptively select patients for genotyping based on medication exposure risk.
Study Design And Setting: Using deidentified electronic health records, we derived a prognostic model for the prescription of statins, warfarin, or clopidogrel. The model was implemented into a clinical decision support (CDS) tool to recommend pre-emptive genotyping for patients exceeding a prescription risk threshold.
Glucocorticoids are important therapy for acute lymphoblastic leukemia (ALL) and their major adverse effect is osteonecrosis. Our goal was to identify genetic and nongenetic risk factors for osteonecrosis. We performed a genome-wide association study of single nucleotide polymorphisms (SNPs) in a discovery cohort comprising 2285 children with ALL, treated on the Children's Oncology Group AALL0232 protocol (NCT00075725), adjusting for covariates.
Evol Comput Mach Learn Data Min Bioinform
January 2014
The Nav1.5 sodium channel α subunit is the predominant α subunit expressed in the heart and is associated with cardiac arrhythmias. We tested five previously identified variants (rs7374138, rs7637849, rs7637849, rs7629265, and rs11129796) for an association with PR interval and QRS duration in two unique study populations: the Third National Health and Nutrition Examination Survey (NHANES III, n = 552) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) and a combined dataset (n = 455) from two biobanks linked to electronic medical records from Vanderbilt University (BioVU) and Northwestern University (NUgene) as part of the electronic Medical Records & Genomics (eMERGE) network.
Thyroid-stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities.
The Mid-South Clinical Data Research Network (CDRN) encompasses three large health systems: (1) Vanderbilt Health System (VU) with electronic medical records for over 2 million patients, (2) the Vanderbilt Healthcare Affiliated Network (VHAN) which currently includes over 40 hospitals, hundreds of ambulatory practices, and over 3 million patients in the Mid-South, and (3) Greenway Medical Technologies, with access to 24 million patients nationally. Initial goals of the Mid-South CDRN include: (1) expansion of our VU data network to include the VHAN and Greenway systems, (2) developing data integration/interoperability across the three systems, (3) improving our current tools for extracting clinical data, (4) optimization of tools for collection of patient-reported data, and (5) expansion of clinical decision support. By 18 months, we anticipate our CDRN will robustly support projects in comparative effectiveness research, pragmatic clinical trials, and other key research areas and have the capacity to share data and health information technology tools nationally.
Type 2 diabetes (T2D) is a complex metabolic disease that disproportionately affects African Americans. Genome-wide association studies (GWAS) have identified several loci that contribute to T2D in European Americans, but few studies have been performed in admixed populations. We first performed a GWAS of 1,563 African Americans from the Vanderbilt Genome-Electronic Records Project and Northwestern University NUgene Project as part of the electronic Medical Records and Genomics (eMERGE) network.
The last decade has seen an exponential growth in the quantity of clinical data collected nationwide, triggering an increase in opportunities to reuse the data for biomedical research. The Vanderbilt research data warehouse framework consists of identified and de-identified clinical data repositories, fee-for-service custom services, and tools built atop the data layer to assist researchers across the enterprise. Providing resources dedicated to research initiatives benefits not only the research community, but also clinicians, patients and institutional leadership.
The 61 CTSA Consortium sites are home to valuable programs and infrastructure supporting translational science and all are charged with ensuring that such investments translate quickly to improved clinical care. Catalog of Assets for Translational and Clinical Health Research (CATCHR) is the Consortium's effort to collect and make available information on programs and resources to maximize efficiency and facilitate collaborations. By capturing information on a broad range of assets supporting the entire clinical and translational research spectrum, CATCHR aims to provide the necessary infrastructure and processes to establish and maintain an open-access, searchable database of consortium resources to support multisite clinical and translational research studies.
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry.
The Electronic Medical Records and Genomics Network is a National Human Genome Research Institute-funded consortium engaged in the development of methods and best practices for using the electronic medical record as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from electronic medical records can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and health-care informatics, particularly for electronic phenotyping, genome-wide association studies, genomic medicine implementation, and the ethical and regulatory issues associated with genomics research and returning results to study participants.
Background: The ADME Core Panel assays 184 variants across 34 pharmacogenes, many of which are difficult to accurately genotype with standard multiplexing methods.
Methods: We genotyped 326 frequently medicated individuals of European descent in Vanderbilt's biorepository linked to de-identified electronic medical records, BioVU, on the ADME Core Panel to assess quality and performance of the assay. We compared quality control metrics and determined the extent of direct and indirect marker overlap between the ADME Core Panel and the Illumina Omni1-Quad.