Background: Previous epidemiologic studies of autoimmune diseases in the United States (US) have included a limited number of diseases or used meta-analyses that rely on different data collection methods and analyses for each disease.
Methods: To estimate the prevalence of autoimmune diseases in the US, we used electronic health record data from six large medical systems in the US. We developed a software program using common methodology to compute the estimated prevalence of autoimmune diseases alone and in aggregate that can be readily used by other investigators to replicate or modify the analysis over time.
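As a minimal sketch of what estimating prevalence "alone and in aggregate" can mean, the Python below computes a proportion for each disease and then counts every patient with at least one disease exactly once. The record layout, the function name `estimate_prevalence`, and the sample data are hypothetical illustrations and are not taken from the study's software.

```python
from collections import defaultdict


def estimate_prevalence(diagnoses, population_size):
    """Return (per-disease prevalence, aggregate prevalence) as proportions.

    diagnoses: iterable of (patient_id, disease) pairs (hypothetical layout).
    population_size: number of patients in the denominator.
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")

    # Collect the distinct patients seen with each disease.
    patients_by_disease = defaultdict(set)
    for patient_id, disease in diagnoses:
        patients_by_disease[disease].add(patient_id)

    per_disease = {
        disease: len(patients) / population_size
        for disease, patients in patients_by_disease.items()
    }

    # A patient with several diseases counts once in the aggregate.
    any_disease = set().union(*patients_by_disease.values())
    aggregate = len(any_disease) / population_size
    return per_disease, aggregate


# Made-up example: patient 2 has two diseases.
records = [
    (1, "rheumatoid arthritis"),
    (2, "rheumatoid arthritis"),
    (2, "celiac disease"),
    (3, "celiac disease"),
]
per_disease, aggregate = estimate_prevalence(records, population_size=10)
```

Because the aggregate deduplicates patients, it can be smaller than the sum of the per-disease values, which is why the two quantities are computed separately.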
Background: Scalable identification of patients with post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms, which has led to suboptimal accuracy, demographic biases, and underestimation of PASC.
Methods: In a retrospective case-control study, we developed a precision phenotyping algorithm for identifying cohorts of patients with PASC. We used longitudinal electronic health records data from over 295,000 patients from 14 hospitals and 20 community health centers in Massachusetts.
Patients with chronic lymphocytic leukemia (CLL) and non-Hodgkin lymphoma (NHL) can develop hypogammaglobulinemia, a form of secondary immune deficiency (SID), from the disease and treatments. Patients with hypogammaglobulinemia with recurrent infections may benefit from immunoglobulin replacement therapy (IgRT). This study evaluated patterns of immunoglobulin G (IgG) testing and the effectiveness of IgRT in real-world patients with CLL or NHL.
Objective: Intracranial aneurysms (IA) and aortic aneurysms (AA) are both abnormal dilations of arteries with familial predisposition and have been proposed to share co-prevalence and pathophysiology. Associations of IA and non-aortic peripheral aneurysms are less well-studied. The goal of the study was to understand the patterns of aortic and peripheral (extracranial) aneurysms in patients with IA, and risk factors associated with the development of these aneurysms.
Scalable identification of patients with the post-acute sequelae of COVID-19 (PASC) is challenging due to a lack of reproducible precision phenotyping algorithms and the suboptimal accuracy, demographic biases, and underestimation of the PASC diagnosis code (ICD-10 U09.9). In a retrospective case-control study, we developed a precision phenotyping algorithm for identifying research cohorts of PASC patients, defined as a diagnosis of exclusion.
Background: Characterizing post-acute sequelae of COVID-19 (SARS-CoV-2 infection), or PASC, has been challenging due to the multitude of sub-phenotypes, temporal attributes, and definitions. Scalable characterization of PASC sub-phenotypes can enhance screening capacities, disease management, and treatment planning.
Methods: We conducted a retrospective multi-centre observational cohort study, leveraging longitudinal electronic health record (EHR) data of 30,422 patients from three healthcare systems in the Consortium for the Clinical Characterization of COVID-19 by EHR (4CE).
Patients with chronic obstructive pulmonary disease (COPD) and type 2 diabetes (T2D) have worse clinical outcomes compared with patients without metabolic dysregulation. GLP-1 (glucagon-like peptide 1) receptor agonists (GLP-1RAs) reduce asthma exacerbation risk and improve forced vital capacity (FVC) in patients with COPD. The objective was to determine whether GLP-1RA use is associated with reduced COPD exacerbation rates, and with reduced risk of severe and moderate exacerbations, compared with other T2D therapies.
Objective: Patients who receive most care within a single healthcare system (colloquially called a "loyalty cohort" since they typically return to the same providers) have mostly complete data within that organization's electronic health record (EHR). Loyalty cohorts have low data missingness, which can unintentionally bias research results. Using proxies of routine care and healthcare utilization metrics, we compute a per-patient score that identifies a loyalty cohort.
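One way to picture a per-patient loyalty score is as a weighted sum of binary proxy flags compared against a cutoff. The sketch below assumes exactly that; the flag names, weights, threshold, and the additive form are all hypothetical choices made for illustration, not the scoring method described in the article.

```python
# Hypothetical proxy flags and weights, chosen only for illustration.
WEIGHTS = {
    "had_routine_visit": 2.0,
    "had_preventive_screening": 1.5,
    "multiple_encounters": 1.0,
}


def loyalty_score(flags):
    """Sum the weights of the proxy flags that are true for one patient."""
    return sum(weight for name, weight in WEIGHTS.items() if flags.get(name, False))


def loyalty_cohort(patients, threshold):
    """Return the sorted IDs of patients whose score meets the threshold."""
    return sorted(
        patient_id
        for patient_id, flags in patients.items()
        if loyalty_score(flags) >= threshold
    )


# Made-up patients: missing flags are treated as False.
patients = {
    "A": {"had_routine_visit": True, "had_preventive_screening": True, "multiple_encounters": True},
    "B": {"multiple_encounters": True},
    "C": {"had_routine_visit": True, "multiple_encounters": True},
}
cohort = loyalty_cohort(patients, threshold=3.0)
```

With these illustrative weights, patients A and C reach the cutoff and patient B does not; in practice the weights and cutoff would need to be fitted and validated against real utilization data.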
PLOS Digit Health
July 2023
Physical and psychological symptoms lasting months following an acute COVID-19 infection are now recognized as post-acute sequelae of COVID-19 (PASC). Accurate tools for identifying such patients could enhance screening capabilities for clinical trial recruitment, improve the reliability of disease estimates, and allow for more accurate downstream cohort analysis. In this retrospective cohort study, we analyzed the EHR of hospitalized COVID-19 patients across three healthcare systems to develop a pipeline for better identifying patients with persistent PASC symptoms (dyspnea, fatigue, or joint pain) after their SARS-CoV-2 infection.
Importance: SARS-CoV-2 infection is associated with persistent, relapsing, or new symptoms or other health effects occurring after acute infection, termed postacute sequelae of SARS-CoV-2 infection (PASC), also known as long COVID. Characterizing PASC requires analysis of prospectively and uniformly collected data from diverse uninfected and infected individuals.
Objective: To develop a definition of PASC using self-reported symptoms and describe PASC frequencies across cohorts, vaccination status, and number of infections.
Background: Alzheimer's Disease (AD) is a complex clinical phenotype with unprecedented social and economic tolls on an ageing global population. Real-world data (RWD) from electronic health records (EHRs) offer opportunities to accelerate precision drug development and scale epidemiological research on AD. A precise characterization of AD cohorts is needed to address the noise abundant in RWD.
Background: In electronic health records, patterns of missing laboratory test results could capture patients' course of disease as well as reflect clinicians' concerns or worries about possible conditions. These patterns are often understudied and overlooked. This study aims to identify informative patterns of missingness among laboratory data collected across 15 healthcare system sites in three countries for COVID-19 inpatients.
Objective: High BMI is associated with many comorbidities and mortality. This study aimed to elucidate the overall clinical risk of obesity using a genome- and phenome-wide approach.
Methods: This study performed a phenome-wide association study of BMI using a clinical cohort of 736,726 adults.
Importance: The SARS-CoV-2 Omicron subvariant, BA.2, may be less severe than previous variants; however, confounding factors make interpreting the intrinsic severity challenging.
Objective: To compare the adjusted risks of mortality, hospitalization, intensive care unit admission, and invasive ventilation between the BA.
Motivation: The i2b2 platform is used at major academic health institutions and research consortia for querying electronic health data. However, a major obstacle to wider utilization of the platform is the complexity of data loading, which entails a steep learning curve for the platform's complex data schemas. To address this problem, we have developed the i2b2-etl package that simplifies the data loading process, which will facilitate wider deployment and utilization of the platform.
Objective: For multi-center heterogeneous Real-World Data (RWD) with time-to-event outcomes and high-dimensional features, we propose the SurvMaximin algorithm to estimate Cox model feature coefficients for a target population by borrowing summary information from a set of health care centers without sharing patient-level information.
Materials And Methods: For each of the centers from which we want to borrow information to improve the prediction performance for the target population, a penalized Cox model is fitted to estimate feature coefficients for the center. Using estimated feature coefficients and the covariance matrix of the target population, we then obtain a SurvMaximin estimated set of feature coefficients for the target population.
Objective: This study aimed to: (1) extend the Integrating the Biology and the Bedside (i2b2) data and application models to include medical imaging appropriate use criteria, enabling it to serve as a platform to monitor local impact of the Protecting Access to Medicare Act's (PAMA) imaging clinical decision support (CDS) requirements, and (2) validate the i2b2 extension using data from the Medicare Imaging Demonstration (MID) CDS implementation.
Materials And Methods: This study provided a reference implementation and assessed its validity and reliability using data from the MID, the federal government's predecessor to PAMA's imaging CDS program. The Star Schema was extended to describe the interactions of imaging ordering providers with the CDS.
Background: Models predicting atrial fibrillation (AF) risk, such as Cohorts for Heart and Aging Research in Genomic Epidemiology AF (CHARGE-AF), have not performed as well in electronic health records. Natural language processing (NLP) may improve models by using narrative electronic health record text.
Methods and Results: From a primary care network, we included patients aged ≥65 years with visits between 2003 and 2013 in development (n=32 960) and internal validation cohorts (n=13 992).
Analysis of health data typically requires a data analyst to develop queries in structured query language (SQL). Because the SQL queries are created manually, they are prone to errors. In addition, accurate implementation of the queries depends on effective communication with clinical experts, which makes the analysis even more error-prone.
Objective: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems.
The risk profiles of post-acute sequelae of COVID-19 (PASC) have not been well characterized in multi-national settings with appropriate controls. We leveraged electronic health record (EHR) data from 277 international hospitals representing 414,602 patients with COVID-19, 2.3 million control patients without COVID-19 in the inpatient and outpatient settings, and over 221 million diagnosis codes to systematically identify new-onset conditions enriched among patients with COVID-19 during the post-acute period.
Given the growing number of prediction algorithms developed to predict COVID-19 mortality, we evaluated the transportability of a mortality prediction algorithm using a multi-national network of healthcare systems. We predicted COVID-19 mortality using baseline commonly measured laboratory values and standard demographic and clinical covariates across healthcare systems, countries, and continents. Specifically, we trained a Cox regression model with nine measured laboratory test values, standard demographics at admission, and comorbidity burden pre-admission.