Background: B-cell depletion (BCD) therapies (ocrelizumab, ofatumumab, rituximab) and natalizumab (NTZ) are highly effective disease-modifying therapies (DMTs) for multiple sclerosis (MS). However, no randomized clinical trial and only limited observational studies have compared the two DMT classes.
Objective: We compared BCD and NTZ in managing MS patient-reported disability progression using registry-linked electronic healthcare record (EHR) data.
The mixture Markov model (MMM) is a widely used tool for clustering sequences of events from a finite state space. However, because the MMM likelihood is multimodal, maximizing it remains a challenge. Although the Expectation-Maximization (EM) algorithm remains one of the most popular ways to estimate the MMM parameters, its convergence is not always guaranteed.
Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes (NLP). The complexity of EHR presents challenges in feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.
Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large-scale cohort studies are both time and resource intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.
Background: People with Alzheimer's disease (AD) exhibit varying clinical trajectories. There is a need to predict future AD-related outcomes such as morbidity and mortality using the clinical profile available at the point of care.
Objective: To stratify AD patients based on baseline clinical profiles (up to two years prior to AD diagnosis) and update the model after AD diagnosis to prognosticate future AD-related outcomes.
J Biomed Inform
February 2025
Motivation: The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning techniques have led to effective large-scale representations of EHR concepts along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training with patient-level data, which limits their ability to extend training to multi-institutional EHR data.
Arthritis Care Res (Hoboken)
December 2024
Objective: Patients with rheumatoid arthritis (RA) are at increased risk of cardiovascular disease (CVD) including heart failure (HF). However, little is known regarding the relative risks of HF subtypes such as HF with preserved ejection fraction (HFpEF) or reduced ejection fraction (HFrEF) in RA compared with non-RA.
Methods: We identified patients with RA and matched non-RA comparators among participants consenting to broad research from two large academic centers.
Electronic Health Record (EHR) systems are particularly valuable in pediatrics due to high barriers in clinical studies, but pediatric EHR data often suffer from low content density. Existing EHR code embeddings tailored for the general patient population fail to address the unique needs of pediatric patients. To bridge this gap, we introduce a transfer learning approach, MUltisource Graph Synthesis (MUGS), aimed at accurate knowledge extraction and relation detection in pediatric contexts.
Background: Risk prediction plays a crucial role in planning for prevention, monitoring, and treatment. Electronic Health Records (EHRs) offer an expansive repository of temporal medical data encompassing both risk factors and outcome indicators essential for effective risk prediction. However, challenges emerge due to the lack of readily available gold-standard outcomes and the complex effects of various risk factors.
Objective: Intracranial aneurysms (IA) and aortic aneurysms (AA) are both abnormal dilations of arteries with familial predisposition and have been proposed to share co-prevalence and pathophysiology. Associations between IA and non-aortic peripheral aneurysms are less well studied. The goal of the study was to understand the patterns of aortic and peripheral (extracranial) aneurysms in patients with IA, and the risk factors associated with the development of these aneurysms.
Determining whether a surrogate marker can be used to replace a primary outcome in a clinical study is complex. While many statistical methods have been developed to formally evaluate a surrogate marker, they generally do not provide a way to examine heterogeneity in the utility of a surrogate marker. Similar to treatment effect heterogeneity, where the effect of a treatment varies based on a patient characteristic, heterogeneity in surrogacy means that the strength or utility of the surrogate marker varies based on a patient characteristic.
Stat Methods Med Res
July 2024
When the primary endpoints in randomized clinical trials require long-term follow-up or are costly to measure, it is often desirable to assess treatment effects on surrogate endpoints instead of clinical endpoints. Prior to adopting a surrogate endpoint for such purposes, the extent of its surrogacy on the primary endpoint must be assessed. There is a rich statistical literature on assessing surrogacy in the overall population, much of which is based on quantifying the proportion of treatment effect on the primary endpoint that is explained by the treatment effect on the surrogate endpoint.
Online J Public Health Inform
May 2024
Background: Post-COVID-19 condition (colloquially known as "long COVID-19"), characterized as postacute sequelae of SARS-CoV-2, has no universal clinical case definition. Recent efforts have focused on understanding long COVID-19 symptoms, and electronic health record (EHR) data provide a unique resource for understanding this condition. The introduction of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.
Few studies examining the patient outcomes of concurrent neurological manifestations during acute COVID-19 have leveraged multinational cohorts of adults and children or distinguished between central and peripheral nervous system (CNS vs. PNS) involvement. Using a federated multinational network in which local clinicians and informatics experts curated the electronic health records data, we evaluated the risk of prolonged hospitalization and mortality in hospitalized COVID-19 patients from 21 healthcare systems across 7 countries.
The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., IL6R variant as a proxy for IL6R antagonists.
Objective: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility.
In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to some unlabeled target populations using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model.
In clinical studies of chronic diseases, the effectiveness of an intervention is often assessed using "high cost" outcomes that require long-term patient follow-up and/or are invasive to obtain. While much progress has been made in the development of statistical methods to identify surrogate markers, that is, measurements that could replace such costly outcomes, these methods are generally not applicable to studies with a small sample size. They either rely on nonparametric smoothing, which requires a relatively large sample size, or rely on strict model assumptions that are unlikely to hold in practice and are empirically difficult to verify with a small sample size.
Stud Health Technol Inform
January 2024
Several studies have shown that about 80% of the medical information in an electronic health record is only available through unstructured data. Resources such as medical terminologies in languages other than English are limited, which constrains NLP tools. We propose here to leverage English-based resources in other languages using a combination of translation, word alignment, entity extraction and term normalization (TAXN).
Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning.
There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with an increased risk of new-onset Type II diabetes (T2D). Nevertheless, to date, there is no robust evidence as to whether, and which, populations are indeed vulnerable to developing T2D after taking statins. In this case study, leveraging the biobank and electronic health record data in the Partner Health System, we introduce a new data analysis pipeline and a novel statistical methodology that address existing limitations by (i) designing a rigorous causal framework that systematically examines the causal effects of statin usage on T2D risk in observational data, (ii) uncovering which patient subgroup is most vulnerable to developing T2D after taking statins, and (iii) assessing the replicability and statistical significance of the most vulnerable subgroup via a bootstrap calibration procedure.