Publications by authors named "Tianxi Cai"

Background: B-cell depletion (BCD) therapies (ocrelizumab, ofatumumab, rituximab) and natalizumab (NTZ) are highly effective disease-modifying therapies (DMTs) for multiple sclerosis (MS). However, no randomized clinical trial and only limited observational studies have compared the two DMT classes.

Objective: We compared BCD and NTZ in managing MS patient-reported disability progression using registry-linked electronic healthcare record (EHR) data.

View Article and Find Full Text PDF

The Mixture Markov Model (MMM) is a widely used tool to cluster sequences of events coming from a finite state-space. However, because the MMM likelihood is multimodal, maximizing it remains a challenge. Although the Expectation-Maximization (EM) algorithm remains one of the most popular ways to estimate the MMM parameters, its convergence is not always guaranteed.

View Article and Find Full Text PDF

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes (NLP). The complexity of EHR presents challenges in feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.

View Article and Find Full Text PDF

Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large-scale cohort studies are both time- and resource-intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult.

View Article and Find Full Text PDF

Background: People with Alzheimer's disease (AD) exhibit varying clinical trajectories. There is a need to predict future AD-related outcomes such as morbidity and mortality using the clinical profile at the point of care.

Objective: To stratify AD patients based on baseline clinical profiles (up to two years prior to AD diagnosis) and update the model after AD diagnosis to prognosticate future AD-related outcomes.

View Article and Find Full Text PDF

Motivation: The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning techniques have led to effective large-scale representations of EHR concepts along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training with patient-level data, limiting their abilities to expand the training with multi-institutional EHR data.

View Article and Find Full Text PDF

Objective: Patients with rheumatoid arthritis (RA) are at increased risk of cardiovascular disease (CVD) including heart failure (HF). However, little is known regarding the relative risks of HF subtypes such as HF with preserved ejection fraction (HFpEF) or reduced ejection fraction (HFrEF) in RA compared with non-RA.

Methods: We identified patients with RA and matched non-RA comparators among participants consenting to broad research from two large academic centers.

View Article and Find Full Text PDF

Electronic Health Record (EHR) systems are particularly valuable in pediatrics due to high barriers in clinical studies, but pediatric EHR data often suffer from low content density. Existing EHR code embeddings tailored for the general patient population fail to address the unique needs of pediatric patients. To bridge this gap, we introduce a transfer learning approach, MUltisource Graph Synthesis (MUGS), aimed at accurate knowledge extraction and relation detection in pediatric contexts.

View Article and Find Full Text PDF
Article Synopsis
  • Type 2 diabetes mellitus (T2DM) can lead to serious complications like heart failure (HF), yet there's limited research on how insulin treatment compares to other medications in terms of HF risk.
  • This study utilized real-world insurance claims data to assess the long-term HF risk of insulin therapy versus other glucose-lowering medications like GLP-1 RAs, DPP-4Is, and SGLT2Is among T2DM patients.
  • Results indicated that insulin users had significantly higher 5-year incident HF rates compared to patients using other therapies, especially in those already at high risk for heart failure.
View Article and Find Full Text PDF
Article Synopsis
  • Limited representation of minorities in clinical research hinders the effectiveness of precision medicine, leading to underperformance in prediction models for these groups.
  • FETA is a proposed method that uses federated transfer learning to integrate diverse data from various healthcare institutions, aiming to improve genetic risk prediction models for underrepresented populations with small sample sizes.
  • Testing results show that FETA achieves better predictive performance compared to traditional methods, demonstrating its potential to enhance accuracy and address health disparities across different populations.
View Article and Find Full Text PDF
Article Synopsis
  • Human genetic studies often lack diversity, which limits understanding of disease causes and health disparities.
  • The Department of Veterans Affairs Million Veteran Program analyzed data from a diverse group of 635,969 veterans, revealing 13,672 genomic risk loci, with significant findings particularly from non-European populations.
  • The research identified causal variants across 613 traits, showing that genetic similarities exist across populations and emphasizing the importance of including underrepresented groups in genetic research.
View Article and Find Full Text PDF

Background: Risk prediction plays a crucial role in planning for prevention, monitoring, and treatment. Electronic Health Records (EHRs) offer an expansive repository of temporal medical data encompassing both risk factors and outcome indicators essential for effective risk prediction. However, challenges emerge due to the lack of readily available gold-standard outcomes and the complex effects of various risk factors.

View Article and Find Full Text PDF

Objective: Intracranial aneurysms (IA) and aortic aneurysms (AA) are both abnormal dilations of arteries with familial predisposition and have been proposed to share co-prevalence and pathophysiology. Associations of IA and non-aortic peripheral aneurysms are less well-studied. The goal of the study was to understand the patterns of aortic and peripheral (extracranial) aneurysms in patients with IA, and risk factors associated with the development of these aneurysms.

View Article and Find Full Text PDF

Determining whether a surrogate marker can be used to replace a primary outcome in a clinical study is complex. While many statistical methods have been developed to formally evaluate a surrogate marker, they generally do not provide a way to examine heterogeneity in the utility of a surrogate marker. Similar to treatment effect heterogeneity, where the effect of a treatment varies based on a patient characteristic, heterogeneity in surrogacy means that the strength or utility of the surrogate marker varies based on a patient characteristic.

View Article and Find Full Text PDF

When the primary endpoints in randomized clinical trials require long-term follow-up or are costly to measure, it is often desirable to assess treatment effects on surrogate endpoints instead of clinical endpoints. Prior to adopting a surrogate endpoint for such purposes, the extent of its surrogacy on the primary endpoint must be assessed. There is a rich statistical literature on assessing surrogacy in the overall population, much of which is based on quantifying the proportion of treatment effect on the primary endpoint that is explained by the treatment effect on the surrogate endpoint.

View Article and Find Full Text PDF

Background: Post-COVID-19 condition (colloquially known as "long COVID-19"), characterized as postacute sequelae of SARS-CoV-2, has no universal clinical case definition. Recent efforts have focused on understanding long COVID-19 symptoms, and electronic health record (EHR) data provide a unique resource for understanding this condition. The introduction of the International Classification of Diseases, Tenth Revision (ICD-10) code U09.

View Article and Find Full Text PDF

Few studies examining the patient outcomes of concurrent neurological manifestations during acute COVID-19 leveraged multinational cohorts of adults and children or distinguished between central and peripheral nervous system (CNS vs. PNS) involvement. Using a federated multinational network in which local clinicians and informatics experts curated the electronic health records data, we evaluated the risk of prolonged hospitalization and mortality in hospitalized COVID-19 patients from 21 healthcare systems across 7 countries.

View Article and Find Full Text PDF

The Phenome-Wide Association Study (PheWAS) is increasingly used to broadly screen for potential treatment effects, e.g., IL6R variant as a proxy for IL6R antagonists.

View Article and Find Full Text PDF

Objective: Development of clinical phenotypes from electronic health records (EHRs) can be resource intensive. Several phenotype libraries have been created to facilitate reuse of definitions. However, these platforms vary in target audience and utility.

View Article and Find Full Text PDF

In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to some unlabeled target populations using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model.

View Article and Find Full Text PDF

In clinical studies of chronic diseases, the effectiveness of an intervention is often assessed using "high cost" outcomes that require long-term patient follow-up and/or are invasive to obtain. While much progress has been made in the development of statistical methods to identify surrogate markers, that is, measurements that could replace such costly outcomes, they are generally not applicable to studies with a small sample size. These methods either rely on nonparametric smoothing which requires a relatively large sample size or rely on strict model assumptions that are unlikely to hold in practice and empirically difficult to verify with a small sample size.

View Article and Find Full Text PDF

Several studies have shown that about 80% of the medical information in an electronic health record is only available through unstructured data. Resources such as medical terminologies in languages other than English are limited, which constrains NLP tools. We propose here to leverage English-based resources in other languages using a combination of translation, word alignment, entity extraction and term normalization (TAXN).

View Article and Find Full Text PDF

Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning.

View Article and Find Full Text PDF

There have been increased concerns that the use of statins, one of the most commonly prescribed drugs for treating coronary artery disease, is potentially associated with an increased risk of new-onset Type II diabetes (T2D). Nevertheless, to date, there is no robust evidence as to whether, and in which populations, patients are indeed vulnerable to developing T2D after taking statins. In this case study, leveraging the biobank and electronic health record data in the Partner Health System, we introduce a new data analysis pipeline and a novel statistical methodology that address existing limitations by (i) designing a rigorous causal framework that systematically examines the causal effects of statin usage on T2D risk in observational data, (ii) uncovering which patient subgroup is most vulnerable to developing T2D after taking statins, and (iii) assessing the replicability and statistical significance of the most vulnerable subgroup via a bootstrap calibration procedure.

View Article and Find Full Text PDF
