Publications by authors named "Matvey Palchuk"

Introduction: To support long COVID research in the National COVID Cohort Collaborative (N3C), the N3C Phenotype and Data Acquisition team created data designs to help contributing sites enhance their data. The enhancements are: a long COVID specialty clinic indicator; Admission, Discharge, and Transfer (ADT) transactions; patient-level social determinants of health; and in-hospital use of oxygen supplementation.

Methods: For each enhancement, we defined the scope and wrote guidance on how to prepare and populate the data in a standardized way.


Objective: The primary aim of this study is to address non-standardized units in clinical laboratory data, which pose significant challenges to data interoperability and secondary use. Although UCUM (Unified Code for Units of Measure) offers a unique representation for laboratory test units, nearly 60% of laboratory codes in healthcare organizations use non-standard units. We sought to design, implement, and test a methodology for harmonizing units to the UCUM standard across a large research network.
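As a rough illustration of what harmonizing units to UCUM can involve, the sketch below maps a few local unit spellings to UCUM codes through a lookup table. The table contents, the `to_ucum` function name, and the normalization step are assumptions made for this example; they are not the methodology described in the article.

```python
from typing import Optional

# Hypothetical lookup table: normalized local unit string -> UCUM code.
# The entries are illustrative only, not a mapping taken from the article.
LOCAL_TO_UCUM = {
    "mg/dl": "mg/dL",
    "mcg/ml": "ug/mL",
    "10^3/ul": "10*3/uL",
    "k/ul": "10*3/uL",
    "mmol/l": "mmol/L",
}


def to_ucum(local_unit: str) -> Optional[str]:
    """Return the UCUM code for a local unit string, or None if unmapped.

    Whitespace is removed and the string is lowercased before lookup.
    Lowercasing is a simplification: UCUM itself is case-sensitive, so a
    real mapping would need to handle case with more care.
    """
    key = "".join(local_unit.split()).lower()
    return LOCAL_TO_UCUM.get(key)
```

Units that return `None` would need manual review rather than a guessed mapping.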


Background: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC).


Objective: Clinical research networks facilitate collaborative research, but data sharing remains a common barrier.

Materials and Methods: The TriNetX platform provides real-time access to electronic health record (EHR)-derived, anonymized data from 173 healthcare organizations (HCOs) and tools for queries and analysis. In 2022, 4 pediatric HCOs worked with TriNetX leadership to found the Pediatric Collaboratory Network (PCN), facilitated via a multi-institutional data-use agreement (DUA).

Article Synopsis
  • A study investigated the prevalence of vestibular disorders in patients with COVID-19 compared to those without the virus using data from the National COVID Cohort Collaborative database.
  • Results showed that individuals with COVID-19 were significantly more likely to experience vestibular disorders, with the highest risk associated with the Omicron 23A variant (odds ratio [OR] of 8.80).
  • The findings underscore the need for further research on the long-term effects of vestibular disorders in COVID-19 patients and implications for patient counseling.

Background: Pancreatic Duct Adenocarcinoma (PDAC) screening can enable early-stage disease detection and long-term survival. Current guidelines use inherited predisposition, with about 10% of PDAC cases eligible for screening. Using Electronic Health Record (EHR) data from a multi-institutional federated network, we developed and validated a PDAC RISk Model (Prism) for the general US population to extend early PDAC detection.


Purpose: To explore medications and their administration patterns in real-world patients with breast cancer.

Methods: A retrospective study was performed using TriNetX, a federated network of deidentified, Health Insurance Portability and Accountability Act-compliant data from 21 health care organizations across North America. Patients diagnosed with breast cancer between January 1, 2013, and May 31, 2022, were included.


Objectives: Analysis of health care real-world data (RWD) provides an opportunity to observe the actual patient diagnostic, treatment, and outcome events. However, researchers should understand the possible limitations of RWD. In particular, the dates in these data may be shifted from their actual values, which might affect the validity of study conclusions.
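To make the concern about shifted dates concrete, the sketch below applies one constant offset to all of a patient's event dates. The constant per-patient offset, the `shift_dates` function, and the sample dates are assumptions made for this example; the abstract does not say how dates are shifted in the data it studies.

```python
from datetime import date, timedelta


def shift_dates(events, offset_days):
    """Shift every date in `events` by the same number of days."""
    return [d + timedelta(days=offset_days) for d in events]


# Illustrative dates only, not taken from any dataset.
diagnosis, treatment = date(2020, 3, 10), date(2020, 3, 24)
shifted = shift_dates([diagnosis, treatment], offset_days=-45)

# Under a constant offset, the interval between events is preserved...
interval_kept = (shifted[1] - shifted[0]) == (treatment - diagnosis)
# ...but the calendar month of each event is not.
month_kept = shifted[0].month == diagnosis.month
```

Under this assumption, analyses based on intervals between a patient's events are unaffected, while analyses tied to calendar time (seasonality, or before and after a fixed date) can be distorted.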


Laboratory data must be interoperable before the results of a lab test can be compared accurately between healthcare organizations. To achieve this, terminologies like LOINC (Logical Observation Identifiers, Names and Codes) provide unique identification codes for laboratory tests. Once standardized, the numeric results of laboratory tests can be aggregated and represented in histograms.
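The aggregation step can be sketched as follows: numeric results are grouped by test code and counted in fixed-width bins. The function name, the bin width, and the sample values are assumptions made for this example, and the code string is used only as a sample identifier.

```python
from collections import Counter, defaultdict


def histograms_by_code(results, bin_width):
    """Build one histogram per test code.

    `results` is an iterable of (code, numeric_value) pairs. Each
    histogram maps the lower edge of a bin to the count of values in it.
    """
    hists = defaultdict(Counter)
    for code, value in results:
        bin_start = int(value // bin_width) * bin_width
        hists[code][bin_start] += 1
    return hists


# Illustrative values only; these are not real patient results.
sample = [("2345-7", 92.0), ("2345-7", 98.5), ("2345-7", 104.0)]
hists = histograms_by_code(sample, bin_width=10)
```

Such histograms are only comparable across organizations when the values for a given code share the same unit.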


Objective: This article describes a scalable, performant, sustainable global network of electronic health record data for biomedical and clinical research.

Materials and Methods: TriNetX has created a technology platform characterized by a conservative security and governance model that facilitates collaboration and cooperation between industry participants, such as pharmaceutical companies and contract research organizations, and academic and community-based healthcare organizations (HCOs). HCOs participate on the network in return for access to a suite of analytics capabilities, large networks of de-identified data, and more sponsored trial opportunities.


The availability of next-generation sequencing (NGS) technologies and their continually declining costs have resulted in the accumulation of large genomic data sets. NGS results have traditionally been delivered in PDF format, and in some cases, structured data, e.g.


Including social determinants of health (SDoH) data in health outcomes research is essential for studying the sources of healthcare disparities and developing strategies to mitigate stressors. In this report, we describe a pragmatic design and approach to explore the encoding needs for transmitting SDoH screening tool responses from a large safety-net hospital into the National COVID Cohort Collaborative (N3C) OMOP dataset. We provide a stepwise account of designing data mapping and ingestion for patient-level SDoH and summarize the results of screening.


Reuse of Electronic Health Records (EHRs) for specific diseases such as COVID-19 requires data to be recorded and persisted according to international standards. Since the beginning of the COVID-19 pandemic, Hospital Universitario 12 de Octubre (H12O) evolved its EHRs: it identified, modeled and standardized the concepts related to this new disease in an agile, flexible and staged way. Thus, data from more than 200,000 COVID-19 cases were extracted, transformed, and loaded into an i2b2 repository.


Purpose: This is an update to a previously published report characterizing the impact that efforts to control the COVID-19 pandemic have had on the normal course of cancer-related encounters.

Methods: Data were analyzed from 22 US health care organizations (members of the TriNetX global network) having relevant, up-to-date encounter data. Although the original study compared encounter data pre-COVID-19 (January-April 2019) with the corresponding months in 2020, this update considers data through April 2021.


Recent findings have shown that the continued expansion of the scope and scale of data collected in electronic health records is making the protection of personally identifiable information (PII) more challenging and, if not addressed, may inadvertently put our institutions and patients at risk. As clinical terminologies expand to include new terms that may capture PII (e.g.


Objective: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations.

Article Synopsis
  • The National COVID Cohort Collaborative (N3C) is a massive electronic health record database that provides valuable insights into COVID-19, supporting the development of better diagnostic tools and clinical practices.
  • This study analyzed data from nearly 2 million adults across 34 medical centers to evaluate the severity of COVID-19 and its risk factors over time, using advanced machine learning techniques to predict severe outcomes.
  • Among the 174,568 adults infected with SARS-CoV-2, a significant portion experienced severe illness, highlighting the need for continuous monitoring and adjustment of treatment approaches based on demographic characteristics and disease severity.

Objective: Analysis of healthcare Real-World Data (RWD) provides an opportunity to observe actual patient diagnostic, treatment and outcomes events. However, researchers should understand the possible limitations of RWD. In particular, these data may be incomplete, which would affect the validity of study conclusions.

Article Synopsis
  • The National COVID Cohort Collaborative (N3C) is the largest U.S. COVID-19 patient database, created to provide a comprehensive analysis of clinical characteristics, disease progression, and treatment outcomes across multiple health centers, enhancing predictive and diagnostic tools for COVID-19.
  • A study of over 1.9 million patients from 34 medical centers found that factors such as age, sex, and underlying conditions affect disease severity, and that mortality among hospitalized patients decreased notably over time.
  • The N3C dataset was utilized in machine learning models to successfully predict severe outcomes in COVID-19 patients, achieving high accuracy rates and demonstrating the potential of using electronic health

Defining patient-to-patient similarity is essential for the development of precision medicine in clinical care and research. Conceptually, the identification of similar patient cohorts appears straightforward; however, universally accepted definitions remain elusive. Simultaneously, an explosion of vendors and published algorithms has emerged, all providing varied levels of functionality in identifying patient similarity categories.


Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization.


Purpose: While there are studies under way to characterize the direct effects of the COVID-19 pandemic on the care of patients with cancer, there have been few quantitative reports of the impact that efforts to control the pandemic have had on the normal course of cancer diagnosis and treatment encounters.

Methods: We used the TriNetX platform to analyze 20 health care institutions that have relevant, up-to-date encounter data. Using this COVID and Cancer Research Network (CCRN), we compared cancer cohorts identified by querying encounter data pre-COVID (January 2019-April 2019) and current (January 2020-April 2020).


Clinical trials, whether industry, cooperative group sponsored, or investigator initiated, have an unacceptable rate of failure as a result of the inability to recruit sufficient numbers of patients. Even those trials that are completed often require time-consuming protocol amendments to achieve accrual goals. These inefficiencies in clinical trial research result in increasing costs and prolong the time needed to bring improved treatments to cancer clinical practice.


The tranSMART knowledge management and high-content analysis platform is a flexible software framework featuring novel research capabilities. It enables analysis of integrated data for the purposes of hypothesis generation, hypothesis validation, and cohort discovery in translational research. tranSMART bridges the prolific world of basic science and clinical practice data at the point of care by merging multiple types of data from disparate sources into a common environment.
