Publications by authors named "Palchuk M"

Electronic health records (EHRs), though they are maintained and utilized for clinical and billing purposes, may provide a wealth of information for research. Currently, sources are available that offer insight into the health histories of well over a quarter of a billion people. Their use, however, is fraught with hazards, including introduction or reinforcement of biases, clarity of disease definitions, protection of patient privacy, definitions of covariates or confounders, accuracy of medication usage compared with prescriptions, the need to introduce other data sources such as vaccination or death records and the ensuing potential for inaccuracy, duplicative records, and understanding and interpreting the outcomes of data queries.


Introduction: To support long COVID research in the National COVID Cohort Collaborative (N3C), the N3C Phenotype and Data Acquisition team created data designs to aid contributing sites in enhancing their data. Enhancements include: long COVID specialty clinic indicator; Admission, Discharge, and Transfer (ADT) transactions; patient-level social determinants of health; and in-hospital use of oxygen supplementation.

Methods: For each enhancement, we defined the scope and wrote guidance on how to prepare and populate the data in a standardized way.


Objective: The primary aim of this study is to address the critical issue of non-standardized units in clinical laboratory data, which poses significant challenges to data interoperability and secondary usage. Despite UCUM (Unified Code for Units of Measure) offering a unique representation for laboratory test units, nearly 60% of laboratory codes in healthcare organizations use non-standard units. We sought to design, implement and test a methodology for the harmonization of units to the UCUM standards across a large research network.


Background: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC).


Objective: Clinical research networks facilitate collaborative research, but data sharing remains a common barrier.

Materials and Methods: The TriNetX platform provides real-time access to electronic health record (EHR)-derived, anonymized data from 173 healthcare organizations (HCOs) and tools for queries and analysis. In 2022, 4 pediatric HCOs worked with TriNetX leadership to found the Pediatric Collaboratory Network (PCN), facilitated via a multi-institutional data-use agreement (DUA).

Article Synopsis
  • A study investigated the prevalence of vestibular disorders in patients with COVID-19 compared to those without the virus using data from the National COVID Cohort Collaborative database.
  • Results showed that individuals with COVID-19 were significantly more likely to experience vestibular disorders, with the highest risk associated with the omicron 23A variant (OR of 8.80).
  • The findings underscore the need for further research on the long-term effects of vestibular disorders in COVID-19 patients and implications for patient counseling.

Background: Pancreatic ductal adenocarcinoma (PDAC) screening can enable early-stage disease detection and long-term survival. Current guidelines base screening eligibility on inherited predisposition, which makes only about 10% of PDAC cases eligible for screening. Using Electronic Health Record (EHR) data from a multi-institutional federated network, we developed and validated a PDAC RISk Model (Prism) for the general US population to extend early PDAC detection.


Purpose: To explore medications and their administration patterns in real-world patients with breast cancer.

Methods: A retrospective study was performed using TriNetX, a federated network of deidentified, Health Insurance Portability and Accountability Act-compliant data from 21 health care organizations across North America. Patients diagnosed with breast cancer between January 1, 2013, and May 31, 2022, were included.


Objectives: Analysis of health care real-world data (RWD) provides an opportunity to observe the actual patient diagnostic, treatment, and outcome events. However, researchers should understand the possible limitations of RWD. In particular, the dates in these data may be shifted from their actual values, which might affect the validity of study conclusions.


Laboratory data must be interoperable to be able to accurately compare the results of a lab test between healthcare organizations. To achieve this, terminologies like LOINC (Logical Observation Identifiers, Names and Codes) provide unique identification codes for laboratory tests. Once standardized, the numeric results of laboratory tests can be aggregated and represented in histograms.


Objective: This article describes a scalable, performant, sustainable global network of electronic health record data for biomedical and clinical research.

Materials and Methods: TriNetX has created a technology platform characterized by a conservative security and governance model that facilitates collaboration and cooperation between industry participants, such as pharmaceutical companies and contract research organizations, and academic and community-based healthcare organizations (HCOs). HCOs participate on the network in return for access to a suite of analytics capabilities, large networks of de-identified data, and more sponsored trial opportunities.


The availability of next-generation sequencing (NGS) technologies and their continually declining costs have resulted in the accumulation of large genomic data sets. NGS results have traditionally been delivered in PDF format, and in some cases, structured data, e.g.


Including social determinants of health (SDoH) data in health outcomes research is essential for studying the sources of healthcare disparities and developing strategies to mitigate stressors. In this report, we describe a pragmatic design and approach to explore the encoding needs for transmitting SDoH screening tool responses from a large safety-net hospital into the National COVID Cohort Collaborative (N3C) OMOP dataset. We provide a stepwise account of designing data mapping and ingestion for patient-level SDoH and summarize the results of screening.


Reuse of Electronic Health Records (EHRs) for specific diseases such as COVID-19 requires data to be recorded and persisted according to international standards. Since the beginning of the COVID-19 pandemic, Hospital Universitario 12 de Octubre (H12O) evolved its EHRs: it identified, modeled and standardized the concepts related to this new disease in an agile, flexible and staged way. Thus, data from more than 200,000 COVID-19 cases were extracted, transformed, and loaded into an i2b2 repository.


Purpose: This is an update to a previously published report characterizing the impact that efforts to control the COVID-19 pandemic have had on the normal course of cancer-related encounters.

Methods: Data were analyzed from 22 US health care organizations (members of the TriNetX global network) having relevant, up-to-date encounter data. Although the original study compared encounter data pre-COVID-19 (January-April 2019) with the corresponding months in 2020, this update considers data through April 2021.


Recent findings have shown that the continued expansion of the scope and scale of data collected in electronic health records is making the protection of personally identifiable information (PII) more challenging and may inadvertently put our institutions and patients at risk if not addressed. As clinical terminologies expand to include new terms that may capture PII (e.g.


Objective: In response to COVID-19, the informatics community united to aggregate as much clinical data as possible to characterize this new disease and reduce its impact through collaborative analytics. The National COVID Cohort Collaborative (N3C) is now the largest publicly available HIPAA limited dataset in US history with over 6.4 million patients and is a testament to a partnership of over 100 organizations.

Article Synopsis
  • The National COVID Cohort Collaborative (N3C) is a massive electronic health record database that provides valuable insights into COVID-19, supporting the development of better diagnostic tools and clinical practices.
  • This study analyzed data from nearly 2 million adults across 34 medical centers to evaluate the severity of COVID-19 and its risk factors over time, using advanced machine learning techniques to predict severe outcomes.
  • Among the 174,568 adults infected with SARS-CoV-2, a significant portion experienced severe illness, highlighting the need for continuous monitoring and adjustment of treatment approaches based on demographic characteristics and disease severity.

Objective: Analysis of healthcare Real-World Data (RWD) provides an opportunity to observe actual patient diagnostic, treatment and outcomes events. However, researchers should understand the possible limitations of RWD. In particular, these data may be incomplete, which would affect the validity of study conclusions.

Article Synopsis
  • The National COVID Cohort Collaborative (N3C) is the largest U.S. COVID-19 patient database, created to provide a comprehensive analysis of clinical characteristics, disease progression, and treatment outcomes across multiple health centers, enhancing predictive and diagnostic tools for COVID-19.
  • A study involving over 1.9 million patients from 34 medical centers found significant clinical data, showing that certain factors like age, sex, and underlying conditions affect disease severity, with a notable decrease in mortality rates among hospitalized patients over time.
  • The N3C dataset was utilized in machine learning models to successfully predict severe outcomes in COVID-19 patients, achieving high accuracy rates and demonstrating the potential of using electronic health

Defining patient-to-patient similarity is essential for the development of precision medicine in clinical care and research. Conceptually, the identification of similar patient cohorts appears straightforward; however, universally accepted definitions remain elusive. Simultaneously, an explosion of vendors and published algorithms has emerged, all providing varied levels of functionality in identifying patient similarity categories.


Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization.


Mapping among terminologies to ensure homogeneous analysis across different data sources is one of the key challenges of semantic interoperability. Concretely, mappings to the International Classification of Diseases 10th Revision Procedure Classification System (ICD-10-PCS) are especially challenging due to its multiaxial structure and lack of terms used by physicians (many terminologies used in real-world data (RWD) are initially intended for reimbursement, not for clinical purposes). In this work, we propose a new theoretical methodology for mapping healthcare data to the ICD-10-PCS by exploiting its multiaxial structure to reduce the search spaces within concepts and leveraging the dependencies between axes for inferring additional relevant information.


Purpose: While there are studies under way to characterize the direct effects of the COVID-19 pandemic on the care of patients with cancer, there have been few quantitative reports of the impact that efforts to control the pandemic have had on the normal course of cancer diagnosis and treatment encounters.

Methods: We used the TriNetX platform to analyze 20 health care institutions that have relevant, up-to-date encounter data. Using this COVID and Cancer Research Network (CCRN), we compared cancer cohorts identified by querying encounter data pre-COVID (January 2019-April 2019) and current (January 2020-April 2020).
