Publications by authors named "Evan T Sholle"

Background: To achieve scientific goals, researchers often require integration of data from a primary electronic health record (EHR) system and one or more ancillary EHR systems used during the same patient care encounter. Although studies have demonstrated approaches for linking patient identity records across different EHR systems, little is known about linking patient encounter records across primary and ancillary EHR systems.

Objectives: We compared a patients-first approach versus an encounters-first approach for linking patient encounter records across multiple EHR systems.


Obtaining reliable data on patient mortality is a critical challenge facing observational researchers seeking to conduct studies using real-world data. As these analyses are conducted more broadly using newly available sources of real-world evidence, missing data can serve as a rate-limiting factor. We conducted a comparison of mortality data sources from different stakeholder perspectives - academic medical center (AMC) informatics service providers, AMC research coordinators, industry analytics professionals, and academics - to understand the strengths and limitations of differing mortality data sources: locally generated data from sites conducting research, data provided by governmental sources, and commercially available data sets.


Background: A commercial federated network called TriNetX has connected electronic health record (EHR) data from academic medical centers (AMCs) with biopharmaceutical sponsors in a privacy-preserving manner to promote sponsor-initiated clinical trials. Little is known about how AMCs have implemented TriNetX to support clinical trials.

Findings: At our AMC over a six-year period, TriNetX, integrated into existing institutional workflows, enabled 402 requests for sponsor-initiated clinical trials, 14% (n = 56) of which local investigators expressed interest in conducting.


Objective: Generation of automated clinical notes has been posited as a strategy to mitigate physician burnout. In particular, an automated narrative summary of a patient's hospital stay could supplement the hospital course section of the discharge summary that inpatient physicians document in electronic health record (EHR) systems. In the current study, we developed and evaluated an automated method for summarizing the hospital course section using encoder-decoder sequence-to-sequence transformer models.


Objectives: To develop and validate a standards-based phenotyping tool to author electronic health record (EHR)-based phenotype definitions and demonstrate execution of the definitions against heterogeneous clinical research data platforms.

Materials and Methods: We developed an open-source, standards-compliant phenotyping tool known as the PhEMA Workbench that enables a phenotype representation using the Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) standards. We then demonstrated how this tool can be used to conduct EHR-based phenotyping, including phenotype authoring, execution, and validation.


Objective: Obtaining electronic patient data, especially from electronic health record (EHR) systems, for clinical and translational research is difficult. Multiple research informatics systems exist but navigating the numerous applications can be challenging for scientists. This article describes Architecture for Research Computing in Health (ARCH), our institution's approach for matching investigators with tools and services for obtaining electronic patient data.


Introduction: Data extraction from electronic health record (EHR) systems occurs through manual abstraction, automated extraction, or a combination of both. While each method has its strengths and weaknesses, both are necessary for retrospective observational research as well as sudden clinical events, like the COVID-19 pandemic. Assessing the strengths, weaknesses, and potentials of these methods is important to continue to understand optimal approaches to extracting clinical data.


Purpose: Typically stored as unstructured notes, surgical pathology reports contain data elements valuable to cancer research that require labor-intensive manual extraction. Although studies have described natural language processing (NLP) of surgical pathology reports to automate information extraction, efforts have focused on specific cancer subtypes rather than across multiple oncologic domains. To address this gap, we developed and evaluated an NLP method to extract tumor staging and diagnosis information across multiple cancer subtypes.


Individuals infected with SARS-CoV-2 who also display hyperglycemia suffer from longer hospital stays, higher risk of developing acute respiratory distress syndrome (ARDS), and increased mortality. Nevertheless, the pathophysiological mechanism of hyperglycemia in COVID-19 remains poorly characterized. Here, we show that hyperglycemia is similarly prevalent among patients with ARDS independent of COVID-19 status.


Patients treated in an intensive care unit (ICU) are critically ill and require life-sustaining organ failure support. Existing critical care data resources are limited to a select number of institutions, contain only ICU data, and do not enable the study of local changes in care patterns. To address these limitations, we developed the Critical carE Database for Advanced Research (CEDAR), a method for automating extraction and transformation of data from an electronic health record (EHR) system.


COVID-19 has proven to be a metabolic disease resulting in adverse outcomes in individuals with diabetes or obesity. Patients infected with SARS-CoV-2 and hyperglycemia suffer from longer hospital stays, higher risk of developing acute respiratory distress syndrome (ARDS), and increased mortality compared to those who do not develop hyperglycemia. Nevertheless, the pathophysiological mechanism(s) of hyperglycemia in COVID-19 remains poorly characterized.


In less than nine months, the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) killed over a million people, including >25,000 in New York City (NYC) alone. The COVID-19 pandemic caused by SARS-CoV-2 highlights clinical needs to detect infection, track strain evolution, and identify biomarkers of disease course. To address these challenges, we designed a fast (30-minute) colorimetric test (LAMP) for SARS-CoV-2 infection from naso/oropharyngeal swabs and a large-scale shotgun metatranscriptomics platform (total-RNA-seq) for host, viral, and microbial profiling.


Background: Although federal regulations mandate documentation of structured race data according to Office of Management and Budget (OMB) categories in electronic health record (EHR) systems, many institutions have reported gaps in EHR race data that hinder secondary use for population-level research focused on underserved populations. When evaluating race data available for research purposes, we found our institution's enterprise EHR contained structured race data for only 51% (1.6 million) of patients.


Introduction: Electronic health record (EHR)-driven phenotyping is a critical first step in generating biomedical knowledge from EHR data. Despite recent progress, current phenotyping approaches are manual, time-consuming, error-prone, and platform-specific. This results in duplication of effort and highly variable results across systems and institutions, and is not scalable or portable.


Developed to enable basic queries for cohort discovery, i2b2 has evolved to support complex queries. Little is known about whether query sophistication - and the informatics resources required to support it - addresses researcher needs. In three years at our institution, 609 researchers ran 6,662 queries and requested re-identification of 80 patient cohorts to support specific studies.


Research to support precision medicine for leukemia patients requires integration of biospecimen and clinical data. The Observational Medical Outcomes Partnership common data model (OMOP CDM) and its Specimen table present a potential solution. Although researchers have described progress and challenges in mapping electronic health record (EHR) data to populate the OMOP CDM, to our knowledge no studies have described populating the OMOP CDM with biospecimen data.


Objective: We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race/ethnicity data.

Materials and Methods: Using EHR notes for 16,665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black/Hispanic. We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data.


Healthcare provider organizations (HPOs) increasingly participate in large-scale research efforts sponsored by external organizations that require use of consent management systems that may not integrate seamlessly with local workflows. The resulting inefficiency can hinder the ability of HPOs to participate in studies. To overcome this challenge, we developed a method using REDCap, a widely adopted electronic data capture system, and novel middleware that can potentially generalize to other settings.


The NIH All of Us Research Program, a national effort to collect biospecimens and health data for over one million participants from across the United States, requires participating healthcare provider organizations (HPOs) to use informatics tools maintained by the NIH to manage participant consent, biospecimen processing, physical measurements, and other workflows. HPOs also maintain distinct workflows for handling overlapping tasks within their individual aegis, which do not necessarily achieve seamless interoperability with NIH-maintained cloud-based systems. At our HPO, we implemented informatics to address gaps in enrollment workflows and hardware, clinical workflow integration, patient engagement, laboratory support, and study team reporting.


Adoption of electronic informed consent (eConsent) for research remains low despite evidence of improved patient comprehension, usability, and workflow processes compared to paper. At our institution, we implemented an eConsent workflow using REDCap, a widely used electronic data capture system. The goal of this study was to evaluate the extent to which the REDCap eConsent solution adhered to federal guidance for eConsent.


The Patient Health Questionnaire-9 (PHQ-9) is a validated instrument for assessing depression severity. While some electronic health record (EHR) systems capture PHQ-9 scores in a structured format, unstructured clinical notes remain the only source in many settings, which presents data retrieval challenges for research and clinical decision support. To address this gap, we extended the open-source Leo natural language processing (NLP) platform to extract PHQ-9 scores from clinical notes and evaluated performance using EHR data for 123,703 patients who were prescribed antidepressants.
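As a rough illustration of what extracting PHQ-9 scores from free text involves, the sketch below uses a plain regular expression in Python. It is not the Leo-based pipeline described in the abstract; the pattern, the function name `extract_phq9_scores`, and the sample note are all assumptions made for illustration. The only domain fact relied on is that a PHQ-9 total ranges from 0 to 27.

```python
import re
from typing import List

# Hypothetical pattern: "PHQ-9", "PHQ 9", or "PHQ9", optionally followed by
# "score"/"total score" and a connector ("of", "is", "was", ":", "="),
# then a one- or two-digit number.
PHQ9_PATTERN = re.compile(
    r"\bPHQ[-\s]?9\b(?:\s+(?:total\s+)?score)?\s*(?:of|is|was|[:=])?\s*(\d{1,2})\b",
    re.IGNORECASE,
)


def extract_phq9_scores(note: str) -> List[int]:
    """Return every plausible PHQ-9 total score mentioned in a clinical note."""
    scores = []
    for match in PHQ9_PATTERN.finditer(note):
        value = int(match.group(1))
        # A PHQ-9 total is the sum of nine items scored 0-3, so valid totals are 0-27.
        if 0 <= value <= 27:
            scores.append(value)
    return scores


# Made-up example note, not real patient data.
sample_note = "Pt seen for follow-up. PHQ-9 score: 14. Prior visit PHQ9 = 7."
sample_scores = extract_phq9_scores(sample_note)
```

A production system would also need to handle negation, dates, item-level scores, and other instruments with similar names, which a single pattern like this does not attempt.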


Although i2b2, a popular platform for patient cohort discovery using electronic health record (EHR) data, can support multiple projects specific to individual disease areas or research interests, the standard approach for doing so duplicates data across projects, requiring additional disk space and processing time, which limits scalability. To address this deficiency, we developed a novel approach that stored data in a single i2b2 fact table and used structured query language (SQL) views to access data for specific projects. Compared to the standard approach, the view-based approach reduced required disk space by 59% and extract-transform-load (ETL) time by 46%, without substantially impacting query performance.
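The general idea of serving several projects from one fact table through SQL views can be sketched as follows, using Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions, not the authors' implementation: the simplified `observation_fact` columns, the `project_id` column used to assign rows to projects, and the view names are hypothetical, and a real i2b2 deployment would run on a different database engine with a much wider schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One shared fact table (simplified; project_id is a hypothetical column
# used here only to decide which project a row belongs to).
cur.execute(
    """
    CREATE TABLE observation_fact (
        patient_num INTEGER,
        concept_cd  TEXT,
        project_id  TEXT
    )
    """
)
cur.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?)",
    [
        (1, "ICD10:C91.0", "leukemia"),
        (2, "ICD10:E11.9", "diabetes"),
        (3, "ICD10:C92.0", "leukemia"),
    ],
)

# Each project reads through its own view instead of its own copy of the data.
cur.execute(
    """
    CREATE VIEW leukemia_fact AS
    SELECT patient_num, concept_cd
    FROM observation_fact
    WHERE project_id = 'leukemia'
    """
)
cur.execute(
    """
    CREATE VIEW diabetes_fact AS
    SELECT patient_num, concept_cd
    FROM observation_fact
    WHERE project_id = 'diabetes'
    """
)

leukemia_rows = cur.execute(
    "SELECT patient_num, concept_cd FROM leukemia_fact ORDER BY patient_num"
).fetchall()
diabetes_rows = cur.execute(
    "SELECT patient_num, concept_cd FROM diabetes_fact ORDER BY patient_num"
).fetchall()

# The underlying rows are stored once, however many project views exist.
stored_rows = cur.execute("SELECT COUNT(*) FROM observation_fact").fetchone()[0]
```

Because a view stores only its defining query, adding a project adds no copies of the fact rows, which is the source of the disk-space and load-time savings the abstract reports for the view-based approach.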
