Publications by Penberthy L | LitMetric

Publications by authors named "Penberthy L"

Coronavirus Disease 2019 (COVID-19) Real World Data Infrastructure: A Big-Data Resource for Study of the Impact of COVID-19 in Patient Populations With Immunocompromising Conditions.

James M Crawford Lynne Penberthy Ligia A Pinto Keri N Althoff Magdalene M Assimon

Open Forum Infect Dis

January 2025

Background: We developed a United States-based real-world data resource to better understand the continued impact of the coronavirus disease 2019 (COVID-19) pandemic on immunocompromised patients, who are typically underrepresented in prospective studies and clinical trials.

Methods: The COVID-19 Real World Data infrastructure (CRWDi) was created by linking and harmonizing de-identified HealthVerity medical and pharmacy claims data from 1 December 2018 to 31 December 2023, with severe acute respiratory syndrome coronavirus 2 virologic and serologic laboratory data from major commercial laboratories and Northwell Health; COVID-19 vaccination data; and, for patients with cancer, 2010 to 2021 National Cancer Institute Surveillance, Epidemiology, and End Results registry data.

Results: The CRWDi contains 4 cohorts: patients with cancer; patients with rheumatic diseases receiving pharmacotherapy; noncancer solid organ and hematopoietic stem cell transplant recipients; and people from the general population including adults and pediatric patients.

Making the Case for an International Childhood Cancer Data Partnership.

Gonçalo Forjaz Betsy Kohler Michel P Coleman Eva Steliarova-Foucher Serban Negoita

J Natl Cancer Inst

January 2025

Childhood cancers are a heterogeneous group of rare diseases, accounting for less than 2% of all cancers diagnosed worldwide. Most countries, therefore, do not have enough cases to provide robust information on epidemiology, treatment, and late effects, especially for rarer types of cancer. Thus, only through a concerted effort to share data internationally will we be able to answer research questions that could not otherwise be answered.

Development of message passing-based graph convolutional networks for classifying cancer pathology reports.

Hong-Jun Yoon Hilda B Klasky Andrew E Blanchard J Blair Christian Eric B Durbin

BMC Med Inform Decis Mak

September 2024

Article Synopsis

The study discusses the limitations of using graph convolutional networks (GCN) for classifying natural language texts, particularly in terms of memory usage and distribution.
It introduces a new model called FastMPN, which features a message passing architecture that allows for adjustable node embeddings and edge weights, improving the GCN's problem-solving ability.
The FastMPN model was tested on extracting clinical data from cancer pathology reports, outperforming or matching existing models and training quickly on a large dataset using advanced hardware.

Reporting tumor genomic test results to SEER registries via linkages.

Valentina I Petkov Jung S Byun Kevin C Ward Nicola C Schussler Natalie P Archer

J Natl Cancer Inst Monogr

August 2024

Article Synopsis

Precision medicine is increasingly important in cancer care, but tumor genomic data has been lacking in the National Cancer Institute's SEER Program, limiting research on molecular subtypes.
To improve this, the SEER Program has implemented a centralized process to link cancer cases in their registries with genomic test results from molecular labs, using specialized software and a trusted third party for data handling.
Recent linkages have included various OncotypeDX tests and results from other genomic classifiers, which facilitate the research community's access to valuable, de-identified data for cancer studies.

The SEER Program's evolution: supporting clinically meaningful population-level research.

Lynne Penberthy Steven Friedman

J Natl Cancer Inst Monogr

August 2024

Although the Surveillance, Epidemiology, and End Results (SEER) Program has maintained high standards of quality and completeness, the traditional data captured through population-based cancer surveillance are no longer sufficient to understand the impact of cancer and its outcomes. Therefore, in recent years, the SEER Program has expanded the population it covers and enhanced the types of data that are being collected. Traditionally, surveillance systems collected data characterizing the patient and their cancer at the time of diagnosis, as well as limited information on the initial course of therapy.

Machine learning and deep learning tools for the automated capture of cancer surveillance data.

Elizabeth Hsu Heidi Hanson Linda Coyle Jennifer Stevens Georgia Tourassi

J Natl Cancer Inst Monogr

August 2024

The National Cancer Institute and the Department of Energy strategic partnership applies advanced computing and predictive machine learning and deep learning models to automate the capture of information from unstructured clinical text for inclusion in cancer registries. Applications include extraction of key data elements from pathology reports, determination of whether a pathology or radiology report is related to cancer, extraction of relevant biomarker information, and identification of recurrence. With the growing complexity of cancer diagnosis and treatment, capturing essential information with purely manual methods is increasingly difficult.

Landscape analysis of environmental data sources for linkage with SEER cancer patients database.

Zaria Tatalovich Amina Chtourou Li Zhu Curt Dellavalle Heidi A Hanson

J Natl Cancer Inst Monogr

August 2024

One of the challenges associated with understanding environmental impacts on cancer risk and outcomes is estimating potential exposures of individuals diagnosed with cancer to adverse environmental conditions over the life course. Historically, this has been partly due to the lack of reliable measures of cancer patients' potential environmental exposures before a cancer diagnosis. The emerging sources of cancer-related spatiotemporal environmental data and residential history information, coupled with novel technologies for data extraction and linkage, present an opportunity to integrate these data into the existing cancer surveillance data infrastructure, thereby facilitating more comprehensive assessment of cancer risk and outcomes.

Virtual Pooled Registry-Cancer Linkage System: an improved method for ascertaining cancer diagnoses.

Dennis Deapen Castine Clerkin William Howe Don Green Christopher J Johnson

J Natl Cancer Inst Monogr

August 2024

Background: The National Cancer Institute funds many large cohort studies that rely on self-reported cancer data requiring medical record validation. This is labor intensive, costly, and prone to underreporting or misreporting of cancer and disparity-related differential response. US population-based central cancer registries identify incident cancer within their catchment area, yielding all malignant neoplasms and benign brain and central nervous system tumors with standardized data fields.

NCI SEER-Linked Virtual Tissue Repository Pilot.

Pamela Sanchez Alison L Van Dyke Valentina I Petkov Yao Yuan Sarah Bonds

J Natl Cancer Inst Monogr

August 2024

Background: The Surveillance, Epidemiology, and End Results (SEER) Program with the National Cancer Institute tested whether population-based cancer registries can serve as honest brokers to acquire tissue and data in the SEER-Linked Virtual Tissue Repository (VTR) Pilot.

Methods: We collected formalin-fixed, paraffin-embedded tissue and clinical data from patients with pancreatic ductal adenocarcinoma (PDAC) and breast cancer (BC) for two studies comparing cancer cases with highly unusual survival (≥5 years for PDAC and ≤30 months for BC) to pair-matched controls with usual survival (≤2 years for PDAC and ≥5 years for BC). Success was defined as the ability for registries to acquire tissue and data on cancer cases with highly unusual outcomes.

Association of Major Adverse Financial Events and Later-Stage Cancer Diagnosis in the United States.

Joan L Warren Angela B Mariotto Jennifer Stevens Amy J Davidoff Veena Shankaran

J Clin Oncol

March 2024

Purpose: This study assessed the prevalence of specific major adverse financial events (AFEs)-bankruptcies, liens, and evictions-before a cancer diagnosis and their association with later-stage cancer at diagnosis.

Methods: Patients age 20-69 years diagnosed with cancer during 2014-2015 were identified from the Seattle, Louisiana, and Georgia SEER population-based cancer registries. Registry data were linked with LexisNexis consumer data to identify patients with a history of court-documented AFEs before cancer diagnosis.

Deep learning uncertainty quantification for clinical text classification.

Alina Peluso Ioana Danciu Hong-Jun Yoon Jamaludin Mohd Yusof Tanmoy Bhattacharya

J Biomed Inform

January 2024

Article Synopsis

Machine learning models, specifically deep neural networks (DNNs), are increasingly used in decision-making alongside humans, emphasizing the need for reliable classifications.
This paper highlights the use of DNNs to automate the extraction of cancer-related data from electronic pathology reports, while introducing new selective classification methods to improve accuracy and reduce the number of unreliable predictions.
The proposed methods outperform existing models by achieving high accuracy with lower rejection rates, demonstrating their effectiveness in processing complex medical data.
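
As a rough illustration of the general idea behind selective classification (not the paper's proposed methods, which this synopsis does not detail), the sketch below abstains whenever the top predicted probability falls below a threshold, then reports accuracy on the accepted predictions alongside the rejection rate. The function names, the threshold value, and the toy probabilities are all assumptions made for this example.

```python
from typing import List, Optional, Tuple


def selective_predict(probs: List[List[float]], threshold: float) -> List[Optional[int]]:
    """Return the most probable class per row, or None (abstain) when the
    top probability is below the threshold. Hypothetical helper."""
    decisions: List[Optional[int]] = []
    for row in probs:
        top = max(range(len(row)), key=row.__getitem__)
        decisions.append(top if row[top] >= threshold else None)
    return decisions


def accuracy_and_rejection(
    decisions: List[Optional[int]], labels: List[int]
) -> Tuple[float, float]:
    """Accuracy over accepted predictions and the fraction of inputs rejected."""
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    rejection_rate = 1 - len(accepted) / len(labels)
    accuracy = sum(d == y for d, y in accepted) / len(accepted) if accepted else 0.0
    return accuracy, rejection_rate


# Toy softmax outputs for four reports and two classes (made-up numbers).
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.60, 0.40]]
labels = [0, 1, 1, 1]

decisions = selective_predict(probs, threshold=0.8)
accuracy, rejection_rate = accuracy_and_rejection(decisions, labels)
```

Raising the threshold generally trades a higher rejection rate for higher accuracy on the retained predictions, which is the trade-off the synopsis refers to.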

Impact on the Volume of Pathology Reports Before and During the COVID-19 Pandemic in SEER Cancer Registries.

Amina Chtourou Pamela V Sanchez Todd Golden Huann-Sheng Chen Stephen M Schwartz

Cancer Epidemiol Biomarkers Prev

November 2023

Introduction: Health care procedures including cancer screening and diagnosis were interrupted due to the COVID-19 pandemic. The extent of this impact on cancer care in the United States is not fully understood. We investigated pathology report volume as a reflection of trends in oncology services pre-pandemic and during the pandemic.

The Childhood Cancer Data Initiative: Using the Power of Data to Learn From and Improve Outcomes for Every Child and Young Adult With Pediatric Cancer.

Joseph A Flores-Toro Subhashini Jagu Gregory T Armstrong David F Arons Gregory J Aune

J Clin Oncol

August 2023

Data-driven basic, translational, and clinical research has resulted in improved outcomes for children, adolescents, and young adults (AYAs) with pediatric cancers. However, challenges in sharing data between institutions, particularly in research, prevent addressing substantial unmet needs in children and AYA patients diagnosed with certain pediatric cancers. Systematically collecting and sharing data from every child and AYA can enable greater understanding of pediatric cancers, improve survivorship, and accelerate development of new and more effective therapies.

Risk of and duration of protection from SARS-CoV-2 reinfection assessed with real-world data.

Shannon L Reynolds Harvey W Kaufman William A Meyer Chris Bush Oren Cohen

PLoS One

March 2023

This retrospective observational study aimed to gain a better understanding of the protective duration of prior SARS-CoV-2 infection against reinfection. The objectives were two-fold: to assess the durability of immunity to SARS-CoV-2 reinfection among initially unvaccinated individuals with previous SARS-CoV-2 infection, and to evaluate the crude SARS-CoV-2 reinfection rate and associated risk factors. During the pandemic era time period from February 29, 2020, through April 30, 2021, 144,678,382 individuals with SARS-CoV-2 molecular diagnostic or antibody test results were studied.

Using ensembles and distillation to optimize the deployment of deep learning models for the classification of electronic cancer pathology reports.

Kevin De Angeli Shang Gao Andrew Blanchard Eric B Durbin Xiao-Cheng Wu

JAMIA Open

October 2022

Objective: We aim to reduce overfitting and model overconfidence by distilling the knowledge of an ensemble of deep learning models into a single model for the classification of cancer pathology reports.

Materials And Methods: We consider the text classification problem that involves 5 individual tasks. The baseline model consists of a multitask convolutional neural network (MtCNN), and the implemented ensemble (teacher) consists of 1000 MtCNNs.

Ascertainment of Incident Cancer by US Population-Based Cancer Registries Versus Self-Reports and Death Certificates in a Nationwide Cohort Study, the US Radiologic Technologists Study.

Danping Liu Martha S Linet Paul S Albert Annelie M Landgren Cari M Kitahara

Am J Epidemiol

November 2022

Follow-up of US cohort members for incident cancer is time-consuming, is costly, and often results in underascertainment when the traditional methods of self-reporting and/or medical record validation are used. We conducted one of the first large-scale investigations to assess the feasibility, methods, and benefits of linking participants in the US Radiologic Technologists (USRT) Study (n = 146,022) with the majority of US state or regional cancer registries. Follow-up of this cohort has relied primarily on questionnaires (mailed approximately every 10 years) and linkage with the National Death Index.

Automatic information extraction from childhood cancer pathology reports.

Hong-Jun Yoon Alina Peluso Eric B Durbin Xiao-Cheng Wu Antoinette Stroup

JAMIA Open

July 2022

Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard.

Duration of Protection Against SARS-CoV-2 Reinfection and Associated Risk of Reinfection Assessed with Real-World Data.

Shannon L Reynolds Harvey W Kaufman William A Meyer Chris Bush Oren Cohen

medRxiv

February 2022

Importance: Better understanding of the protective duration of prior SARS-CoV-2 infection against reinfection is needed.

Objective: Primary: To assess the durability of immunity to SARS-CoV-2 reinfection among initially unvaccinated individuals with previous SARS-CoV-2 infection. Secondary: Evaluate the crude SARS-CoV-2 reinfection rate and associated characteristics.

A Keyword-Enhanced Approach to Handle Class Imbalance in Clinical Text Classification.

Andrew E Blanchard Shang Gao Hong-Jun Yoon J Blair Christian Eric B Durbin

IEEE J Biomed Health Inform

June 2022

Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominate. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance.

Assessment of Interstate Residential Mobility of SEER Patients: SEER and LexisNexis Residential Address Linkage.

Zaria Tatalovich David G Stinchcomb Angela Mariotto Diane Ng Jennifer L Stevens

J Registry Manag

June 2023

The National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) program is continuously exploring opportunities to augment its already extensive collection of data, enhance the quality of reported cancer information, and contribute to more comprehensive analyses of cancer burden. This manuscript describes a recent linkage of the LexisNexis longitudinal residential history data with 11 SEER registries and provides estimates of the inter-state mobility of SEER cancer patients. To identify mobility from one state to another, we used state postal abbreviations to generate state-level residential histories.
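
A minimal sketch of how a state-level residential history might be derived from address records, assuming each record is a pair of an ISO-format start date and a two-letter state postal abbreviation. The record layout, function names, and sample data are hypothetical and are not taken from the SEER and LexisNexis linkage itself.

```python
from itertools import groupby
from typing import List, Tuple


def state_history(records: List[Tuple[str, str]]) -> List[str]:
    """Order address records by start date and collapse consecutive records
    in the same state into a single entry."""
    ordered = sorted(records, key=lambda r: r[0])  # ISO dates sort as strings
    states = (state.strip().upper() for _, state in ordered)
    return [state for state, _ in groupby(states)]


def interstate_moves(history: List[str]) -> List[Tuple[str, str]]:
    """Each adjacent pair in the collapsed history is one interstate move."""
    return list(zip(history, history[1:]))


# Made-up address records for one patient: (start date, state abbreviation).
records = [
    ("2015-06-01", "GA"),
    ("2011-01-15", "LA"),
    ("2013-03-10", "la"),  # a move within Louisiana, not an interstate move
    ("2019-09-30", "WA"),
]

history = state_history(records)
moves = interstate_moves(history)
```

Collapsing consecutive duplicates means that address changes within a state do not count as mobility, so only changes in the postal abbreviation register as moves.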

An overview of real-world data sources for oncology and considerations for research.

Lynne T Penberthy Donna R Rivera Jennifer L Lund Melissa A Bruno Anne-Marie Meyer

CA Cancer J Clin

May 2022

Generating evidence on the use, effectiveness, and safety of new cancer therapies is a priority for researchers, health care providers, payers, and regulators given the rapid pace of change in cancer diagnosis and treatments. The use of real-world data (RWD) is integral to understanding the utilization patterns and outcomes of these new treatments among patients with cancer who are treated in clinical practice and community settings. An initial step in the use of RWD is careful study design to assess the suitability of an RWD source.

Class imbalance in out-of-distribution datasets: Improving the robustness of the TextCNN for the classification of rare cancer types.

Kevin De Angeli Shang Gao Ioana Danciu Eric B Durbin Xiao-Cheng Wu

J Biomed Inform

January 2022

In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention.

Cancer Informatics for Cancer Centers: Scientific Drivers for Informatics, Data Science, and Care in Pediatric, Adolescent, and Young Adult Cancer.

Anthony R Kerlavage Anne C Kirchhoff Jaime M Guidry Auvil Norman E Sharpless Kara L Davis

JCO Clin Cancer Inform

August 2021

Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501c3 organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. This consortium has regularly held topic-focused biannual face-to-face symposiums. These meetings are a place to review cancer informatics and data science priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues that we faced at our respective institutions and cancer centers.

Privacy-Preserving Deep Learning NLP Models for Cancer Registries.

Mohammed Alawad Hong-Jun Yoon Shang Gao Brent Mumphrey Xiao-Cheng Wu

IEEE Trans Emerg Top Comput

April 2020

Article Synopsis

Population cancer registries can enhance their efficiency in extracting cancer characteristics from pathology reports by utilizing Deep Learning (DL), but challenges exist due to privacy issues regarding data sharing.
The proposed solution involves privacy-preserving transfer learning strategies to distribute a multitask convolutional neural network (MT-CNN) model among cancer registries without sharing sensitive patient data.
Results indicate that these privacy-preserving methods result in comparable performance to traditional centralized models, showing the effectiveness of collaboration in cancer data processing while maintaining confidentiality.

Deep active learning for classifying cancer pathology reports.

Kevin De Angeli Shang Gao Mohammed Alawad Hong-Jun Yoon Noah Schaefferkoetter

BMC Bioinformatics

March 2021

Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model.
