Background: We developed a United States-based real-world data resource to better understand the continued impact of the coronavirus disease 2019 (COVID-19) pandemic on immunocompromised patients, who are typically underrepresented in prospective studies and clinical trials.
Methods: The COVID-19 Real World Data infrastructure (CRWDi) was created by linking and harmonizing de-identified HealthVerity medical and pharmacy claims data from 1 December 2018 to 31 December 2023, with severe acute respiratory syndrome coronavirus 2 virologic and serologic laboratory data from major commercial laboratories and Northwell Health; COVID-19 vaccination data; and, for patients with cancer, 2010 to 2021 National Cancer Institute Surveillance, Epidemiology, and End Results registry data.
Results: The CRWDi contains 4 cohorts: patients with cancer; patients with rheumatic diseases receiving pharmacotherapy; noncancer solid organ and hematopoietic stem cell transplant recipients; and people from the general population including adults and pediatric patients.
Childhood cancers are a heterogeneous group of rare diseases, accounting for less than 2% of all cancers diagnosed worldwide. Most countries, therefore, do not have enough cases to provide robust information on epidemiology, treatment, and late effects, especially for rarer types of cancer. Thus, only through a concerted effort to share data internationally will we be able to answer research questions that could not otherwise be answered.
J Natl Cancer Inst Monogr
August 2024
J Natl Cancer Inst Monogr
August 2024
Although the Surveillance, Epidemiology, and End Results (SEER) Program has maintained high standards of quality and completeness, the traditional data captured through population-based cancer surveillance are no longer sufficient to understand the impact of cancer and its outcomes. Therefore, in recent years, the SEER Program has expanded the population it covers and enhanced the types of data that are being collected. Traditionally, surveillance systems collected data characterizing the patient and their cancer at the time of diagnosis, as well as limited information on the initial course of therapy.
J Natl Cancer Inst Monogr
August 2024
The National Cancer Institute and the Department of Energy strategic partnership applies advanced computing and predictive machine learning and deep learning models to automate the capture of information from unstructured clinical text for inclusion in cancer registries. Applications include extraction of key data elements from pathology reports, determination of whether a pathology or radiology report is related to cancer, extraction of relevant biomarker information, and identification of recurrence. With the growing complexity of cancer diagnosis and treatment, capturing essential information with purely manual methods is increasingly difficult.
J Natl Cancer Inst Monogr
August 2024
One of the challenges associated with understanding environmental impacts on cancer risk and outcomes is estimating potential exposures of individuals diagnosed with cancer to adverse environmental conditions over the life course. Historically, this has been partly due to the lack of reliable measures of cancer patients' potential environmental exposures before a cancer diagnosis. The emerging sources of cancer-related spatiotemporal environmental data and residential history information, coupled with novel technologies for data extraction and linkage, present an opportunity to integrate these data into the existing cancer surveillance data infrastructure, thereby facilitating more comprehensive assessment of cancer risk and outcomes.
Background: The National Cancer Institute funds many large cohort studies that rely on self-reported cancer data requiring medical record validation. This is labor intensive, costly, and prone to underreporting or misreporting of cancer and disparity-related differential response. US population-based central cancer registries identify incident cancer within their catchment area, yielding all malignant neoplasms and benign brain and central nervous system tumors with standardized data fields.
Background: The Surveillance, Epidemiology, and End Results (SEER) Program with the National Cancer Institute tested whether population-based cancer registries can serve as honest brokers to acquire tissue and data in the SEER-Linked Virtual Tissue Repository (VTR) Pilot.
Methods: We collected formalin-fixed, paraffin-embedded tissue and clinical data from patients with pancreatic ductal adenocarcinoma (PDAC) and breast cancer (BC) for two studies comparing cancer cases with highly unusual survival (≥5 years for PDAC and ≤30 months for BC) to pair-matched controls with usual survival (≤2 years for PDAC and ≥5 years for BC). Success was defined as the ability for registries to acquire tissue and data on cancer cases with highly unusual outcomes.
Purpose: This study assessed the prevalence of specific major adverse financial events (AFEs), namely bankruptcies, liens, and evictions, before a cancer diagnosis and their association with later-stage cancer at diagnosis.
Methods: Patients age 20-69 years diagnosed with cancer during 2014-2015 were identified from the Seattle, Louisiana, and Georgia SEER population-based cancer registries. Registry data were linked with LexisNexis consumer data to identify patients with a history of court-documented AFEs before cancer diagnosis.
Cancer Epidemiol Biomarkers Prev
November 2023
Introduction: Health care procedures including cancer screening and diagnosis were interrupted due to the COVID-19 pandemic. The extent of this impact on cancer care in the United States is not fully understood. We investigated pathology report volume as a reflection of trends in oncology services pre-pandemic and during the pandemic.
Data-driven basic, translational, and clinical research has resulted in improved outcomes for children, adolescents, and young adults (AYAs) with pediatric cancers. However, challenges in sharing data between institutions, particularly in research, prevent addressing substantial unmet needs in children and AYA patients diagnosed with certain pediatric cancers. Systematically collecting and sharing data from every child and AYA can enable greater understanding of pediatric cancers, improve survivorship, and accelerate development of new and more effective therapies.
This retrospective observational study aimed to gain a better understanding of the protective duration of prior SARS-CoV-2 infection against reinfection. The objectives were two-fold: to assess the durability of immunity to SARS-CoV-2 reinfection among initially unvaccinated individuals with previous SARS-CoV-2 infection, and to evaluate the crude SARS-CoV-2 reinfection rate and associated risk factors. During the pandemic period from February 29, 2020, through April 30, 2021, 144,678,382 individuals with SARS-CoV-2 molecular diagnostic or antibody test results were studied.
Objective: We aim to reduce overfitting and model overconfidence by distilling the knowledge of an ensemble of deep learning models into a single model for the classification of cancer pathology reports.
Materials And Methods: We consider the text classification problem that involves 5 individual tasks. The baseline model consists of a multitask convolutional neural network (MtCNN), and the implemented ensemble (teacher) consists of 1000 MtCNNs.
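The distillation idea described above can be illustrated with a rough sketch: the class probabilities of several teacher models are averaged into soft labels, and a student's predictions are scored against those soft labels with cross-entropy. This is only a minimal, standard-library sketch under stated assumptions. The helper names `ensemble_soft_labels` and `soft_cross_entropy` and the toy probabilities are hypothetical, and the study's actual MtCNN architecture and training procedure are not reproduced here.

```python
import math
from typing import List


def ensemble_soft_labels(teacher_probs: List[List[float]]) -> List[float]:
    """Average per-class probabilities across teachers (hypothetical helper)."""
    n_teachers = len(teacher_probs)
    n_classes = len(teacher_probs[0])
    return [
        sum(probs[c] for probs in teacher_probs) / n_teachers
        for c in range(n_classes)
    ]


def soft_cross_entropy(
    soft_labels: List[float], student_probs: List[float], eps: float = 1e-12
) -> float:
    """Cross-entropy of student predictions against soft (non one-hot) targets."""
    return -sum(t * math.log(s + eps) for t, s in zip(soft_labels, student_probs))


# Toy example: three teachers (an assumption; the study used a far larger ensemble).
teachers = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
soft = ensemble_soft_labels(teachers)

# A student that agrees with the ensemble gets a lower loss than one that does not.
loss_close = soft_cross_entropy(soft, [0.7, 0.2, 0.1])
loss_far = soft_cross_entropy(soft, [0.1, 0.2, 0.7])
```

Because the soft labels keep the ensemble's uncertainty across classes, a student trained against them is discouraged from producing overconfident predictions, which is the motivation stated in the objective.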
Follow-up of US cohort members for incident cancer is time-consuming, is costly, and often results in underascertainment when the traditional methods of self-reporting and/or medical record validation are used. We conducted one of the first large-scale investigations to assess the feasibility, methods, and benefits of linking participants in the US Radiologic Technologists (USRT) Study (n = 146,022) with the majority of US state or regional cancer registries. Follow-up of this cohort has relied primarily on questionnaires (mailed approximately every 10 years) and linkage with the National Death Index.
Objectives: The International Classification of Childhood Cancer (ICCC) facilitates the effective classification of a heterogeneous group of cancers in the important pediatric population. However, there has been no development of machine learning models for the ICCC classification. We developed deep learning-based information extraction models from cancer pathology reports based on the ICD-O-3 coding standard.
Importance: Better understanding of the protective duration of prior SARS-CoV-2 infection against reinfection is needed.
Objective: Primary: To assess the durability of immunity to SARS-CoV-2 reinfection among initially unvaccinated individuals with previous SARS-CoV-2 infection. Secondary: Evaluate the crude SARS-CoV-2 reinfection rate and associated characteristics.
IEEE J Biomed Health Inform
June 2022
Recent applications of deep learning have shown promising results for classifying unstructured text in the healthcare domain. However, the reliability of models in production settings has been hindered by imbalanced data sets in which a small subset of the classes dominate. In the absence of adequate training data, rare classes necessitate additional model constraints for robust performance.
The National Cancer Institute (NCI) Surveillance, Epidemiology, and End Results (SEER) program is continuously exploring opportunities to augment its already extensive collection of data, enhance the quality of reported cancer information, and contribute to more comprehensive analyses of cancer burden. This manuscript describes a recent linkage of the LexisNexis longitudinal residential history data with 11 SEER registries and provides estimates of the inter-state mobility of SEER cancer patients. To identify mobility from one state to another, we used state postal abbreviations to generate state-level residential histories.
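One way the state-level step described above might look in practice is sketched below: address records are ordered by date, reduced to their state postal abbreviations, and consecutive repeats are collapsed so that each change marks an inter-state move. The record layout (ISO start date plus state abbreviation), the function names `state_history` and `interstate_moves`, and the sample records are all assumptions made for illustration; the actual linkage logic used with the LexisNexis data is not described in this excerpt.

```python
from typing import List, Tuple


def state_history(addresses: List[Tuple[str, str]]) -> List[str]:
    """Collapse dated address records into a sequence of distinct states.

    Each record is (start_date in ISO format, state postal abbreviation);
    this layout is a hypothetical one chosen for the example.
    """
    ordered = sorted(addresses, key=lambda rec: rec[0])  # ISO dates sort as text
    history: List[str] = []
    for _, state in ordered:
        state = state.strip().upper()
        # Only record a state when it differs from the previous one.
        if not history or history[-1] != state:
            history.append(state)
    return history


def interstate_moves(addresses: List[Tuple[str, str]]) -> int:
    """Count state-to-state transitions in a residential history."""
    return max(len(state_history(addresses)) - 1, 0)


# Made-up records for a single person, deliberately out of order.
records = [
    ("2012-03-01", "GA"),
    ("2005-06-15", "la"),
    ("2009-01-10", "LA"),
    ("2016-08-20", "WA"),
]
history = state_history(records)
moves = interstate_moves(records)
```

Moves within the same state do not change the collapsed sequence, so only changes of state are counted.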
Generating evidence on the use, effectiveness, and safety of new cancer therapies is a priority for researchers, health care providers, payers, and regulators given the rapid pace of change in cancer diagnosis and treatments. The use of real-world data (RWD) is integral to understanding the utilization patterns and outcomes of these new treatments among patients with cancer who are treated in clinical practice and community settings. An initial step in the use of RWD is careful study design to assess the suitability of an RWD source.
In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention.
Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501(c)(3) organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. This consortium has regularly held topic-focused biannual face-to-face symposiums. These meetings are a place to review cancer informatics and data science priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues that we face at our respective institutions and cancer centers.
IEEE Trans Emerg Top Comput
April 2020
Background: Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms on classifying subsite and histology from cancer pathology reports using a Convolutional Neural Network as the text classification model.
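To make the active learning idea above concrete, the sketch below shows one widely used strategy, entropy-based uncertainty sampling: the unlabelled reports whose predicted class distribution is most uncertain are sent for labelling first. This excerpt does not name the 11 algorithms compared, so the choice of strategy is an assumption, as are the function names `entropy` and `select_for_labelling` and the toy predictions.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labelling(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k most uncertain unlabelled documents.

    `pool` maps a document id to the model's predicted class probabilities
    (a hypothetical layout for this example).
    """
    ranked = sorted(pool, key=lambda doc_id: entropy(pool[doc_id]), reverse=True)
    return ranked[:k]


# Made-up predictions for three unlabelled pathology reports.
pool = {
    "report_a": [0.98, 0.01, 0.01],  # confident prediction
    "report_b": [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    "report_c": [0.60, 0.30, 0.10],  # moderately uncertain
}
chosen = select_for_labelling(pool, 2)
```

In a full loop, the selected reports would be labelled, added to the training set, and the classifier retrained before the next round of selection.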