Pathology report data extraction from relational database using R, with extraction from reports on melanoma of skin as an example.

J Pathol Inform

Dahl-Chase Pathology Associates, Bangor, Maine, USA.

Published: October 2016

Background: Different methods have been described for data extraction from pathology reports, with varying degrees of success. Here, a technique for extracting data directly from a relational database is described.

Methods: Our department uses synoptic reports modified from the College of American Pathologists (CAP) Cancer Protocol Templates to report most of our cancer diagnoses. Taking the melanoma of skin synoptic report as an example, the R scripting language, extended with the RODBC package, was used to query the pathology information system database. Reports from the past four and a half years that contained a melanoma of skin synoptic report were retrieved, and individual data elements were extracted. Using the retrieved list of cases, the database was queried a second time to extract the lymph node staging information from subsequent reports on the same patients.
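The element-extraction step can be illustrated with a minimal sketch. The authors worked in R with RODBC; the example below uses Python instead and skips the database query entirely, starting from report text already in memory. The sample report, its field labels, and the "Label: value" layout are all assumptions made for illustration, since the abstract does not show the actual report format.

```python
import re

# Hypothetical synoptic report text. The real layout used in the authors'
# pathology information system is not given in the abstract.
SAMPLE_REPORT = """\
MELANOMA OF SKIN SYNOPTIC REPORT
Procedure: Excision
Maximum Tumor Thickness (Breslow): 1.2 mm
Ulceration: Not identified
Regional Lymph Nodes (pN): pNX
"""

# One "Label: value" pair per line; spaces and tabs around the colon are ignored.
FIELD_LINE = re.compile(r"^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def parse_synoptic(text):
    """Return a dict mapping each label to its value for every 'Label: value' line."""
    return {label: value for label, value in FIELD_LINE.findall(text)}


def breslow_mm(fields):
    """Return the Breslow depth in millimetres as a float, or None if not found."""
    for label, value in fields.items():
        if "breslow" in label.lower():
            match = re.search(r"(\d+(?:\.\d+)?)\s*mm", value)
            return float(match.group(1)) if match else None
    return None


fields = parse_synoptic(SAMPLE_REPORT)
depth = breslow_mm(fields)
```

A sketch like this only works when every report follows the same layout and phrasing, which matches the dependence on consistent formatting that the Conclusions point out.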

Results: A total of 426 synoptic reports corresponding to unique lesions of melanoma of skin were retrieved, and the data elements of interest were extracted into an R data frame. The distribution of Breslow depth of melanomas grouped by year is used as an example of intra-report data extraction and analysis. When new pN staging information was present in the subsequent reports, 82% (77/94) was retrieved precisely (pN0, pN1, pN2 and pN3). An additional 15% (14/94) was retrieved with some ambiguity (positive status, or only the knowledge that there was an update). The specificity was 100% for both. The relationship between Breslow depth and lymph node status was graphed as an example of lesion-specific multi-report data extraction and analysis.
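The split between precise and ambiguous pN retrieval can be sketched as a two-tier text match. This is a Python illustration rather than the authors' R code, and both patterns are hypothetical rules chosen for the example; the abstract does not describe the actual matching logic.

```python
import re

# Tier 1 (assumed rule): an explicit pN0-pN3 category anywhere in the text.
PN_PRECISE = re.compile(r"\bpN([0-3])")

# Tier 2 (assumed rule): "lymph node(s)" followed by "positive" in the same sentence.
PN_HINT = re.compile(r"lymph nodes?\b[^.]{0,80}\bpositive\b", re.IGNORECASE)


def classify_pn(report_text):
    """Return (tier, value): 'precise' with a pN category, 'ambiguous', or 'none'."""
    match = PN_PRECISE.search(report_text)
    if match:
        return ("precise", "pN" + match.group(1))
    if PN_HINT.search(report_text):
        return ("ambiguous", "positive")
    return ("none", None)


precise = classify_pn("Regional lymph nodes: pN1a")
ambiguous = classify_pn("One sentinel lymph node is positive for metastatic melanoma.")
absent = classify_pn("Skin, left arm: benign nevus.")
```

Checking the explicit category first means a report is placed in the ambiguous tier only when no pN0 to pN3 value can be read from it.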

Conclusions: R extended with the RODBC package is a simple and versatile approach that is well suited to the above tasks. The success or failure of the retrieval and extraction depended largely on whether the reports were formatted and whether the contents of the elements were consistently phrased. This approach can be easily modified and adopted for other pathology information systems that use a relational database for data management.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5100200
DOI: http://dx.doi.org/10.4103/2153-3539.192822


