Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.

Guergana K Savova, James J Masanz, Philip V Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C Kipper-Schuler, Christopher G Chute

J Am Med Inform Assoc

Mayo Clinic College of Medicine, Rochester, Minnesota, USA.

Published: November 2010

We aim to build and evaluate an open-source natural language processing system for information extraction from the clinical free-text of electronic medical records. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans; and accuracy for concept mapping, negation, and status attributes of 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
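
The distinction between exact and overlapping spans in the reported F-scores can be illustrated with a minimal sketch. This is not the cTAKES evaluation code; the function names, the character-offset span representation, and the toy data are assumptions made for illustration only. It assumes a predicted span counts as correct under exact matching when its offsets equal those of a gold span, and under overlapping matching when it shares at least one character with a gold span.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_f_score(gold: List[Span], predicted: List[Span], exact: bool) -> float:
    """F-score of predicted spans against gold spans.

    exact=True  -> a span matches only if its offsets equal the other span's.
    exact=False -> a span matches if it overlaps the other span.
    """
    def match(a: Span, b: Span) -> bool:
        return a == b if exact else overlaps(a, b)

    # Precision counts predictions that match some gold span;
    # recall counts gold spans that are matched by some prediction.
    matched_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    matched_gold = sum(1 for g in gold if any(match(g, p) for p in predicted))
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: one exact hit, one partial hit, one miss.
gold = [(0, 10), (20, 30), (40, 50)]
predicted = [(0, 10), (22, 30), (60, 70)]
exact_f = span_f_score(gold, predicted, exact=True)
overlap_f = span_f_score(gold, predicted, exact=False)
```

In this toy example the partially aligned prediction (22, 30) is rejected under exact matching but accepted under overlapping matching, which is why an overlapping-span F-score is never lower than the exact-span F-score on the same output.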

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668
DOI: http://dx.doi.org/10.1136/jamia.2009.001560

Similar Publications

Clinical significance criteria in the ICSD and DSM sleep disorder classifications: a content overlap analysis using the Jaccard index.

J Clin Sleep Med

January 2025

Univ. Bordeaux, CNRS, SANPSY, UMR 6033, F-33000 Bordeaux, France.

Christophe Gauld, Vincent P Martin, Clélia Quilès, Pierre-Alexis Geoffroy, Julien Coelho

Study Objectives: Both the International Classification of Sleep Disorders (ICSD) and the sleep-wake disorders section of the Diagnostic and Statistical Manual of Mental Disorders (DSM) emphasize the importance of clinical judgment in distinguishing the normal from the pathological in sleep medicine. The fourth edition of the DSM (DSM-IV, 1994) introduced the clinical significance criterion (CSC) to standardize this judgment and enhance diagnostic reliability.

Methods: This review conducts a theoretical and historical content analysis of CSC presence, frequency, and formulation in the diagnostic criteria of sleep disorders.

Prenatal metal(loid) exposure and preterm birth: a systematic review of the epidemiologic evidence.

J Expo Sci Environ Epidemiol

January 2025

Department of Environmental Sciences & Engineering, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.

Lauren A Eaves, Evans K Lodge, Wendy R Rohin, Kyle R Roell, Tracy A Manuck

Background: Preterm birth (PTB) is a common pregnancy complication associated with significant neonatal morbidity. Prenatal exposure to environmental chemicals, including toxic and/or essential metal(loid)s, may contribute to PTB risk.

Objective: We aimed to summarize the epidemiologic evidence of the associations among levels of arsenic (As), cadmium (Cd), chromium (Cr), copper (Cu), mercury (Hg), manganese (Mn), lead (Pb), and zinc (Zn) assessed during the prenatal period and PTB or gestational age at delivery; to assess the quality of the literature and strength of evidence for an effect for each metal; and to provide recommendations for future research.

Adherence to tuberculosis (TB) treatment in high compared to low TB burden countries: study protocol for a systematic review and meta-analysis with a qualitative meta-synthesis of themes.

BMJ Open

January 2025

Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada.

Oghenowede Eyawo, Lynnette Nathalie Lyzwinski, Uchechukwu Chidiebere Ugoji, Shenyi Pan, Setor Kofi Sorkpor

Introduction: Non-adherence to tuberculosis (TB) treatment poses a significant challenge to effective TB management globally and is a major contributor to the emergence of multidrug-resistant TB. Although adherence to TB treatment has been widely studied, a comprehensive evaluation of the comparative levels of adherence in high- versus low-TB burden settings remains lacking. The objective of this systematic review and meta-analysis is to assess the levels of adherence to TB treatment in high-TB burden countries compared to low-burden countries.

Discontinuous named entities in clinical Text: A systematic literature review.

J Biomed Inform

January 2025

University of Manchester, United Kingdom.

Areej Alhassan, Viktor Schlegel, Monira Aloud, Riza Batista-Navarro, Goran Nenadic

Objective: Extracting named entities from clinical free-text presents unique challenges, particularly when dealing with discontinuous entities, that is, mentions that are separated by unrelated words. Traditional NER methods often struggle to accurately identify these entities, prompting the development of specialised computational solutions. This paper systematically reviews and presents the methodologies developed for Discontinuous Named Entity Recognition in clinical texts, highlighting their effectiveness and the challenges they face.

ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.

J Biomed Inform

January 2025

Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, 02115, MA, USA; VA Boston Healthcare System, 150 S Huntington Ave, Boston, 02130, MA, USA.

Ziming Gan, Doudou Zhou, Everett Rush, Vidul A Panickan, Yuk-Lam Ho

Objective: Electronic health record (EHR) systems contain a wealth of clinical data stored as both codified data and free-text narrative notes (NLP). The complexity of EHR presents challenges in feature representation, information extraction, and uncertainty quantification. To address these challenges, we proposed an efficient Aggregated naRrative Codified Health (ARCH) records analysis to generate a large-scale knowledge graph (KG) for a comprehensive set of EHR codified and narrative features.
