Natural language processing (NLP) shows promise for automating the reading of radiology reports, but its adaptability and reliability for clinical use still need confirmation.
A study compared four NLP tools (EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) on their performance in identifying cerebrovascular conditions (ischaemic stroke, small vessel disease, and atrophy) using reports and imaging data from NHS Fife and Generation Scotland.
Results indicated that EdIE-R consistently achieved the highest F1 scores across most conditions, while ALARM+ showed strong performance, especially in precision, highlighting the varied effectiveness of these tools.
Overall, the findings suggest that while these NLP tools can classify medical conditions, their performance can differ significantly across tools and target conditions.
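The comparison above rests on precision, recall, and F1 score computed per condition. As a minimal, illustrative sketch (not the study's actual evaluation code, and using hypothetical labels), these metrics can be derived from parallel lists of gold and predicted report labels:

```python
def precision_recall_f1(gold, predicted, positive_label):
    """Compute precision, recall, and F1 for one target label.

    gold, predicted: parallel lists of per-report labels.
    Illustrative helper only, not taken from any of the tools compared.
    """
    tp = sum(1 for g, p in zip(gold, predicted)
             if g == positive_label and p == positive_label)
    fp = sum(1 for g, p in zip(gold, predicted)
             if g != positive_label and p == positive_label)
    fn = sum(1 for g, p in zip(gold, predicted)
             if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical per-report labels for one condition
gold = ["stroke", "normal", "stroke", "stroke", "normal"]
pred = ["stroke", "stroke", "stroke", "normal", "normal"]
print(precision_recall_f1(gold, pred, "stroke"))
```

A tool like ALARM+ scoring highly on precision but less so on recall would, by this formula, see its F1 pulled toward the lower of the two values.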
- The systematic review examines the use of natural language processing (NLP) in analyzing radiology reports, emphasizing the need for transparent methodologies to enable comparisons and reproducibility across studies.
- It analyzed 164 studies published between January 2015 and October 2019, finding that most focused on disease classification (28%) and diagnostic surveillance (27.4%), primarily using English reports from various imaging modalities, with oncology being the most common disease area.
- The review highlights issues such as inadequate reporting on essential factors like dataset preparation and validation, with only a small percentage providing details on external validation and data/code availability, suggesting a need for improved reporting standards in NLP research.
NLP is crucial for extracting structured information from radiology reports, but comprehensive reviews on its application are lacking.
A systematic literature search identified 164 publications, revealing a significant increase in usage since 2015, with a shift towards deep learning despite ongoing challenges in clinical adoption and data availability.
Enhancing the reproducibility and explainability of NLP models is essential for their integration into clinical practice, and there is a need for improved sharing of data and methodologies to facilitate comparison across studies.
Advances in text mining technology allow for accurate extraction of structured information from unstructured Electronic Healthcare Records (EHRs).
The Edinburgh Information Extraction for Radiology reports (EdIE-R) system classifies radiologists' reports to identify occurrences and types of stroke, using a dataset of 1168 reports from the Edinburgh Stroke Study.
High inter-annotator agreement and system accuracy on a blind test set (ranging from 92.61 to 98.27 for various annotations) suggest that EdIE-R can effectively contribute to population health monitoring and epidemiological research.
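Inter-annotator agreement of the kind reported for the EdIE-R dataset is commonly quantified with Cohen's kappa; the following is a minimal sketch of that statistic (an assumption for illustration — the study's own agreement measure may be computed differently):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' parallel label lists.

    Illustrative sketch; assumes at least one disagreement-possible
    labelling (expected agreement < 1) to avoid division by zero.
    """
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators label alike
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's label distribution
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of four reports by two readers
a = ["stroke", "stroke", "normal", "normal"]
b = ["stroke", "normal", "normal", "normal"]
print(cohens_kappa(a, b))  # → 0.5
```

Kappa corrects raw percentage agreement for the agreement two annotators would reach by chance, which is why it is often preferred when one label dominates the dataset.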
Manual coding of brain imaging phenotypes in radiology reports is inefficient, prompting the development of an NLP algorithm for automated identification in NHS reports.
The NLP algorithm was tested on anonymized reports from stroke/TIA patients, showing excellent agreement between expert readers and strong performance in detecting various brain conditions, including ischaemic strokes and brain tumors.
The successful implementation of this NLP approach enables large-scale identification of patients with significant brain imaging phenotypes, which enhances clinical practice within the NHS.
Scholarly articles increasingly reference dynamic web resources, such as project sites and blogs, to enhance context, but this can mislead readers because the content may change over time.
A study revisiting articles published from 1997 to 2012 found that only about 30% of the referenced web resources had stable snapshots available to compare to current content.
Over 75% of these web resources showed significant content drift since the original reference, highlighting concerns about the reliability of web-based scholarly information.
The paper discusses two JISC-funded projects focused on enhancing the metadata of digitized historical collections through automatic georeferencing and information extraction.
Understanding location is crucial in historical research, making these collections a valuable test case for automated methodologies.
The projects, GeoDigRef and Embedding GeoCrossWalk, explored how automatic georeferencing can improve geographical searches and detailed the configurations and evaluations of the geoparser used.
The University of Edinburgh team developed a natural language processing (NLP) system to assist in curating biomedical research papers, aiming to make the curation process easier for biologists.
Their system excelled in the interaction subtasks and performed well on gene mention tasks with minimal effort, while a string-matching technique for gene normalization achieved roughly average results.
Although the technology is effective for individual tasks, complex tasks requiring multiple components, like detecting and normalizing interacting protein pairs, remain difficult for current NLP systems.
A maximum entropy-based system was developed for identifying named entities in biomedical abstracts.
The system achieved an F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation.
Key features of the system include the use of local features, attention to boundary identification, and the integration of external resources, along with a discussion of data annotation issues that affected performance.
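Boundary identification in named entity recognition is commonly framed as BIO tagging, where span edges are recovered by decoding the tag sequence. A minimal sketch of such decoding (an assumption about the scheme for illustration, not the paper's actual implementation):

```python
def bio_to_spans(tags):
    """Decode a BIO tag sequence into (start, end, type) entity spans,
    with end exclusive. Illustrative only; the described system's
    decoding logic may differ."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            # A new entity begins; close any open span first
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
        elif tag.startswith("I-") and start is not None and tag[2:] == etype:
            continue  # entity continues
        else:
            # "O" tag (or inconsistent I-) closes any open span
            if start is not None:
                spans.append((start, i, etype))
            start, etype = None, None
    if start is not None:
        spans.append((start, len(tags), etype))
    return spans

# Hypothetical tags for: "The p53 protein binds MDM2"
tags = ["O", "B-GENE", "I-GENE", "O", "B-GENE"]
print(bio_to_spans(tags))  # → [(1, 3, 'GENE'), (4, 5, 'GENE')]
```

Getting these boundaries exactly right is what the F-scores reported above measure: a span that is one token too long or short typically counts as both a false positive and a false negative.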
Text mining has the potential to enhance biomedical text curation, but there is limited evidence of its actual effectiveness.
Three experiments were conducted to see how Natural Language Processing (NLP) can speed up the curation process for protein-protein interactions (PPIs) and to gather curator feedback on the usability of an NLP-integrated curation tool.
The findings suggest that if NLP output were perfectly accurate, curation time could be sped up by a maximum of one third, but more research is needed to validate curators' preferences for consistent and high-recall NLP outputs.
Good automatic information extraction tools can help process the growing amount of biomedical literature, with named entity recognition as a key feature.
A maximum-entropy based system was developed to identify gene and protein names in biomedical abstracts, utilizing a variety of features.
In evaluations, the system achieved high precision and recall, demonstrating effective identification of entity boundaries and innovative use of external knowledge sources.