Motivation: The development of deep, bidirectional transformers such as Bidirectional Encoder Representations from Transformers (BERT) has led to models that outperform previous approaches on several Natural Language Processing (NLP) benchmarks. In radiology in particular, large amounts of free-text data are generated in the daily clinical workflow. These report texts could be particularly useful for generating labels for machine learning, especially for image classification. However, because report texts are mostly unstructured, advanced NLP methods are needed to enable accurate text classification. While neural networks can be used for this purpose, they must first be trained on large amounts of manually labelled data to achieve good results. In contrast, BERT models can be pre-trained on unlabelled data and then require fine-tuning on only a small amount of manually labelled data to achieve even better results.

Results: Using BERT to identify the most important findings in intensive care chest radiograph reports, we achieve areas under the receiver operating characteristic curve (AUC) of 0.98 for congestion, 0.97 for effusion, 0.97 for consolidation and 0.99 for pneumothorax, surpassing the performance of previous approaches with comparatively little annotation effort. Our approach could therefore help to improve information extraction from free-text medical reports.

Availability and implementation: We make the source code for fine-tuning the BERT models freely available at https://github.com/fast-raidiology/bert-for-radiology.
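The per-finding areas under the ROC curve reported in the Results can be read as the probability that a randomly chosen report containing a finding receives a higher model score than a randomly chosen report without it (ties counted as one half). The sketch below computes this directly, one AUC per finding. It is not taken from the authors' repository: the function names `roc_auc` and `per_finding_auc`, the assumption that the classifier emits one score per finding (for example a sigmoid probability from a fine-tuned multi-label head), and the toy labels and scores are all hypothetical illustrations, not data from the paper.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2.

    Pairwise O(n_pos * n_neg) version, chosen for clarity over speed.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative report")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_finding_auc(
    labels: Dict[str, List[int]], scores: Dict[str, List[float]]
) -> Dict[str, float]:
    """Compute one AUC per finding (hypothetical multi-label setup)."""
    return {finding: roc_auc(labels[finding], scores[finding]) for finding in labels}


if __name__ == "__main__":
    # Toy ground-truth labels and model scores for four reports (hypothetical).
    gold = {"effusion": [1, 1, 0, 0], "pneumothorax": [0, 1, 0, 1]}
    pred = {"effusion": [0.9, 0.4, 0.6, 0.1], "pneumothorax": [0.2, 0.8, 0.3, 0.7]}
    print(per_finding_auc(gold, pred))  # {'effusion': 0.75, 'pneumothorax': 1.0}
```

Evaluating each finding separately, rather than pooling all labels into a single AUC, mirrors how the Results list one value per finding. For larger test sets a rank-based or library implementation would be faster than this pairwise loop.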

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/btaa668