Enhancing the coverage of SemRep using a relation classification approach.

J Biomed Inform

School of Information Sciences, University of Illinois Urbana-Champaign, 501 E Daniel St., Champaign, 61820, IL, USA.

Published: July 2024

Objective: Relation extraction is an essential task in the field of biomedical literature mining and offers significant benefits for various downstream applications, including database curation, drug repurposing, and literature-based discovery. The broad-coverage natural language processing (NLP) tool SemRep has established a solid baseline for extracting subject-predicate-object triples from biomedical text and has served as the backbone of the Semantic MEDLINE Database (SemMedDB), a PubMed-scale repository of semantic triples. While SemRep achieves reasonable precision (0.69), its recall is relatively low (0.42). In this study, we aimed to enhance SemRep using a relation classification approach, in order to eventually increase the size and the utility of SemMedDB.

Methods: We combined and extended existing SemRep evaluation datasets to generate training data. We leveraged the pre-trained PubMedBERT model, enhancing it through additional contrastive pre-training and fine-tuning. We experimented with three entity representations: mentions, semantic types, and semantic groups. We evaluated the model performance on a portion of the SemRep Gold Standard dataset and compared it to SemRep performance. We also assessed the effect of the model on a larger set of 12K randomly selected PubMed abstracts.

Results: The best model yields a precision of 0.62, a recall of 0.81, and an F score of 0.70. Assessment on 12K abstracts suggests that the model could double the size of SemMedDB if applied to the whole of PubMed. We also manually assessed the quality of 506 triples predicted by the model that SemRep had not previously identified and found that 67% of these triples were correct.
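As a quick arithmetic check, the reported F score is consistent with the balanced F1 measure, the harmonic mean of precision and recall. The sketch below assumes that definition (the abstract does not name the variant) and applies it to the model figures as well as to the SemRep precision and recall quoted in the Objective. The SemRep F1 is derived here rather than reported in the abstract, and the two systems' figures are not necessarily measured on the same test data.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
model_f1 = f1_score(0.62, 0.81)
semrep_f1 = f1_score(0.69, 0.42)

print(f"model  F1 = {model_f1:.2f}")   # model  F1 = 0.70
print(f"SemRep F1 = {semrep_f1:.2f}")  # SemRep F1 = 0.52
```

On these numbers the model gives up about 0.07 in precision relative to SemRep while nearly doubling recall, which is what drives the higher F1.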

Conclusion: These findings underscore the promise of our model in achieving more comprehensive coverage of the relationships mentioned in biomedical literature, and thus its potential to enhance various downstream applications of biomedical literature mining. Data and code related to this study are available at https://github.com/Michelle-Mings/SemRep_RelationClassification.

Source
http://dx.doi.org/10.1016/j.jbi.2024.104658
