AI Article Synopsis

  • The text discusses the importance of understanding protein interactions in biological processes and highlights the gap between the interactions stored in curated databases and those reported in the scientific literature.
  • To address this issue, the authors introduce ComplexTome, a manually annotated corpus of 1287 documents containing approximately 3500 annotated relationships, designed to improve text-mining methods for identifying complex formation relationships.
  • They also present a novel relation extraction model, trained on this corpus, that identifies physical protein interactions with an F1-score of 82.8%, and they make the corpus, code, and results available through Zenodo, GitHub, and the STRING database for further research.

Article Abstract

Motivation: Understanding biological processes relies heavily on curated knowledge of physical interactions between proteins. Yet, a notable gap remains between the information stored in databases of curated knowledge and the plethora of interactions documented in the scientific literature.

Results: To bridge this gap, we introduce ComplexTome, a manually annotated corpus designed to facilitate the development of text-mining methods for the extraction of complex formation relationships among biomedical entities targeting the downstream semantics of the physical interaction subnetwork of the STRING database. This corpus comprises 1287 documents with ∼3500 relationships. We train a novel relation extraction model on this corpus and find that it identifies physical protein interactions with high reliability (F1-score = 82.8%). We additionally enhance the model's capabilities through unsupervised trigger word detection and apply it to extract relations, and the trigger words for these relations, from all open publications in the domain literature. This information has been fully integrated into the latest version of the STRING database.
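
To make concrete what the reported F1-score measures, the Python sketch below computes precision, recall, and F1 for predicted relations against gold annotations. It is an illustration only, not the authors' evaluation code: it assumes relations are compared as unordered protein pairs (complex formation is symmetric), whereas the actual ComplexTome evaluation protocol (for example, mention-level versus document-level matching) may differ. The function names and protein identifiers are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical representation: a relation is a pair of protein identifiers.
Relation = Tuple[str, str]


def normalize(pairs: Iterable[Relation]) -> set[frozenset[str]]:
    """Treat complex formation as symmetric: (A, B) and (B, A) count as one relation."""
    return {frozenset(pair) for pair in pairs}


def precision_recall_f1(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> tuple[float, float, float]:
    """Score predicted relations against gold relations at the pair level (assumed protocol)."""
    gold_set = normalize(gold)
    pred_set = normalize(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("PROT_A", "PROT_B"), ("PROT_B", "PROT_C"), ("PROT_C", "PROT_D"), ("PROT_D", "PROT_E")]
    predicted = [("PROT_B", "PROT_A"), ("PROT_C", "PROT_B"), ("PROT_A", "PROT_E")]
    p, r, f = precision_recall_f1(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy example, two of the three predictions match gold pairs (order ignored), giving precision 2/3, recall 1/2, and F1 = 4/7 ≈ 0.571. Using the harmonic mean means a model cannot reach a high F1 by favoring only precision or only recall.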

Availability And Implementation: We provide the corpus, code, and all results produced by the large-scale runs of our systems on the biomedical literature via Zenodo https://doi.org/10.5281/zenodo.8139716, GitHub https://github.com/farmeh/ComplexTome_extraction, and the latest version of the STRING database https://string-db.org/.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11441320
DOI: http://dx.doi.org/10.1093/bioinformatics/btae552

Publication Analysis

Top Keywords (occurrence count)

  • physical protein (8)
  • protein interactions (8)
  • biomedical literature (8)
  • curated knowledge (8)
  • string database (8)
  • latest version (8)
  • version string (8)
  • corpus (5)
  • string-ing protein (4)
  • protein complexes (4)

