A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature.

PLoS Comput Biol

Knowledge Management in Bioinformatics, Computer Science Department, Humboldt-Universität zu Berlin, Berlin, Germany.

Published: July 2010

The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed: convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmarking of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.
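The gap between AUC and F-score noted in the abstract can be illustrated with a small sketch. It is not the evaluation code used in the study: the toy labels, scores, and thresholds below are hypothetical, and the helper names f_score and auc are introduced here only for illustration. AUC depends only on how candidate pairs are ranked, whereas the F-score also depends on the decision threshold, which is one plausible reason a method can rank well yet score poorly when its threshold is not tuned to the corpus.

```python
def f_score(labels, scores, threshold):
    """F1 of the positive class, predicting positive when score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]

print(auc(labels, scores))             # ranking quality, independent of any threshold
print(f_score(labels, scores, 0.5))    # untuned threshold: no pair is predicted positive
print(f_score(labels, scores, 0.28))   # threshold chosen to fit this particular data
```

In this toy case the ranking is perfect, so the AUC is at its maximum, yet the F-score collapses to zero at the untuned threshold and recovers only once the threshold is fitted to the data.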

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2895635
DOI: http://dx.doi.org/10.1371/journal.pcbi.1000837

Publication Analysis

Top Keywords

ppi extraction: 12
methods: 8
kernel methods: 8
protein-protein interactions: 8
convolution kernels: 8
evaluation data: 8
extraction: 6
kernels: 6
evaluation: 6
comprehensive benchmark: 4

Similar Publications

Essential oils from Amorpha fruticosa against hepatocellular carcinoma based on network pharmacology.

BMC Complement Med Ther

January 2025

Henan Institute of Medical and Pharmaceutical Sciences, Zhengzhou University, Zhengzhou, China.

Background: Amorpha fruticosa has been used in traditional Chinese medicine to treat burns, ambustion, carbuncles, and eczema. Although its biological activity has recently attracted growing attention, the antitumor activities of the essential oils (EOs) extracted from its leaves (AFLEO) and flowers (AFFEO), and their molecular mechanisms, have not yet been reported. The objective of the present study was to examine the chemical compositions of AFLEO and AFFEO, and then to investigate the effects and pharmacological mechanism of the EOs against hepatocellular carcinoma (HCC).

The relevance of endoplasmic reticulum lumen and Anoctamin-8 for major depression: Results from a systems biology study.

J Psychiatr Res

January 2025

Laboratory of Molecular Psychiatry. Rua Ramiro Barcelos, Centro de Pesquisa Experimental - Hospital de Clínicas de Porto Alegre (HCPA), Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, 2350, Brazil; Postgraduate Program of Psychiatry and Behavioral Sciences. Rua Ramiro Barcelos, Department of Psychiatry, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, 2400, Brazil.

Major depressive disorder (MDD) is a highly prevalent and debilitating disorder, yet its pathophysiology has not been fully elucidated. The aim of this study is to identify novel potential proteins and biological processes associated with MDD through a systems biology approach. Original articles involving the measurement of proteins in the blood of patients diagnosed with MDD were selected.

Ethnopharmacological Relevance: Phellinus igniarius (Linnaeus: Fries) Quelet (Phellinus igniarius) is an edible and medicinal fungus that has been used in China for centuries. It has been found to improve organ function and metabolic homeostasis, including ameliorating hyperuricemia (HUA). Polysaccharide is a predominant component in P.

Background: Uterine Corpus Endometrial Carcinoma (UCEC) is a prevalent gynecologic malignancy with complex molecular underpinnings. This study identifies key wound-healing genes involved in UCEC and elucidates their roles through a comprehensive analysis.

Methods: In silico and in vitro experiments.

Thyroid cancer (TC) is the most common endocrine malignancy, with a rapidly increasing global incidence. Scutellariae Barbatae Herba (SBH) exhibits significant antitumor activity; however, its mechanism against TC remains unclear. This study aims to explore the immunotherapeutic mechanism of SBH in treating TC through network pharmacology, bioinformatics analysis, and experimental validation.
