Scalable Text Mining Assisted Curation of Post-Translationally Modified Proteoforms in the Protein Ontology.

Karen E Ross, Darren A Natale, Cecilia Arighi, Sheng-Chih Chen, Hongzhan Huang, Gang Li, Jia Ren, Michael Wang, K Vijay-Shanker, Cathy H Wu

CEUR Workshop Proc

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA.

Published: August 2016

The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow because of the extensive manual curation effort required. Here we report an automated pipeline for creating PTM proteoform classes. The pipeline leverages two phosphorylation-focused text mining tools, RLIMS-P (which detects mentions of kinases, substrates, and phosphorylation sites) and eFIP (which detects phosphorylation-dependent protein-protein interactions, or PPIs), together with our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1), derived from our dataset, that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5504912
