Computer-assisted curation of a human regulatory core network from the biological literature.

Bioinformatics

Humboldt-Universität zu Berlin, Institute for Computer Science, Knowledge Management in Bioinformatics, 10099 Berlin, Germany, Institute of Pathology, Charité-Universitätsmedizin Berlin, Deutsches Rheuma Forschungszentrum, Charitéplatz 1, 10117 Berlin, Germany, Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, 1117 Budapest, Hungary and Integrative Research Institute for the Life Sciences, Humboldt Universität zu Berlin, Philippstr. 13 Haus 18, 10115 Berlin, Germany.

Published: April 2015

AI Article Synopsis

  • Scientists have created a method to find new information about how certain proteins (called transcription factors) control human genes by looking through many scientific articles.
  • They discovered over 45,000 sentences that may describe these relationships, and by checking them, they found more than 300 unique interactions not listed before.
  • This new information improves our understanding of human genetics, especially in identifying genes linked to diseases, and is available for anyone to use online.

Article Abstract

Motivation: A highly interlinked network of transcription factors (TFs) orchestrates the context-dependent expression of human genes. ChIP-chip experiments that interrogate the binding of particular TFs to genomic regions are used to reconstruct gene regulatory networks at genome scale, but are plagued by high false-positive rates. Meanwhile, a large body of knowledge on high-quality regulatory interactions remains largely unexplored, as it is available only in natural language descriptions scattered over millions of scientific publications. Such data are hard to extract, and regulatory databases together currently contain only 503 regulatory relations between human TFs.

Results: We developed a text-mining-assisted workflow to systematically extract knowledge about regulatory interactions between human TFs from the biological literature. We applied this workflow to the entire Medline, which helped us to identify more than 45 000 sentences potentially describing such relationships. We ranked these sentences with a machine-learning approach. The top 2500 sentences contained ∼900 sentences that encompass relations already known in databases. By manually curating the remaining 1625 top-ranking sentences, we obtained more than 300 validated regulatory relationships that were not previously present in any regulatory database. Full-text curation allowed us to obtain detailed information on the strength of the experimental evidence supporting a relationship.
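The triage step described in the Results (rank candidate sentences, keep the top-ranked ones, set aside those whose relation is already in a database, and pass the rest to manual curation) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Candidate` type, the `triage` function, the score values, and the placeholder names `TF_A` to `TF_H` are all hypothetical, and the scores stand in for whatever the machine-learning ranker would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A sentence that may describe a TF-TF regulatory relation (hypothetical type)."""
    sentence: str
    regulator: str
    target: str
    score: float  # assumed to come from some ranking model; higher = more likely


def triage(candidates, known_relations, top_k):
    """Keep the top_k highest-scoring candidates and split them into
    relations already in a database and relations left for manual curation."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]
    already_known = [c for c in ranked if (c.regulator, c.target) in known_relations]
    to_curate = [c for c in ranked if (c.regulator, c.target) not in known_relations]
    return already_known, to_curate


# Made-up toy data with placeholder TF names; not taken from the paper.
candidates = [
    Candidate("TF_G and TF_H were both measured.", "TF_G", "TF_H", 0.10),
    Candidate("TF_A activates TF_B.", "TF_A", "TF_B", 0.95),
    Candidate("TF_E binds the TF_F promoter.", "TF_E", "TF_F", 0.60),
    Candidate("TF_C represses TF_D.", "TF_C", "TF_D", 0.80),
]
known_relations = {("TF_A", "TF_B")}  # stands in for existing database content

already_known, to_curate = triage(candidates, known_relations, top_k=3)

for c in to_curate:
    print(f"curate: {c.regulator} -> {c.target} (score {c.score:.2f})")
```

In the workflow described above, the analogous numbers would be a cut-off at 2500 sentences, of which the sentences not matching a known relation went to manual curation.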

Conclusions: We were able to increase curated information about the human core transcriptional network by >60% compared with the current content of regulatory databases. We observed improved performance when using the network for disease gene prioritization compared with the state-of-the-art.

Availability and Implementation: The web service is freely accessible at http://fastforward.sys-bio.net/.

Contact: leser@informatik.hu-berlin.de or nils.bluethgen@charite.de

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
http://dx.doi.org/10.1093/bioinformatics/btu795

Publication Analysis

Top Keywords

regulatory: 9
biological literature: 8
regulatory interactions: 8
human: 5
sentences: 5
computer-assisted curation: 4
curation human: 4
human regulatory: 4
regulatory core: 4
network: 4
