Structured Correspondence Topic Models for Mining Captioned Figures in Biological Literature.

KDD

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213; Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213.

Published: January 2009

A major source of information (often the most crucial and informative part) in scholarly articles from scientific journals, proceedings and books is the figures, which directly provide images and other graphical illustrations of key experimental results and other scientific content. In biological articles, a typical figure often comprises multiple panels, accompanied by either scoped or global caption text. Moreover, the caption text contains important semantic entities, such as protein names, gene ontology terms and tissue labels, that are relevant to the images in the figure. Due to the avalanche of biological literature in recent years and the increasing popularity of various bio-imaging techniques, automatic retrieval and summarization of biological information from literature figures has emerged as a major unsolved challenge in computational knowledge extraction and management in the life sciences. We present a new structured probabilistic topic model, built on a realistic figure generation scheme, to model structurally annotated biological figures, and we derive an efficient inference algorithm based on collapsed Gibbs sampling for information retrieval and visualization. The resulting program constitutes one of the key IR engines in our SLIF system, which recently entered the final round (4 out of 70 competing systems) of the Elsevier Grand Challenge on Knowledge Enhancement in the Life Science. Here we present various evaluations on a number of data mining tasks to illustrate our method.
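
For readers unfamiliar with collapsed Gibbs sampling, the following is a minimal sketch of the technique applied to plain latent Dirichlet allocation (LDA). It is not the structured correspondence model of the abstract, which additionally couples panels, images and caption entities; it only illustrates the general resample-one-token-given-all-others scheme. The function name, parameters and hyperparameter defaults are assumptions made for illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler for plain LDA (illustrative only).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (assignments, doc_topic_counts, topic_word_counts).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n[d][k]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n[k][w]
    topic_total = [0] * num_topics                              # n[k]
    assignments = []

    # Random initialisation of the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all count tables.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional p(z = t | all other assignments),
                # with the topic and document multinomials integrated out.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]

                # Sample a new topic and add the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assignments, doc_topic, topic_word
```

In this sketch the multinomial parameters never appear explicitly: only the count tables are stored, which is what "collapsed" refers to. A call such as `collapsed_gibbs_lda([[0, 1, 0], [2, 3, 2]], num_topics=2, vocab_size=4)` returns the per-token topic assignments together with the two count tables.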

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256960
DOI: http://dx.doi.org/10.1145/1557019.1557031
