Extractive text summarization system to aid data extraction from full text in systematic review development.

Duy Duc An Bui, Guilherme Del Fiol, John F Hurdle, Siddhartha Jonnalagadda

J Biomed Inform

Division of Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA.

Published: December 2016

Objectives: Extracting data from publication reports is a standard process in systematic review (SR) development. However, the data extraction process still relies heavily on manual effort, which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process.

Methods: We developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries were evaluated at the sentence and fragment levels for how well they surfaced common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human-written summaries (title and abstract) in terms of the presence of the information needed for data extraction, as presented in the study characteristics tables of Cochrane reviews.
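The distinction between sentence-level and fragment-level summaries can be illustrated with a minimal sketch. It is not the system described here, which used machine learning; it is a simple rule-based stand-in, and the cue pattern, the function names (`split_sentences`, `summarize`), and the example text are all assumptions made for illustration.

```python
import re

# Hypothetical cue: a number followed by a participant noun.
# This pattern is an illustrative assumption, not a rule from the study.
SAMPLE_SIZE = re.compile(
    r"\b(\d{1,3}(?:,\d{3})*|\d+)\s+"
    r"(patients|participants|subjects|adults|children)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def summarize(text):
    """Return (sentence, fragment) pairs for sentences that match the cue.

    The sentence is the sentence-level summary unit; the matched span is
    the fragment-level unit a reviewer would copy into an extraction form.
    """
    results = []
    for sentence in split_sentences(text):
        match = SAMPLE_SIZE.search(sentence)
        if match:
            results.append((sentence, match.group(0)))
    return results


example = (
    "We enrolled 120 patients with type 2 diabetes. "
    "The primary outcome was HbA1c at 12 weeks. "
    "Follow-up lasted one year."
)
summary = summarize(example)
```

Here only the first sentence is kept, and the fragment is the span "120 patients". A real system would need a proper sentence splitter and a trained classifier to handle the variety of phrasing in full-text articles.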

Results: At the sentence level, the computer-generated summaries covered more of the information needed for systematic reviews than the human-written summaries (recall 91.2% vs. 83.8%, p<0.001). They also had a higher density of relevant sentences (precision 59% vs. 39%, p<0.001). At the fragment level, the ensemble approach combining rule-based, concept mapping, and dictionary-based methods outperformed each individual method, achieving an 84.7% F-measure.
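For readers less familiar with the metrics reported above, the following sketch shows the standard definitions of precision, recall, and F-measure over sets of selected and relevant items. The abstract does not state the exact unit of scoring used in the study, so the sentence identifiers and the function name `precision_recall_f1` are illustrative assumptions.

```python
def precision_recall_f1(selected, relevant):
    """Compute precision, recall, and balanced F-measure (F1).

    selected: items the summarizer returned (e.g. sentence ids)
    relevant: items a reviewer marked as carrying needed information
    """
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 5 sentences selected, 4 relevant, 3 in common.
p, r, f = precision_recall_f1({1, 2, 3, 4, 5}, {2, 3, 4, 9})
```

In this made-up example, precision is 3/5, recall is 3/4, and the F-measure is their harmonic mean. High recall means few needed data elements are missed; high precision means the reviewer reads fewer irrelevant sentences.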

Conclusion: Computer-generated summaries are potential alternative information sources for data extraction in systematic review development. Machine learning and natural language processing are promising approaches to the development of such an extractive summarization system.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5362293
DOI: http://dx.doi.org/10.1016/j.jbi.2016.10.014

