Publications by Gully Apc Burns

Layout-aware text extraction from full-text PDF of scientific articles.

Cartic Ramakrishnan, Abhishek Patnia, Eduard Hovy, Gully Apc Burns

Source Code Biol Med

May 2012

Background: The Portable Document Format (PDF) is the most commonly used file format for online scientific publications. The absence of effective means to extract text from these PDF files in a layout-aware manner presents a significant challenge for developers of biomedical text mining or biocuration informatics systems that use published literature as an information source. In this paper we introduce the 'Layout-Aware PDF Text Extraction' (LA-PDFText) system to facilitate accurate extraction of text from PDF files of research articles for use in text mining applications.
