Human papillomavirus types 16 and 18 cause the majority of cervical cancers worldwide. Although three prophylactic vaccines based on virus-like particles (VLPs) of the major capsid protein (L1) are available, they cannot clear an existing infection. Persons who are already infected face an increased risk of neoplastic transformation.
To analyse large corpora using machine learning and other Natural Language Processing (NLP) algorithms, the corpora need to be standardized. The BioC format is a simple, community-driven data structure for sharing text and annotations; however, access to biomedical literature in BioC format is limited, and bioinformatics tools for converting online publication HTML formats to BioC are lacking. We present Auto-CORPus (Automated pipeline for Consistent Outputs from Research Publications), a novel NLP tool that standardizes publication HTML and table image files and converts them to three convenient machine-interpretable outputs to support biomedical text analytics.