In this paper we illustrate the use of text mining workflows to automatically extract instances of microorganisms and their habitats from free text; these entries can then be curated and added to different databases. To this end, we use a Conditional Random Field (CRF)-based classifier, as part of the workflows, to extract mentions of microorganisms and habitats and the relations between organisms and their habitats. Results indicate good performance for microorganism extraction and for the relation extraction aspect of the task (precision over 80%), while habitat recognition is only moderate (precision of about 65%). We also conjecture that PDF-to-text conversion can be quite noisy and that this noise implicitly affects any sentence-based relation extraction algorithm.
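The abstract does not give the CRF's features or the details of the relation step, so the following is only a minimal Python sketch, under stated assumptions, of the general idea: a linear-chain CRF assigns BIO tags to tokens via Viterbi decoding, and a simple sentence-level co-occurrence rule then pairs each microorganism with each habitat in the same sentence. The label set, the feature rules and lexicons in `emission_scores`, the hand-set weights, and the co-occurrence pairing in `sentence_relations` are all hypothetical illustrations, not the authors' method; a real CRF learns its weights from annotated text.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO label set: ORG = microorganism, HAB = habitat.
LABELS = ["O", "B-ORG", "I-ORG", "B-HAB", "I-HAB"]
INVALID = -10.0  # hypothetical penalty for ill-formed BIO transitions


def emission_scores(token: str) -> Dict[str, float]:
    """Hypothetical hand-set per-token scores (a trained CRF would learn these)."""
    scores = {label: 0.0 for label in LABELS}
    low = token.lower()
    if token[:1].isupper() and low.endswith(("ia", "us", "um")):  # genus-like word
        scores["B-ORG"] += 2.0
    if low in {"coli", "subtilis", "aureus"}:  # hypothetical species lexicon
        scores["I-ORG"] += 2.0
    if low in {"soil", "gut", "water", "sediment", "intestine"}:  # hypothetical habitat lexicon
        scores["B-HAB"] += 2.0
    scores["O"] += 0.5  # mild bias toward "outside"
    return scores


def transition(prev: str, curr: str) -> float:
    """An I-X tag may only follow B-X or I-X of the same entity type."""
    if curr.startswith("I-") and prev[2:] != curr[2:]:
        return INVALID
    return 0.0


def start_score(curr: str) -> float:
    """A sentence may not start inside an entity."""
    return INVALID if curr.startswith("I-") else 0.0


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not tokens:
        return []
    first = emission_scores(tokens[0])
    # best[i][label] = (best score of a path ending in label at i, previous label)
    best = [{y: (start_score(y) + first[y], None) for y in LABELS}]
    for i in range(1, len(tokens)):
        em = emission_scores(tokens[i])
        row = {}
        for y in LABELS:
            prev_label, score = max(
                ((p, best[i - 1][p][0] + transition(p, y)) for p in LABELS),
                key=lambda t: t[1],
            )
            row[y] = (score + em[y], prev_label)
        best.append(row)
    # Backtrack from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags into (entity type, surface text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type, cur_tokens = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = tag[2:], [tok]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_tokens.append(tok)
        else:
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


def sentence_relations(tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical co-occurrence rule: pair every ORG with every HAB in one sentence."""
    spans = to_spans(tokens, viterbi(tokens))
    orgs = [text for kind, text in spans if kind == "ORG"]
    habs = [text for kind, text in spans if kind == "HAB"]
    return [(o, h) for o in orgs for h in habs]
```

With these toy weights, `sentence_relations("Escherichia coli lives in the gut".split())` yields `[("Escherichia coli", "gut")]`. If a noisy PDF-to-text conversion splits the same text into "Escherichia coli lives" and "in the gut", each fragment is processed separately and neither yields a pair, which illustrates how spurious sentence breaks can hide relations from any purely sentence-based extractor.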


Source: http://dx.doi.org/10.2390/biecoll-jib-2011-184
