Solr-Plant: efficient extraction of plant names from text.

BMC Bioinformatics

Center for Biomedical Informatics, Brown University, Box G-R, Providence, RI, USA.

Published: May 2019

Background: The retrieval of plant-related information is a challenging task due to variations in species name mentions as well as spelling or typographical errors across data sources. Scalable solutions are needed for identifying plant name mentions from text and resolving them to accepted taxonomic names.

Results: An Apache Solr-based fuzzy matching system enhanced with the Smith-Waterman alignment algorithm ("Solr-Plant") was developed for mapping and resolution to a plant name and synonym thesaurus. Evaluation of Solr-Plant suggests promising results in terms of both accuracy and processing efficiency on misspelled species names from two benchmark datasets: (1) SALVIAS and (2) National Center for Biotechnology Information (NCBI) Taxonomy. Additional evaluation using the S800 text corpus also reflects high precision and recall. The latest version of the source code is available at https://github.com/bcbi/SolrPlantAPI . A REST-compliant web interface and service for Solr-Plant are hosted at http://bcbi.brown.edu/solrplant .
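To make the matching idea concrete, the following is a minimal Python sketch of ranking thesaurus entries by Smith-Waterman local alignment score and resolving a misspelled mention to an accepted name. It is not the Solr-Plant implementation: it uses no Solr index, the thesaurus is a toy dictionary, and the function names (`smith_waterman`, `resolve`), the scoring parameters (match 2, mismatch -1, gap -1), the length normalization, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from typing import Optional


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between a and b.

    Scoring values are illustrative assumptions, not Solr-Plant's settings.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy thesaurus (hypothetical): known name or synonym -> accepted name.
THESAURUS = {
    "arabidopsis thaliana": "Arabidopsis thaliana",
    "oryza sativa": "Oryza sativa",
    "zea mays": "Zea mays",
    "solanum lycopersicum": "Solanum lycopersicum",
    "lycopersicon esculentum": "Solanum lycopersicum",  # synonym
}


def resolve(mention: str, threshold: float = 0.8) -> Optional[str]:
    """Map a possibly misspelled mention to an accepted name, or None."""
    query = mention.strip().lower()
    best_name, best_score = None, 0.0
    for known, accepted in THESAURUS.items():
        raw = smith_waterman(query, known)
        # Normalize so that two identical strings score 1.0
        # (2 is the default match score used above).
        score = raw / (2 * max(len(query), len(known), 1))
        if score > best_score:
            best_name, best_score = accepted, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    for text in ["Arabidopsis thalliana", "Lycopersicon esculentm", "Quercus alba"]:
        print(text, "->", resolve(text))
```

In a system of the kind the abstract describes, a search index would presumably narrow the thesaurus to a few candidates before any alignment is computed; the exhaustive loop here only stands in for that step.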

Conclusion: Automated techniques are needed for efficient and accurate identification of knowledge linked with biological scientific names. Solr-Plant complements the current state of the art in terms of both efficiency and accuracy in identifying names restricted to the species level. The approach can be extended to identify broader groups of organisms at different taxonomic levels. The results reflect the potential utility of Solr-Plant as a data mining tool for extracting and correcting plant species names.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6530169
DOI: http://dx.doi.org/10.1186/s12859-019-2874-6
