The emergence of "big data" initiatives has led to the need for tools that can automatically extract valuable chemical information from large volumes of unstructured data, such as the scientific literature. Since chemical information can be present in figures, tables, and textual paragraphs, successful information extraction often depends on the ability to interpret all of these domains simultaneously. We present a complete toolkit for the automated extraction of chemical entities and their associated properties, measurements, and relationships from scientific documents that can be used to populate structured chemical databases. Our system provides an extensible, chemistry-aware, natural language processing pipeline for tokenization, part-of-speech tagging, named entity recognition, and phrase parsing. Within this scope, we report improved performance for chemical named entity recognition through the use of unsupervised word clustering based on a massive corpus of chemistry articles. For phrase parsing and information extraction, we present the novel use of multiple rule-based grammars that are tailored for interpreting specific document domains such as textual paragraphs, captions, and tables. We also describe document-level processing to resolve data interdependencies and show that this is particularly necessary for the autogeneration of chemical databases, since captions and tables commonly contain chemical identifiers and references that are defined elsewhere in the text. We evaluated how accurately the toolkit extracts various types of data, obtaining F-scores of 93.4%, 86.8%, and 91.5% for chemical identifiers, spectroscopic attributes, and chemical property attributes, respectively; on the CHEMDNER chemical name extraction challenge, ChemDataExtractor yields a competitive F-score of 87.8%. All tools have been released under the MIT license and are available to download from http://www.chemdataextractor.org.
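The F-scores quoted above combine precision and recall into a single figure. As a minimal sketch of how such a score can be computed, the following compares a set of extracted chemical names against a hand-annotated gold set. The function name `f_score` and the example names are assumptions made for illustration; this is not the evaluation code used by the authors, and it assumes exact string matching between extracted and gold entities.

```python
def f_score(predicted, gold):
    """Return (precision, recall, F1) for extracted vs. gold-standard entities.

    Hypothetical helper for illustration; entities match only if the
    strings are identical.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data: three of four extracted names are correct,
# and three of four gold names are found.
gold_names = {"benzene", "toluene", "ethanol", "acetone"}
extracted_names = {"benzene", "toluene", "ethanol", "water"}

precision, recall, f1 = f_score(extracted_names, gold_names)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Real evaluations often relax the matching criterion (for example, by character offsets or normalized names), which would change the set comparison but not the formula.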
DOI: http://dx.doi.org/10.1021/acs.jcim.6b00207