The growing amount of experimental data on chemical objects includes properties of small molecules, results of studies of their interaction with human and animal proteins, and methods of synthesis of organic compounds (OCs). The data obtained can be used to identify the names of OCs automatically, including all possible synonyms and relevant data on the molecular properties and biological activity. Utilization of different synonymic names of chemical compounds allows researchers to increase the completeness of data on their properties available from publications. Enrichment of the data on the names of chemical compounds by information about their possible metabolites can help estimate the biological effects of parent compounds and their metabolites more thoroughly. Therefore, an attempt at automated extraction of the names of parent compounds and their metabolites from the texts is a rather important task. In our study, we aimed at developing a method that provides the extraction of the named entities (NEs) of parent compounds and their metabolites from abstracts of scientific publications. Based on the application of the conditional random fields' algorithm, we extracted the NEs of chemical compounds. We developed a set of rules allowing identification of parent compound NEs and their metabolites in the texts. We evaluated the possibility of extracting the names of potential metabolites based on cosine similarity between strings representing names of parent compounds and all other chemical NEs found in the text. Additionally, we used conditional random fields to fetch the names of parent compounds and their metabolites from the texts based on the corpus of texts labeled manually. Our computational experiments showed that usage of rules in combination with cosine similarity could increase the accuracy of recognition of the names of metabolites compared to the rule-based algorithm and application of a machine-learning algorithm (conditional random fields).

Download full-text PDF

Source
http://dx.doi.org/10.1021/acs.jcim.0c01054DOI Listing

Publication Analysis

Top Keywords

parent compounds
24
compounds metabolites
24
metabolites texts
16
chemical compounds
12
names parent
12
conditional random
12
compounds
10
metabolites
9
names
8
names chemical
8

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!