ReactionDataExtractor 2.0: A Deep Learning Approach for Data Extraction from Chemical Reaction Schemes.

J Chem Inf Model

Cavendish Laboratory, Department of Physics, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, U.K.

Published: October 2023

Knowledge in the chemical domain is often disseminated graphically via chemical reaction schemes, which are composed of chemical diagrams and symbols and greatly simplify the task of describing chemical transformations. Like most graphical representations, such drawings are intuitively understood by any chemist but are not easily interpreted by machines, which poses a challenge for data extraction. Currently available tools are limited in their scope of extraction and require manual preprocessing, which slows data extraction. We present a new tool, ReactionDataExtractor v2.0, which uses a combination of neural networks and symbolic artificial intelligence to effectively remove this barrier. We evaluated our tool on a test set of reaction schemes taken from open-source journal articles and achieved F1 scores between 75% and 96%. These scores can be further improved by tuning our object-detection models to a specific chemical subdomain, thanks to the data-driven approach that we have adopted with synthetically generated data. The system architecture of our tool is modular, which allows it to balance speed and accuracy to afford an autonomous, high-throughput solution for image-based chemical data extraction.
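For readers unfamiliar with the metric quoted in the abstract, the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name, the category labels, and the counts are hypothetical and are not taken from the paper or from the ReactionDataExtractor code base. They were chosen so that the results fall at the two ends of the reported 75% to 96% range.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall) from raw counts."""
    predicted = true_positives + false_positives  # everything the tool extracted
    actual = true_positives + false_negatives     # everything present in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (TP, FP, FN) counts for two made-up extraction categories.
example_counts = {
    "category_a": (48, 2, 2),
    "category_b": (30, 10, 10),
}

for name, (tp, fp, fn) in example_counts.items():
    print(f"{name}: F1 = {f1_score(tp, fp, fn):.2f}")
```

Because F1 penalizes both missed items and spurious detections, a tool that extracts many objects but mislabels a share of them scores lower than its raw detection count would suggest.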

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10565829
DOI: http://dx.doi.org/10.1021/acs.jcim.3c00422
