With the advent of artificial intelligence (AI), it is now possible to design diverse and novel molecules from previously unexplored chemical space. However, synthesizing such molecules remains a challenge for chemists. Recently, there have been attempts to develop AI models for retrosynthesis prediction, which rely on the availability of a high-quality training dataset. In this work, we explore the suitability of large language models (LLMs) for extracting high-quality chemical reaction data from patent documents. A comparative study on the same set of patents used in an earlier study showed that the proposed automated approach can enhance the current datasets by adding 26% new reactions. Several challenges were identified during reaction mining, and alternative solutions were proposed for some of them. A detailed analysis also identified several wrong entries in the previously curated dataset. Reactions extracted using the proposed pipeline over a larger patent dataset can improve the accuracy and efficiency of synthesis prediction models in the future.

Scientific contribution

In this work, we evaluated the suitability of large language models for mining a high-quality chemical reaction dataset from patent literature. We showed that the proposed approach can significantly increase the size of the reaction database by identifying more chemical reactions, and improve its quality by correcting previous errors and false positives.
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11590295 | PMC |
| http://dx.doi.org/10.1186/s13321-024-00928-8 | DOI Listing |
J Comput Chem
January 2025
Department of Chemistry, University of Nevada Reno, Reno, Nevada, USA.
Hydrogen gas (H₂) can be produced via entirely solar-driven photocatalytic water splitting (PWS). A promising set of organic materials for facilitating PWS are the so-called inverted singlet-triplet, INVEST, materials. Inversion of the singlet (S₁) and triplet (T₁) energies reduces the population of triplet states, which are otherwise destructive under photocatalytic conditions.
Data Brief
December 2024
Department of Computer Systems Engineering, Faculty of Information and Communication Technology, Tshwane University of Technology, South Africa.
Solar energy has become the fastest-growing renewable and alternative source of energy. However, there are few or no open-source datasets to advance research knowledge in photovoltaic-related systems. The work presented in this article is a step towards deriving a Photo-Voltaic Module Dataset (PVMD) of thermal images and ensuring that it is publicly available.
Spectrochim Acta A Mol Biomol Spectrosc
December 2024
Department of Physics, RTM Nagpur University, Nagpur 440033, India.
While searching for a new host suitable for near-infrared (NIR) emission, we explored a new composition, NaLaMgWO₆. The samples were prepared by the solid-state reaction method. X-ray diffraction confirms crystallization of NaLaMgWO₆ in the monoclinic system.
Pol J Vet Sci
September 2024
Department of Companion Animals and Horses, University Equine Hospital, Vetmeduni Vienna, Vienna, Austria.
Rhodococcus equi (R. equi) is a primary cause of pyogranulomatous pneumonia in foals between three weeks and five months of age. Early diagnosis of rhodococcal pneumonia has always been considered preferable, as it can lead to more successful treatment and better outcomes.
Chem Biomed Imaging
December 2024
Department of Chemistry "G.Ciamician", University of Bologna, UE4, Via. P. Gobetti 85, 40129 Bologna, Italy.
Electrochemiluminescence (ECL) is now a powerful technique widely used in biosensing and imaging, offering high sensitivity and specificity for detecting and mapping biomolecules. Screen-printed electrodes (SPEs) offer a versatile and cost-effective platform for ECL applications due to their ease of fabrication, disposability, and suitability for large-scale production. This research introduces a novel method for improving the ECL characteristics of screen-printed carbon electrodes (SPCEs) through the application of CO₂ laser treatment following fabrication.