Fingerprint methods applied to molecules have proven useful for similarity determination and as inputs to machine-learning models. Here, we present the development of a new fingerprint for chemical reactions and validate its usefulness both for building machine-learning models and for similarity assessment. Our final fingerprint is constructed as the difference between the atom-pair fingerprints of the products and those of the reactants, and it includes agents via calculated physicochemical properties. We validated the fingerprints on a large data set of reactions text-mined from granted United States patents from the last 40 years, which had been classified using a substructure-based expert system. We applied machine learning to build a 50-class predictive model for reaction-type classification that correctly predicts 97% of the reactions in an external test set. High accuracies were also observed when the classifier was applied to reactions from an in-house electronic laboratory notebook. The performance of the novel fingerprint for assessing reaction similarity was evaluated by a cluster analysis that recovered 48 of the 50 reaction classes, with a median F-score of 0.63 for the clusters. The data sets used for training and primary validation, as well as all Python scripts required to reproduce the analysis, are provided in the Supporting Information.
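The core construction, products minus reactants, can be sketched in a few lines. This is a minimal sketch, not the authors' published scripts. It assumes each molecule has already been encoded as a count-based atom-pair fingerprint, represented here as a hypothetical dictionary that maps feature strings of the form `atom|atom|distance` to counts. It also omits the agent descriptors. Counts are summed on each side of the reaction, reactant counts are subtracted from product counts, and features that cancel are dropped. The hypothetical `fold` helper then hashes the sparse result into a fixed-length vector of the kind a classifier expects.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical representation: feature string -> count. Strings such as
# "C|O|1" (atom type, atom type, topological distance) are illustrative
# stand-ins for real atom-pair codes produced by a cheminformatics toolkit.
Fingerprint = Dict[str, int]


def side_total(fps: Iterable[Fingerprint]) -> Counter:
    """Sum the count fingerprints of all molecules on one side of a reaction."""
    total: Counter = Counter()
    for fp in fps:
        total.update(fp)  # Counter.update adds counts rather than replacing them
    return total


def difference_fingerprint(reactants: List[Fingerprint],
                           products: List[Fingerprint]) -> Dict[str, int]:
    """Products minus reactants, keeping negative values and dropping zeros."""
    prod = side_total(products)
    reac = side_total(reactants)
    # Counter subtraction (prod - reac) would discard negative counts,
    # so the difference is computed key by key instead.
    diff = {}
    for key in prod.keys() | reac.keys():
        value = prod[key] - reac[key]
        if value != 0:
            diff[key] = value
    return diff


def fold(diff: Dict[str, int], n_bits: int = 64) -> List[int]:
    """Hash a sparse difference fingerprint into a fixed-length count vector."""
    vec = [0] * n_bits
    for key, value in diff.items():
        # zlib.crc32 is stable across runs, unlike Python's salted hash().
        vec[zlib.crc32(key.encode("utf-8")) % n_bits] += value
    return vec


# Toy counts for illustration only; they are not real atom-pair data.
reactants = [{"C|O|1": 2, "C|C|1": 1, "O|O|2": 1},
             {"C|O|1": 1, "C|C|1": 1}]
products = [{"C|O|1": 2, "C|C|1": 2, "O|O|2": 1, "C|C|3": 1}]

diff = difference_fingerprint(reactants, products)
print(sorted(diff.items()))  # [('C|C|3', 1), ('C|O|1', -1)]
vector = fold(diff)
```

Features shared unchanged by both sides cancel. Only what the reaction creates (positive) or destroys (negative) survives. In the published fingerprint, agents enter separately through calculated physicochemical properties. With this sketch, one would append such a descriptor list to `vector`, a step left here as an assumption.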

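The cluster-recovery evaluation reports an F-score per reaction class. The abstract does not spell out the exact definition, so the following sketch assumes the common clustering F-measure. For each true class, it takes the cluster that maximizes the harmonic mean of precision (the share of the cluster belonging to the class) and recall (the share of the class captured by the cluster). It then reports the median over classes. The class labels and cluster assignments below are hypothetical toy data.

```python
from collections import Counter
from statistics import median
from typing import Dict, Hashable, Sequence


def best_f_per_class(true_labels: Sequence[Hashable],
                     cluster_ids: Sequence[Hashable]) -> Dict[Hashable, float]:
    """For each true class, the highest F-score reached by any single cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("each item needs exactly one class label and one cluster id")
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    overlap = Counter(zip(true_labels, cluster_ids))  # (class, cluster) -> shared items
    best = {cls: 0.0 for cls in class_sizes}
    for (cls, clu), shared in overlap.items():
        precision = shared / cluster_sizes[clu]
        recall = shared / class_sizes[cls]
        # shared > 0 for every observed pair, so the denominator is never zero
        f_score = 2 * precision * recall / (precision + recall)
        best[cls] = max(best[cls], f_score)
    return best


# Hypothetical toy data: six reactions, three classes, three clusters.
labels = ["amide coupling"] * 3 + ["Suzuki coupling"] * 2 + ["Boc deprotection"]
clusters = [0, 0, 1, 1, 1, 2]

scores = best_f_per_class(labels, clusters)
print(round(median(scores.values()), 2))  # 0.8
```

Taking the maximum over clusters means a class split across several clusters is scored by its best-matching fragment. Clusters that mix classes lower precision for every class they contain.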

Source: http://dx.doi.org/10.1021/ci5006614

Similar Publications

Latent fingerprints (LFPs) are invisible impressions that must be developed before they can be used in criminal investigations; however, existing fingerprint visualization techniques face challenges such as complex preparation and poor contrast. To advance practical fingerprint detection, green-emissive micron-sized curcumin/kaolin composites were synthesized via a facile and cost-effective one-step physical cross-linking method. The composites exhibited unprecedented performance in developing diverse marks, including LFPs, knuckle prints, palm prints, and footprints, with clear three-level details on various substrates. Notably, the powders successfully developed LFPs that had been aged for 30 days and even up to 100 days, meeting the stringent requirements for comprehensive forensic application.

Analysis strategy of contamination source using chemical fingerprint information based on GC-HRMS: A case study of landfill leachate.

Water Res

December 2024

College of Environment, Ministry of Education Key Laboratory of Integrated Regulation and Resource Development on Shallow Lakes, Hohai University, Nanjing 210098, PR China; Suzhou Research Institute, Hohai University, Suzhou 215100, PR China.

With the increasing prevalence of emerging contaminants (ECs) in the environment, gaining a deeper understanding of the chemical information pertaining to the contamination source is a crucial step toward effective prevention and control of these ECs. This study presents a novel strategy for analyzing the chemical information of contamination sources using gas chromatography-high resolution mass spectrometry (GC-HRMS) and demonstrates it on landfill leachate, a common and representative environmental contamination source. Initially, a non-targeted screening approach using HRMS was used to characterize a total of 5344 organic compounds with identification confidence levels 1 and 2 in 14 landfill leachate samples.

Fatty acid (FA), tocopherol, and phytosterol profiles are used in avocado oil purity standards. However, blends with other oils can mimic the profile of pure avocado oil, resulting in similar ranges for these molecules. Therefore, fatty alcohol esters (FAEs) uniquely of spp.

Article Synopsis
  • MR fingerprinting (MRF) is an innovative technique for measuring MR relaxometry with high precision, but its complex data requirements hinder its widespread use.
  • A deep learning (DL) network, specifically a U-Net, was created to synthesize MRF signals from regular magnitude-only MRI data collected from 37 volunteers, and the synthesized signals were compared with actually acquired MRF signals.
  • The study found strong concordance between synthesized and actual MRF data, indicating that DL can enable quantitative relaxometry without the need for specialized MRF pulse sequences.

Machine-learning crystal size distribution for volcanic stratigraphy correlation.

Sci Rep

December 2024

Centre for Ore Deposit and Earth Sciences, School of Natural Sciences, University of Tasmania, Hobart, Australia.

Volcanic stratigraphy reconstruction is traditionally based on qualitative facies analysis complemented by geochemical analyses. Here we present a novel technique based on machine learning identification of crystal size distribution to quantitatively fingerprint lavas, shallow intrusions and coarse lava breccias. This technique, based on a simple photograph of a rock (or core) sample, is complementary to existing methods and allows another strategy to identify and compare volcanic rocks for stratigraphic correlation.