A variational heteroencoder based on recurrent neural networks, trained with SMILES linear notations of molecular structures, was used to derive the following atomic descriptors: delta latent space vectors (DLSVs) obtained from the original SMILES of the whole molecule and the SMILES of the same molecule with the target atom replaced. Different replacements were explored, namely, changing the atomic element, replacement with a character of the model vocabulary not used in the training set, or the removal of the target atom from the SMILES. Unsupervised mapping of the DLSV descriptors with t-distributed stochastic neighbor embedding (t-SNE) revealed a remarkable clustering according to the atomic element, hybridization, atomic type, and aromaticity.
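The DLSV idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_encoder` is a hypothetical stand-in for the trained heteroencoder (it merely counts a few characters), `replace_atom` handles only single-character atom symbols, and the sign convention (original minus modified) is assumed, since the abstract does not state it.

```python
from typing import Callable, List, Sequence

# An encoder maps a SMILES string to a latent vector (hypothetical interface).
Encoder = Callable[[str], Sequence[float]]


def replace_atom(smiles: str, position: int, replacement: str) -> str:
    """Replace the single-character atom symbol at `position`.

    An empty `replacement` removes the atom from the string. No check is
    made that the result is still a valid SMILES.
    """
    return smiles[:position] + replacement + smiles[position + 1:]


def dlsv(encode: Encoder, smiles: str, position: int, replacement: str) -> List[float]:
    """Delta latent space vector: latent(original) - latent(modified).

    The direction of the difference is an assumption of this sketch.
    """
    original = encode(smiles)
    modified = encode(replace_atom(smiles, position, replacement))
    return [a - b for a, b in zip(original, modified)]


def toy_encoder(smiles: str) -> List[float]:
    """Hypothetical stand-in for a trained encoder: counts of 'C', 'N', 'O', '='."""
    return [float(smiles.count(ch)) for ch in "CNO="]


# Descriptor for the carbonyl oxygen of acetic acid, replaced by nitrogen.
descriptor = dlsv(toy_encoder, "CC(=O)O", 4, "N")
```

With a real model, `toy_encoder` would be swapped for the encoder half of the trained network, and one descriptor would be computed per target atom.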
GUIDEMOL is a Python computer program based on the RDKit software to process molecular structures and calculate molecular descriptors with a graphical user interface using the tkinter package. It can calculate descriptors already implemented in RDKit as well as grid representations of 3D molecular structures using the electrostatic potential or voxels. The GUIDEMOL app provides easy access to RDKit tools for chemoinformatics users with no programming skills and can be adapted to calculate other descriptors or to trigger other procedures.
Random Forest (RF) QSPR models were developed with a data set of homolytic bond dissociation energies (BDE) previously calculated by B3LYP/6-311++G(d,p)//DFTB for 2263 sp³ C-H covalent bonds. The best set of attributes consisted of 114 descriptors of the carbon atom (counts of atom types in 5 spheres around the kernel atom and ring descriptors). The optimized model predicted the DFT-calculated BDE of an independent test set of 224 bonds with MAE=2.
Machine-learning models were developed to predict the composition profile of a three-compound mixture in liquid-liquid equilibrium (LLE), given the global composition at certain temperature and pressure. A chemoinformatics approach was explored, based on the MOLMAP technology to encode molecules and mixtures. The chemical systems involved an ionic liquid (IL) and two organic molecules.
Machine learning (ML) algorithms were explored for the classification of the UV-Vis absorption spectrum of organic molecules based on molecular descriptors and fingerprints generated from 2D chemical structures. Training and test data (~75 k molecules and associated UV-Vis data) were assembled from a database with lists of experimental absorption maxima. They were labeled as positive (related to photoreactive potential) if an absorption maximum is reported in the range between 290 and 700 nm (UV/Vis) with a molar extinction coefficient (MEC) above 1000 L mol⁻¹ cm⁻¹, and as negative if no such peak is in the list.
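The labeling rule stated above can be written as a small function. This is a minimal sketch, not the authors' code: the function name and the input layout (a list of wavelength/MEC pairs) are hypothetical, and treating the 290 and 700 nm bounds as inclusive is an assumption.

```python
from typing import Iterable, Tuple

# Thresholds taken from the abstract; inclusive wavelength bounds are assumed.
WAVELENGTH_MIN_NM = 290.0
WAVELENGTH_MAX_NM = 700.0
MEC_THRESHOLD = 1000.0  # molar extinction coefficient, L mol^-1 cm^-1


def label_spectrum(maxima: Iterable[Tuple[float, float]]) -> int:
    """Return 1 (positive) if any absorption maximum lies in the window
    with MEC above the threshold, otherwise 0 (negative).

    `maxima` is an iterable of (wavelength_nm, mec) pairs.
    """
    for wavelength_nm, mec in maxima:
        in_window = WAVELENGTH_MIN_NM <= wavelength_nm <= WAVELENGTH_MAX_NM
        if in_window and mec > MEC_THRESHOLD:
            return 1
    return 0


# A strong band at 254 nm alone is negative; adding a band at 310 nm makes it positive.
negative_example = label_spectrum([(254.0, 15000.0)])
positive_example = label_spectrum([(254.0, 15000.0), (310.0, 2500.0)])
```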
In this study, machine learning algorithms were investigated for the classification of organic molecules with one carbon chiral center according to the sign of optical rotation. Diverse heterogeneous data sets comprising up to 13,080 compounds and their corresponding optical rotation were retrieved from Reaxys and processed independently for three solvents: dichloromethane, chloroform, and methanol. The molecular structures were represented by chiral descriptors based on the physicochemical and topological properties of ligands attached to the chiral center.
Aiming at generating a series of monoterpene indole alkaloids with enhanced multidrug resistance (MDR) reversing activity in cancer, two major epimeric alkaloids isolated from Tabernaemontana elegans, tabernaemontanine (1) and dregamine (2), were derivatized by alkylation of the indole nitrogen. Twenty-six new derivatives (3-28) were prepared by reaction with different aliphatic and aromatic halides, whose structures were elucidated mainly by NMR, including 2D NMR experiments. Their MDR reversal ability was evaluated through a functional assay, using as models resistant human colon adenocarcinoma and human ABCB1-gene transfected L5178Y mouse lymphoma cells, overexpressing P-glycoprotein (P-gp), by flow cytometry.
The increasing application of new ionic liquids (IL) creates a need for liquid-liquid equilibria data for both miscible and quasi-immiscible systems. In this study, equilibrium concentrations at different temperatures for ionic liquid + water two-phase systems were modeled using a quantitative structure-property relationship (QSPR) method. Data on equilibrium concentrations were taken from the ILThermo Ionic Liquids database, curated, and used to build models that predict the weight fraction of water in the ionic liquid-rich phase and of ionic liquid in the aqueous phase as two separate properties.
Spectrochim Acta A Mol Biomol Spectrosc
December 2019
A chemoinformatics method was applied to the assignment of absolute configurations and to the quantitative prediction of specific optical rotations using a data set of 88 chiral fluorinated molecules (44 pairs of enantiomers). Counterpropagation neural networks were explored for the classification of enantiomers as dextrorotatory or levorotatory. Regression models were trained using multilayer perceptrons (MLP), random forests (RF) or multilinear regressions (MLR), on the basis of physicochemical atomic stereo (PAS) descriptors.
A series of π-conjugated molecules, based on pyridazine and thiophene heterocycles, were synthesized using commercially or readily available coupling components, through a palladium-catalyzed Suzuki-Miyaura cross-coupling reaction. The electron-deficient pyridazine heterocycle was functionalized by a thiophene electron-rich heterocycle at position six, and different (hetero)aromatic moieties (phenyl, thienyl, furanyl) were functionalized with electron acceptor groups at position three. Density Functional Theory (DFT) calculations were carried out to obtain information on the conformation, electronic structure, electron distribution, dipolar moment, and molecular nonlinear response of the synthesized push-pull pyridazine derivatives.
Machine learning (ML) algorithms were explored for the fast estimation of molecular dipole moments calculated by density functional theory (DFT) by B3LYP/6-31G(d,p) on the basis of molecular descriptors generated from DFT-optimized geometries and partial atomic charges obtained by empirical or ML schemes. A database was used with 10,071 structures, new molecular descriptors were designed and the models were validated with external test sets. Several ML algorithms were screened.
Computational methodologies are assisting the exploration of marine natural products (MNPs) to make the discovery of new leads more efficient, to repurpose known MNPs, to target new metabolites on the basis of genome analysis, to reveal mechanisms of action, and to optimize leads. In silico efforts in drug discovery of NPs have mainly focused on two tasks: dereplication and prediction of bioactivities. The exploration of new chemical spaces and the application of predicted spectral data must be included in new approaches to select species, extracts, and growth conditions with maximum probabilities of medicinal chemistry novelty.
Summary: The representation of metabolic reactions strongly relies on visualization, which is a major barrier for blind users. The NavMol software renders the communication and interpretation of molecular structures and reactions accessible by integrating chemoinformatics and assistive technology. NavMol 3.
Background: Tuberculosis (TB) is the second leading cause of mortality worldwide, being a highly contagious and insidious illness caused by Mycobacterium tuberculosis (Mtb). Additionally, the emergence of multidrug-resistant and extensively drug-resistant strains of Mtb, together with significant levels of co-infection with HIV and TB (HIV/TB), makes the search for new antitubercular drugs urgent and challenging.
Methods: This work was based on the hypothesis that an active compound could be obtained if substituents present in some other active compounds were attached to the core of an important structure, in this case the indole scaffold, thus generating a hybrid compound.
Machine learning algorithms were explored for the fast estimation of HOMO and LUMO orbital energies calculated by DFT B3LYP, on the basis of molecular descriptors exclusively based on connectivity. The whole project involved the retrieval and generation of molecular structures, quantum chemical calculations for a database with >111 000 structures, development of new molecular descriptors, and training/validation of machine learning models. Several machine learning algorithms were screened, and an applicability domain was defined based on Euclidean distances to the training set.
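A distance-based applicability domain of the kind mentioned above can be sketched as follows. The abstract only says the domain was defined from Euclidean distances to the training set, so the specific criterion here (mean distance to the `k` nearest training points compared against a fixed `threshold`) and all names are assumptions of this sketch.

```python
import math
from typing import Sequence


def distance_to_training_set(
    x: Sequence[float], training: Sequence[Sequence[float]], k: int = 1
) -> float:
    """Mean Euclidean distance from descriptor vector `x` to its k nearest
    training vectors (k is capped at the training set size)."""
    distances = sorted(math.dist(x, t) for t in training)
    k = min(k, len(distances))
    return sum(distances[:k]) / k


def in_domain(
    x: Sequence[float],
    training: Sequence[Sequence[float]],
    threshold: float,
    k: int = 1,
) -> bool:
    """True if `x` lies within the assumed distance threshold of the training set."""
    return distance_to_training_set(x, training, k) <= threshold


# Toy 2D descriptor space with three training points.
training_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
near = in_domain((0.0, 0.5), training_set, threshold=0.6, k=2)
far = in_domain((3.0, 4.0), training_set, threshold=0.6)
```

In practice the threshold would be chosen from the distribution of distances within the training set, and predictions for out-of-domain molecules would be flagged as less reliable.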
To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function.
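For readers unfamiliar with the Bradley-Terry model, the sketch below fits item strengths from pairwise outcomes using the standard minorization-maximization update. How the study built its comparisons is not given in the abstract; the example assumes, purely for illustration, that one atom type "wins" over another when it has the larger Fukui value in the same molecule. The atom-type labels are made up, and each comparison is assumed to involve two distinct items.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fit_bradley_terry(
    comparisons: Iterable[Tuple[str, str]], iterations: int = 100
) -> Dict[str, float]:
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Returns strengths normalized to sum to 1, so that
    P(i beats j) = strength[i] / (strength[i] + strength[j]).
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    strengths = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum n_ij / (p_i + p_j) over every opponent j that i has met.
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (strengths[i] + strengths[j])
            updated[i] = wins[i] / denom if denom else strengths[i]
        total = sum(updated.values())
        strengths = {i: v / total for i, v in updated.items()}
    return strengths


# Hypothetical outcomes: type "C.ar" outranks "C.sp3" in 3 of 4 comparisons.
outcomes = [("C.ar", "C.sp3")] * 3 + [("C.sp3", "C.ar")]
strengths = fit_bradley_terry(outcomes)
```

Sorting the atom types by fitted strength gives the ranking; an item that never wins receives strength zero under this plain maximum-likelihood fit.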
The disturbing emergence of multidrug-resistant strains of Mycobacterium tuberculosis (Mtb) has been driving the scientific community to urgently search for new and efficient antitubercular drugs. Despite the various drugs currently under evaluation, isoniazid is still the key and most effective component in all multi-therapeutic regimens recommended by the WHO. This paper describes the QSAR-oriented design, synthesis and in vitro antitubercular activity of several potent isoniazid derivatives (isonicotinoyl hydrazones and isonicotinoyl hydrazides) against H37Rv and two resistant Mtb strains.
The combination of chemoinformatics approaches with NMR techniques and the increasing availability of data allow the resolution of problems far beyond the original application of NMR in structure elucidation/verification. The diversity of applications can range from process monitoring, metabolic profiling, authentication of products, to quality control. An application related to the automatic analysis of complex mixtures concerns mixtures of chemical reactions.
Background: The rapid access to intrinsic physicochemical properties of molecules is highly desired for large scale chemical data mining explorations such as mass spectrum prediction in metabolomics, toxicity risk assessment and drug discovery. Large volumes of data are being produced by quantum chemistry calculations, which provide increasingly accurate estimations of several properties, e.g.
Machine learning (SVM and JRip rule learner) methods have been used in conjunction with the Condensed Graph of Reaction (CGR) approach to identify errors in the atom-to-atom mapping of chemical reactions produced by an automated mapping tool by ChemAxon. The modeling has been performed on the first three enzymatic classes of metabolic reactions from the KEGG database. Each reaction has been converted into a CGR representing a pseudomolecule with conventional (single, double, aromatic, etc.
Metabolic pathways are at the crossroad between the chemical world of small molecules and the biological world of enzymes, genes and regulation. Methods for their processing are therefore required for a great variety of applications. The work presented here reports a new method to encode metabolic pathways and reactomes of organisms based on the MOLMAP approach.
Quantitative structure-property relationships (QSPRs) were investigated for the estimation of the Mayr electrophilicity parameter using a data set of 64 compounds, all currently available uncharged electrophiles in Mayr's Database of Reactivity Parameters. Three collections of empirical descriptors were employed, from Dragon, Adriana.Code, and CDK.
The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records.
The automatic perception of chemical similarities between chemical reactions is required for a variety of applications in chemistry and connected fields, namely with databases of metabolic reactions. Classification of enzymatic reactions is required, e.g.
The MOLMAP descriptor relies on a Kohonen SOM that defines types of covalent bonds on the basis of their physicochemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants and numerically encodes the pattern of changes in bonds during a chemical reaction.
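The products-minus-reactants definition above can be illustrated with a simplified sketch. It assumes that a trained Kohonen SOM has already assigned each bond to a neuron index; the SOM itself is not implemented here. A molecule's descriptor is reduced to plain per-neuron bond counts, which is a simplification, and all function names are hypothetical.

```python
from typing import List, Sequence


def molmap(bond_neurons: Sequence[int], n_neurons: int) -> List[int]:
    """Simplified molecular MOLMAP: number of bonds mapped to each SOM neuron.

    `bond_neurons` holds, for each bond, the index of its winning neuron
    (assumed to come from a previously trained SOM).
    """
    vector = [0] * n_neurons
    for index in bond_neurons:
        vector[index] += 1
    return vector


def reaction_molmap(
    reactants: Sequence[Sequence[int]],
    products: Sequence[Sequence[int]],
    n_neurons: int,
) -> List[int]:
    """Reaction MOLMAP: summed product MOLMAPs minus summed reactant MOLMAPs."""
    difference = [0] * n_neurons
    for molecule in products:
        for i, count in enumerate(molmap(molecule, n_neurons)):
            difference[i] += count
    for molecule in reactants:
        for i, count in enumerate(molmap(molecule, n_neurons)):
            difference[i] -= count
    return difference


# Two reactants and one product on a 4-neuron map (indices are made up).
pattern = reaction_molmap(
    reactants=[[0, 1, 1], [2]],
    products=[[0, 1, 3]],
    n_neurons=4,
)
```

Positive entries mark bond types formed in the reaction, negative entries mark bond types broken, and unchanged bond types cancel to zero.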