This work constructs an advanced force field, the Completely Multipolar Model (CMM), to quantitatively reproduce each term of an energy decomposition analysis (EDA) for aqueous solvated alkali metal cations and halide anions and their ion pairings. We find that all individual EDA terms remain well-approximated in the CMM for ion-water and ion-ion interactions, except for polarization, which shows errors due to the partial covalency of ion interactions near their equilibrium. We quantify the onset of the dative bonding regime by examining the change in molecular polarizability and Mayer bond indices as a function of distance, showing that partial covalency manifests by breaking the symmetry of atomic polarizabilities while strongly damping them at short-range.
Energy decomposition analysis (EDA) based on density functional theory (DFT) and self-consistent field (SCF) calculations has become widely used for understanding intermolecular interactions. This work reports a new approach to EDA for post-SCF wave functions based on closed-shell restricted second-order Møller-Plesset perturbation theory (MP2) together with an efficient implementation that generalizes the successful SCF-level second-generation absolutely localized molecular orbital EDA approach, ALMO-EDA-II, and improves upon MP2 ALMO-EDA-I. The new MP2 ALMO-EDA-II provides distinct energy contributions for a frozen interaction energy containing permanent electrostatics and Pauli repulsions, a polarized energy that yields induced electrostatics, a dispersion-corrected energy, and the fully relaxed energy, which describes charge transfer.
Chemical shifts are a readily obtainable NMR observable that can be measured with high accuracy, and because they are sensitive to conformational averages and the local molecular environment, they yield detailed information about protein structure in solution. To predict chemical shifts of protein structures, we introduced the UCBShift method that uniquely fuses a transfer prediction module, which employs sequence and structure alignments to select reference chemical shifts from an experimental database, with a machine learning model that uses carefully curated and physics-inspired features derived from X-ray crystal structures to predict backbone chemical shifts for proteins. In this work, we extend the UCBShift 1.
Phys Chem Chem Phys
November 2024
Identification of the breaking point for the chemical bond is essential for our understanding of chemical reactivity. The current consensus is that a point of maximal electron delocalization along the bonding axis separates the different bonding regimes of reactants and products. This maximum transition point has been investigated previously through the total position spread and the bond-parallel components of the static polarizability tensor for describing covalent bond breaking.
Identifying transition states, the saddle points on the potential energy surface that connect reactant and product minima, is central to predicting kinetic barriers and understanding chemical reaction mechanisms. In this work, we train a fully differentiable equivariant neural network potential, NewtonNet, on thousands of organic reactions and derive the analytical Hessians. By reducing the computational cost by several orders of magnitude relative to the density functional theory (DFT) ab initio source, we can afford to use the learned Hessians at every step for the saddle point optimizations.
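To illustrate why a Hessian at every step helps a saddle point search, here is a minimal sketch of a mode-following step on a toy two-dimensional surface. It is not NewtonNet or the authors' optimizer: the surface `energy`, the helper `find_saddle`, the step cap `max_step`, and the starting point are all hypothetical choices made for illustration. The step climbs along the lowest Hessian eigenvector and descends along the others.

```python
import numpy as np

# Toy double-well surface (an assumption for illustration only):
# minima at (+1, 0) and (-1, 0), first-order saddle at (0, 0).
def energy(p):
    x, y = p
    return (x**2 - 1.0)**2 + y**2

def gradient(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def hessian(p):
    x, _ = p
    return np.array([[12.0 * x**2 - 4.0, 0.0],
                     [0.0, 2.0]])

def find_saddle(p0, max_steps=50, gtol=1e-10, max_step=0.3):
    """Mode-following search: maximize along the lowest mode, minimize along the rest."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_steps):
        g = gradient(p)
        if np.linalg.norm(g) < gtol:
            break
        evals, evecs = np.linalg.eigh(hessian(p))  # eigenvalues in ascending order
        g_modes = evecs.T @ g                      # gradient in the eigenmode basis
        step_modes = -g_modes / np.abs(evals)      # downhill step along every mode
        step_modes[0] *= -1.0                      # flip the lowest mode: go uphill
        step = evecs @ step_modes
        norm = np.linalg.norm(step)
        if norm > max_step:                        # simple trust-radius cap
            step *= max_step / norm
        p = p + step
    return p

saddle = find_saddle([0.3, 0.2])
print(saddle, energy(saddle))
```

The sketch assumes the starting point already lies in a region where the lowest Hessian mode points along the reaction coordinate; without that, following the lowest mode can walk away from the saddle.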
We introduce a general framework for many-body force fields, the Completely Multipolar Model (CMM), that utilizes multipolar electrical moments modulated by exponential decay of electron density as a common functional form for all terms of an energy decomposition analysis of intermolecular interactions. With this common functional form, the CMM establishes well-formulated damped tensors that reach the correct asymptotes at both long- and short-range while formally ensuring no short-range catastrophes. CMM describes the separable EDA terms of dispersion, exchange polarization, and Pauli repulsion with short-ranged anisotropy, and polarization as intramolecular charge fluctuations and induced dipoles, while charge transfer describes explicit movement of charge between molecules and naturally describes many-body charge transfer by coupling into the polarization equations.
In 1999, Wright and Dyson highlighted the fact that large sections of the proteome of all organisms are composed of protein sequences that lack globular folded structures under physiological conditions. Since then, the biophysics community has made significant strides in unraveling the intricate structural and dynamic characteristics of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). Unlike crystallographic beamlines and their role in streamlining acquisition of structures for folded proteins, an integrated experimental and computational approach aimed at IDPs/IDRs has emerged.
Metal-organic cages form well-defined microenvironments that can enhance the catalytic proficiency of encapsulated transition metal complexes (TMCs). We introduce a screening protocol to efficiently identify TMCs that are promising candidates for encapsulation in the GaL nanocage. We obtain TMCs from the Cambridge Structural Database with geometric and electronic characteristics amenable to encapsulation and mine the text of associated manuscripts to curate TMCs with documented catalytic functionality.
Motivation: Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs).
Results: In this work, we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains.
Water is often the testing ground for new, advanced force fields. While advanced functional forms for intermolecular interactions have been integral to the development of accurate water models, less attention has been paid to a transferable model for intramolecular valence terms. In this work, we present a one-body energy and dipole moment surface model, named 1B-UCB, that is simple yet accurate and can be feasibly adapted for both standard and advanced potentials.
Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novelty in generative tasks for new molecules such as chemical similarity to a target ligand, molecules grown from known protein bound fragments, and creation of molecules that enforce interactions with target residues in the protein active site.
Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). In this work, we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains.
In charged water microdroplets, which occur in nature or in the lab upon ultrasonication or in electrospray processes, the thermodynamics for reactive chemistry can be dramatically altered relative to the bulk phase. Here, we provide a theoretical basis for the observation of accelerated chemistry by simulating water droplets of increasing charge imbalance to create redox agents such as hydroxyl and hydrogen radicals and solvated electrons. We compute the hydration enthalpy of OH⁻ and H⁺ that controls the electron transfer process, and the corresponding changes in vertical ionization energy and vertical electron affinity of the ions, to create OH• and H• reactive species.
We train an equivariant machine learning (ML) model to predict energies and forces for hydrogen combustion under conditions of finite temperature and pressure. This challenging case for reactive chemistry illustrates that ML potential energy surfaces are difficult to make complete, due to overreliance on chemical intuition of what data are important for training. Instead, a 'negative design' data acquisition strategy using metadynamics as part of an active learning workflow helps to create an ML model that avoids unforeseen high-energy or unphysical energy configurations.
J Phys Chem Lett
December 2023
The Raman spectrum of liquid water is quite complex, reflecting its strong sensitivity to the local environment of the individual waters. The OH-stretch region of the spectrum, which captures the influence of hydrogen bonding, has only just begun to be unraveled. Here we develop a model for predicting the Raman spectra of the OH-stretch region by considering how local electric fields distort the energy surface of each water monomer.
Summary: The Local Disordered Region Sampling (LDRS, pronounced loaders) tool is a new module developed for IDPConformerGenerator, a previously validated approach to model intrinsically disordered proteins (IDPs). The IDPConformerGenerator LDRS module provides a method for generating all-atom conformations of intrinsically disordered protein regions at the N- and C-termini of, and in loops or linkers between, folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures.
We leveraged the power of ChatGPT and Bayesian optimization in the development of a multi-AI-driven system, backed by seven large language model-based assistants and equipped with machine learning algorithms, that seamlessly orchestrates a multitude of research aspects in a chemistry laboratory (termed the ChatGPT Research Group). Our approach accelerated the discovery of optimal microwave synthesis conditions, enhancing the crystallinity of MOF-321, MOF-322, and COF-323 and achieving the desired porosity and water capacity. In this system, human researchers gained assistance from these diverse AI collaborators, each with a unique role within the laboratory environment, spanning strategy planning, literature search, coding, robotic operation, labware design, safety inspection, and data analysis.
Accurate potential energy models of proteins must describe the many different types of noncovalent interactions that contribute to a protein's stability and structure. Pi-pi contacts are ubiquitous structural motifs in all proteins, occurring between aromatic and nonaromatic residues, and they play a nontrivial role in protein folding and in the formation of biomolecular condensates. Guided by a geometric criterion for isolating pi-pi contacts from classical molecular dynamics simulations of proteins, we use quantum mechanical energy decomposition analysis to determine the molecular interactions that stabilize different pi-pi contact motifs.
We present an investigation into the transferability of pseudopotentials (PPs) with a nonlinear core correction (NLCC) using the Goedecker, Teter, and Hutter (GTH) protocol across a range of pure GGA, meta-GGA and hybrid functionals, and their impact on thermochemical and non-thermochemical properties. The GTH-NLCC PP for the PBE density functional demonstrates remarkable transferability to the PBE0 and ωB97X-V exchange-correlation functionals, and relative to no NLCC, improves agreement significantly for thermochemical benchmarks compared to all-electron calculations. On the other hand, the B97M-rV meta-GGA functional performs poorly with the PBE-derived GTH-NLCC PP, which is mitigated by reoptimizing the NLCC parameters for this specific functional.
The rates of many chemical reactions are accelerated when carried out in micron-sized droplets, but the molecular origin of the rate acceleration remains unclear. One example is the condensation reaction of 1,2-diaminobenzene with formic acid to yield benzimidazole. The observed rate enhancements have been rationalized by invoking enhanced acidity at the surface of methanol solvent droplets with low water content to enable protonation of formic acid to generate a cationic species (protonated formic acid or PFA) formed by attachment of a proton to the neutral acid.
Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is debated whether new SFs are actually improving, since the general, refined, and core datasets of PDBBind are cross-contaminated with proteins and ligands with high similarity, and hence the SFs may not perform comparably well in binding prediction of new protein-ligand complexes. In this work, we have carefully prepared a cleaned PDBBind dataset of non-covalent binders that is split into training, validation, and test datasets to control for data leakage, defined as proteins and ligands with high sequence and structural similarity.
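The idea of splitting to control for leakage can be sketched as follows: group entries whose similarity exceeds a threshold, then assign whole groups to training, validation, or test so that no similar pair straddles a split. This is only an illustration, not the authors' protocol. The k-mer Jaccard similarity, the threshold of 0.5, the helper names, and the example sequences are all assumptions, and the sketch ignores ligand and structural similarity entirely.

```python
from itertools import combinations

def kmer_set(seq, k=3):
    """Set of overlapping k-mers; a crude, hypothetical stand-in for sequence similarity."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_by_similarity(seqs, threshold=0.5):
    """Single-linkage clusters: any pair at or above the threshold shares a cluster."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    kmers = [kmer_set(s) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(kmers[i], kmers[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def split_clusters(clusters, fractions=(0.6, 0.2, 0.2)):
    """Greedily place whole clusters into the split furthest below its target size."""
    total = sum(len(c) for c in clusters)
    targets = [f * total for f in fractions]
    splits = [[], [], []]
    for cluster in sorted(clusters, key=len, reverse=True):
        k = max(range(3), key=lambda s: targets[s] - len(splits[s]))
        splits[k].extend(cluster)
    return splits

# Made-up sequences for illustration only.
seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA",
        "GSHMLEDPVG", "WWPPQQRRSS", "AAAACCCCDD"]
clusters = cluster_by_similarity(seqs, threshold=0.5)
train_idx, val_idx, test_idx = split_clusters(clusters)
print(train_idx, val_idx, test_idx)
```

Because clusters are never divided, the realized split sizes only approximate the requested fractions; that trade-off is inherent to any cluster-level split.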
We use local diffusion maps to assess the quality of two types of collective variables (CVs) for a recently published hydrogen combustion benchmark dataset that contains ab initio molecular dynamics (MD) trajectories and normal modes along minimum energy paths. This approach was recently advocated for assessing CVs and analyzing reactions modeled by classical MD simulations. We report the effectiveness of this approach for molecular systems modeled by quantum ab initio MD.
The Local Disordered Region Sampling (LDRS, pronounced loaders) tool, developed for the IDPConformerGenerator platform (Teixeira 2022), provides a method for generating all-atom conformations of intrinsically disordered regions (IDRs) at the N- and C-termini of, and in loops or linkers between, folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures. Requiring only a pre-existing PDB structure of the protein with missing coordinates or with predicted confidence scores and its full-length primary sequence, LDRS will automatically generate physically meaningful conformational ensembles of the missing flexible regions to complete the full-length protein.
We present a new software package called M-Chem that is designed from scratch in C++ and parallelized on shared-memory multi-core architectures to facilitate efficient molecular simulations. Currently, M-Chem is a fast molecular dynamics (MD) engine that supports the evaluation of energies and forces from two-body to many-body all-atom potentials, reactive force fields, coarse-grained models, combined quantum mechanics molecular mechanics (QM/MM) models, and external force drivers from machine learning, augmented by algorithms that are focused on gains in computational simulation times. M-Chem also includes a range of standard simulation capabilities including thermostats, barostats, multi-timestepping, and periodic cells, as well as newer methods such as fast extended Lagrangians and high quality electrostatic potential generation.