The solubility of chemical substances in water is a critical parameter in pharmaceutical development, environmental chemistry, agrochemistry, and other fields, yet predicting it accurately remains a challenge. This study evaluates and compares the effectiveness of some of the most popular machine learning methods and molecular featurization techniques for predicting aqueous solubility. Although these methods were not applied in a competitive setting, some of them outperformed previous benchmarks, offering incremental but significant improvements.
Bayesian networks represent a useful tool to explore interactions within biological systems. The aims of this study were to identify a reduced number of genes associated with a stress condition in chickens (Gallus gallus) and to unravel their interactions by implementing a Bayesian network approach. Initially, one publicly available dataset (3 control vs.
Heavy-isotope substitution into enzymes slows down bond vibrations and may alter transition-state barrier crossing probability if this is coupled to fast protein motions. ATP phosphoribosyltransferase from Acinetobacter baumannii is a multi-protein complex where the regulatory protein HisZ allosterically enhances catalysis by the catalytic protein HisG. This is accompanied by a shift in rate-limiting step from chemistry to product release.
The first step of histidine biosynthesis in , the condensation of ATP and 5-phospho-α-D-ribosyl-1-pyrophosphate to produce -(5-phospho-β-D-ribosyl)-ATP (PRATP) and pyrophosphate, is catalyzed by the hetero-octameric enzyme ATP phosphoribosyltransferase, a promising target for antibiotic design. The catalytic subunit, HisG, is allosterically activated upon binding of the regulatory subunit, HisZ, to form the hetero-octameric holoenzyme (ATPPRT), leading to a large increase in . Here, we present the crystal structure of ATPPRT, along with kinetic investigations of the rate-limiting steps governing catalysis in the nonactivated (HisG) and activated (ATPPRT) forms of the enzyme.
Is-PETase has become an enzyme of significant interest due to its ability to catalyse the degradation of polyethylene terephthalate (PET) at mesophilic temperatures. We performed hybrid quantum mechanics/molecular mechanics (QM/MM) calculations at the DSD-PBEP86-D3/ma-def2-TZVP/CHARMM27//rev-PBE-D3/def2-SVP/CHARMM level to compute the energy profile for the degradation of a suitable PET model by this enzyme. Very low overall barriers are computed for serine protease-type hydrolysis steps (as low as 34.
In this paper we examine the emergent structures of random networks that have undergone bond percolation an arbitrary, but finite, number of times. We define two types of sequential branching processes: a competitive branching process, in which each iteration performs bond percolation on the residual graph (RG) resulting from previous generations, and a collaborative branching process, where percolation is performed on the giant connected component (GCC) instead. We investigate the behavior of these models, including the expected size of the GCC for a given generation, the critical percolation probability, and other topological properties of the resulting graph structures using the analytically exact method of generating functions.
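To make the two sequential processes concrete, here is a small Monte Carlo sketch rather than the exact generating-function solution described above. It assumes an Erdős–Rényi substrate, the same occupation probability in every generation, and that the residual graph is the subgraph induced by nodes outside all earlier GCCs; `sequential_percolation` and its parameters are illustrative names, not the authors' code.

```python
import random
from collections import Counter


def erdos_renyi_edges(n, mean_degree, rng):
    """Edge list of a G(n, p) random graph with p = mean_degree / (n - 1)."""
    p = mean_degree / (n - 1)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def giant_component(nodes, edges, occupation_prob, rng):
    """Bond-percolate the subgraph induced by `nodes`; return its largest cluster."""
    parent = {v: v for v in nodes}
    for u, v in edges:
        # Only edges with both endpoints in the current substrate can be occupied.
        if u in parent and v in parent and rng.random() < occupation_prob:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
    clusters = Counter(find(parent, v) for v in nodes)
    if not clusters:
        return set()
    root, _ = clusters.most_common(1)[0]
    return {v for v in nodes if find(parent, v) == root}


def sequential_percolation(n, mean_degree, occupation_prob, generations,
                           mode="competitive", seed=0):
    """GCC size (fraction of n) found in each generation of sequential percolation."""
    if mode not in ("competitive", "collaborative"):
        raise ValueError("mode must be 'competitive' or 'collaborative'")
    rng = random.Random(seed)
    edges = erdos_renyi_edges(n, mean_degree, rng)
    substrate = set(range(n))
    sizes = []
    for _ in range(generations):
        gcc = giant_component(substrate, edges, occupation_prob, rng)
        sizes.append(len(gcc) / n)
        # competitive: the next generation percolates the residual graph;
        # collaborative: the next generation percolates this generation's GCC.
        substrate = substrate - gcc if mode == "competitive" else gcc
    return sizes
```

For example, `sequential_percolation(2000, 4.0, 0.8, 3)` returns three fractions; in the competitive mode the first GCC is large and later ones are small because the residual graph is sparse, while in the collaborative mode the sizes can only shrink from one generation to the next.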
Background: Relationships among genetic or epigenetic features can be explored by learning probabilistic networks and unravelling the dependencies among a set of given genetic/epigenetic features. Bayesian networks (BNs) consist of nodes that represent the variables and arcs that represent the probabilistic relationships between the variables. However, practical guidance on how to make choices among the wide array of possibilities in Bayesian network analysis is limited.
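As a minimal illustration of how a BN encodes dependencies, the sketch below factorizes a joint distribution as the product of each node's probability given its parents, then answers a conditional query by brute-force enumeration. The three binary nodes (`Stress`, `GeneA`, `GeneB`) and all probabilities are hypothetical, and structure learning, which is where the practical choices discussed above arise, is not shown.

```python
from itertools import product

# Hypothetical binary DAG: Stress -> GeneA, Stress -> GeneB, GeneA -> GeneB.
PARENTS = {"Stress": (), "GeneA": ("Stress",), "GeneB": ("Stress", "GeneA")}
# Each conditional probability table maps parent values to P(node = 1 | parents).
CPT = {
    "Stress": {(): 0.3},
    "GeneA": {(0,): 0.2, (1,): 0.7},
    "GeneB": {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9},
}
ORDER = ["Stress", "GeneA", "GeneB"]  # a topological order of the DAG


def joint(assignment):
    """P(assignment) as the product over nodes of P(node | its parents)."""
    p = 1.0
    for node in ORDER:
        parent_vals = tuple(assignment[q] for q in PARENTS[node])
        p1 = CPT[node][parent_vals]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p


def query(target, evidence):
    """P(target = 1 | evidence) by summing the joint over consistent assignments."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(ORDER)):
        a = dict(zip(ORDER, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(a)
        den += p
        if a[target] == 1:
            num += p
    return num / den


print(query("Stress", {"GeneB": 1}))  # ≈ 0.676, up from the prior of 0.3
```

Enumeration is exponential in the number of nodes, so it only suits toy networks; real BN software uses structure-aware inference.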
Correlations among the degrees of vertices in random graphs often occur when clustering is present. In this paper we define a joint-degree correlation function for vertices in the giant component of clustered configuration model networks which are composed of clique subgraphs. We use this model to investigate, in detail, the organization among nearest-neighbor subgraphs for random graphs as a function of subgraph topology as well as clustering.
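For a quick, global measure of the degree correlations mentioned above, the sketch below computes Newman's degree assortativity coefficient (the Pearson correlation of degrees at either end of an edge) from an undirected edge list. This standard scalar summary is assumed here for illustration; it is not the clique-aware joint-degree correlation function that the paper defines.

```python
from collections import Counter


def degree_assortativity(edges):
    """Newman's degree assortativity r for an undirected simple graph given as edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    m = len(edges)
    # Edge averages of j*k, (j + k)/2 and (j^2 + k^2)/2 over end-point degrees j, k.
    s_prod = sum(deg[u] * deg[v] for u, v in edges) / m
    s_mean = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / m
    s_sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / m
    return (s_prod - s_mean ** 2) / (s_sq - s_mean ** 2)


print(degree_assortativity([(0, 1), (0, 2), (0, 3)]))  # star: -1.0
```

The coefficient is undefined (zero denominator) when every edge joins vertices of equal degree, for example in a graph made only of disjoint cliques of one size.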
Differences in gene expression patterns have been used to measure the effects of stress versus non-stress conditions in poultry species. However, the resulting gene lists can be extensive and may span several biological systems. Therefore, the aim of this study was to identify a small set of genes closely associated with stress in a poultry animal model, the chicken (Gallus gallus), by reusing and combining previously published data with bioinformatic analysis and Bayesian networks in a multi-step approach.
ATP phosphoribosyltransferase (ATPPRT) catalyzes the first step of histidine biosynthesis in bacteria, namely, the condensation of ATP and 5-phospho-α-D-ribosyl-1-pyrophosphate (PRPP) to generate -(5-phospho-β-D-ribosyl)-ATP (PRATP) and pyrophosphate. Catalytic (HisG) and regulatory (HisZ) subunits assemble in a hetero-octamer where HisZ activates HisG and mediates allosteric inhibition by histidine. In , HisG is necessary for the bacterium to persist in the lung during pneumonia.
In this paper we introduce a description of the equilibrium state of a bond percolation process on random graphs using the exact method of generating functions. This allows us to find the expected size of the giant connected component (GCC) of two sequential bond percolation processes in which the bond occupancy probability of the second process is modulated (increased or decreased) by a node being inside or outside of the GCC created by the first process. In the context of epidemic spreading this amounts to both an antagonistic partial immunity and a synergistic partial coinfection interaction between the two sequential diseases.
We present exact solutions for the size of the giant connected component of complex networks composed of cliques following bond percolation. We use our theoretical result to find the location of the percolation threshold of the model, providing analytical solutions where possible. We expect the results derived here to be useful in a wide variety of applications, including graph theory, epidemiology, percolation, and lattice gas models, as well as fragmentation theory.
Networks provide a mathematically rich framework to represent social contacts sufficient for the transmission of disease. Social networks are often highly clustered and fail to be locally treelike. In this paper, we study the effects of clustering on the spread of sequential strains of a pathogen using the generating function formulation under a complete cross-immunity coupling, deriving conditions for the threshold of coexistence of the second strain.
Coinfection is the process by which a host infected with one pathogen becomes infected by a second pathogen at a later point in time. An immunosuppressive host response to a primary disease can facilitate the spread of a subsequent emergent pathogen through the population. Social contact patterns within the underlying population can be modeled using complex networks, and it has been shown that contact patterns strongly influence the emergent disease dynamics.
We demonstrate that physics-based calculations of intrinsic aqueous solubility can rival cheminformatics-based machine learning predictions. A proof-of-concept was developed for a physics-based approach via a sublimation thermodynamic cycle, building upon previous work that relied upon several thermodynamic approximations, notably the 2RT approximation, and limited conformational sampling. Here, we improve our sublimation free-energy model by using crystal phonon mode calculations to capture the contributions of the vibrational modes of the crystal.
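A sublimation cycle routes dissolution through the gas phase (crystal → gas → aqueous solution), so the solution free energy is the sum of the sublimation and hydration free energies. The minimal sketch below assumes both free energies are in kJ/mol and refer to a 1 mol/L standard state, an ideal dilute solution, and the pure crystal as the solid reference, in which case S0 = c° exp(−ΔG_sol / RT) with c° = 1 mol/L. The function name is illustrative, not taken from the paper.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def log_intrinsic_solubility(dg_sub, dg_hyd, temperature=298.15):
    """log10 of intrinsic solubility (mol/L) from a sublimation thermodynamic cycle.

    Assumes dg_sub (crystal -> gas) and dg_hyd (gas -> water) are in kJ/mol and
    share a 1 mol/L standard state, so dG_sol = dg_sub + dg_hyd and
    log10(S0) = -dG_sol / (RT ln 10).
    """
    dg_sol = dg_sub + dg_hyd
    return -dg_sol / (math.log(10) * R * temperature)
```

At 298.15 K, RT ln 10 ≈ 5.71 kJ/mol, so a 1 kcal/mol (4.184 kJ/mol) error in either free energy shifts the predicted log S0 by about 0.73 units, which is why refining the sublimation term matters.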
The structure of many real networks is not locally treelike and, hence, network analysis fails to characterize their bond percolation properties. In a recent paper [P. Mann, V.
Percolation theory can be used to describe the structural properties of complex networks using the generating function formulation. This mapping assumes that the network is locally treelike and does not contain short-range loops between neighbors. In this paper we use the generating function formulation to examine clustered networks that contain simple cycles and cliques of any order.
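Under the locally treelike assumption that this work relaxes, the generating-function calculation reduces to a one-variable fixed point. With G0 and G1 the generating functions of the degree and excess-degree distributions and T the bond occupation probability, u = 1 − T + T·G1(u), the GCC fraction is S = 1 − G0(u), and a GCC exists above T_c = ⟨k⟩/(⟨k²⟩ − ⟨k⟩). The sketch below implements only this standard treelike baseline (no cycles or cliques); the function names and the truncated Poisson example are assumptions for illustration.

```python
import math


def gcc_fraction(pk, T, tol=1e-12, max_iter=100_000):
    """Expected GCC fraction after bond percolation with occupation probability T
    on a locally treelike configuration-model network with pk = {k: p_k}."""
    mean_k = sum(k * p for k, p in pk.items())

    def G0(x):  # generating function of the degree distribution
        return sum(p * x ** k for k, p in pk.items())

    def G1(x):  # generating function of the excess degree distribution
        return sum(k * p * x ** (k - 1) for k, p in pk.items() if k > 0) / mean_k

    # u = probability that an edge does not lead to the GCC; iterating from 0
    # converges monotonically to the smallest root of u = 1 - T + T * G1(u).
    u = 0.0
    for _ in range(max_iter):
        u_new = 1.0 - T + T * G1(u)
        converged = abs(u_new - u) < tol
        u = u_new
        if converged:
            break
    return 1.0 - G0(u)


def critical_occupation(pk):
    """Bond percolation threshold T_c = <k> / (<k^2> - <k>)."""
    mean_k = sum(k * p for k, p in pk.items())
    mean_k2 = sum(k * k * p for k, p in pk.items())
    return mean_k / (mean_k2 - mean_k)


def poisson_pk(c, k_max=60):
    """Poisson degree distribution with mean c, truncated at k_max."""
    return {k: math.exp(-c) * c ** k / math.factorial(k) for k in range(k_max + 1)}
```

For a Poisson degree distribution with mean c, the result satisfies S = 1 − exp(−cTS) and T_c = 1/c, which gives a quick sanity check: `gcc_fraction(poisson_pk(4.0), 0.5)` is close to 0.797, and `critical_occupation(poisson_pk(4.0))` is 0.25.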
We describe three machine learning models submitted to the 2019 Solubility Challenge. All are founded on tree-like classifiers, with one model being based on Random Forest and another on the related Extra Trees algorithm. The third model is a consensus predictor combining the former two with a Bagging classifier.
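If the member models output numeric logS values, an equal-weight average is one simple way to form a consensus. The sketch below shows that together with two error measures commonly reported for solubility prediction: RMSE and the fraction of compounds predicted within 0.5 log units. The averaging scheme, the 0.5 threshold, and the function names are assumptions; the submitted consensus model may combine its members differently.

```python
import math
from statistics import mean


def consensus(predictions_by_model):
    """Equal-weight average of per-compound predictions from several models."""
    return [mean(vals) for vals in zip(*predictions_by_model)]


def rmse(pred, obs):
    """Root-mean-square error between predicted and measured logS."""
    return math.sqrt(mean((p - o) ** 2 for p, o in zip(pred, obs)))


def fraction_within(pred, obs, tol=0.5):
    """Share of compounds predicted within `tol` log units of the measured logS."""
    return sum(abs(p - o) <= tol for p, o in zip(pred, obs)) / len(obs)
```

Each inner list passed to `consensus` holds one model's predictions in the same compound order, for example `consensus([rf_preds, et_preds, bagging_preds])` with hypothetical prediction lists.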
In this study, we design and carry out a survey, asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset.
Background: Computer-Aided Drug Design has strongly accelerated the development of novel antineoplastic agents by helping in the hit identification, optimization, and evaluation.
Results: Computational approaches such as cheminformatic search, virtual screening, pharmacophore modeling, and molecular docking and dynamics have been developed and applied to explain the activity of bioactive molecules, design novel agents, increase the success rate of drug research, and decrease the total costs of drug discovery. Similarity searches and virtual screening are used to identify molecules with an increased probability of interacting with drug targets of interest, while the other computational approaches are applied to design and evaluate molecules with enhanced activity and improved safety profiles.
We compare a range of computational methods for the prediction of sublimation thermodynamics (enthalpy, entropy, and free energy of sublimation). These include a model from theoretical chemistry that uses crystal lattice energy minimization (with the DMACRYS program) and quantitative structure property relationship (QSPR) models generated by both machine learning (random forest and support vector machines) and regression (partial least squares) methods. Using these methods, we investigate the predictability of the enthalpy, entropy, and free energy of sublimation, and consider whether such methods could improve solubility prediction schemes.
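Two relations link these quantities: ΔG_sub = ΔH_sub − TΔS_sub, and, for a lattice energy E_latt from a minimization such as DMACRYS, the commonly used approximation ΔH_sub ≈ −E_latt − 2RT. The sketch below encodes both. The unit conventions (kJ/mol for energies, J mol⁻¹ K⁻¹ for entropy) and the function names are assumptions, and the 2RT term replaces the detailed vibrational contributions that more complete models compute.

```python
R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def sublimation_enthalpy_from_lattice_energy(lattice_energy, temperature=298.15):
    """Estimate dH_sub (kJ/mol) from a negative lattice energy via dH_sub ~ -E_latt - 2RT."""
    return -lattice_energy - 2.0 * R * temperature


def sublimation_free_energy(dh_sub, ds_sub, temperature=298.15):
    """dG_sub = dH_sub - T * dS_sub, with dH in kJ/mol and dS in J mol^-1 K^-1."""
    return dh_sub - temperature * ds_sub / 1000.0
```

For a hypothetical lattice energy of −100 kJ/mol at 298.15 K, the estimated sublimation enthalpy is about 95.04 kJ/mol, since 2RT ≈ 4.96 kJ/mol at that temperature.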
We created a computational method to identify allosteric sites using a machine learning method trained and tested on protein structures containing bound ligand molecules. The Random Forest machine learning approach was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model allows us to assign protein cavities as allosteric, regular or orthosteric, and hence to identify allosteric sites.