We examined pretraining tasks leveraging abundant labeled data to effectively enhance molecular representation learning in downstream tasks, specifically emphasizing graph transformers to improve the prediction of ADMET properties. Our investigation revealed limitations in previous pretraining tasks and identified more meaningful training targets, ranging from 2D molecular descriptors to extensive quantum chemistry simulations. These data were seamlessly integrated into supervised pretraining tasks.
To effectively delineate the spatial distribution of oil contaminant plumes, geophysical methods indirectly measure the physical properties of the subsurface and can provide spatial information and images on a large scale, as opposed to traditional direct methods such as borehole drilling, sampling, and chemical analysis, which are time-consuming and costly. However, interpreting geophysical responses over non-aqueous phase liquid (NAPL)-contaminated sites is not straightforward due to inconsistent responses from biodegraded oil contaminants. In this study, we performed multi-geophysical surveys including seismic refraction, ground-penetrating radar, electrical resistivity tomography (ERT), and induced polarization (IP) surveys, to locate NAPL-contaminated zones in a clay-rich site.
Motivation: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures.
Results: Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields.
Conformational space annealing (CSA), a global optimization method, has been applied to various protein structure modeling tasks. In this paper, we applied CSA to the cryo-EM structure modeling task by combining the Python subroutine of CSA (PyCSA) and the fast relax (FastRelax) protocol of PyRosetta. CSA was used to refine initial structures generated by two methods: rigid fitting of predicted structures to the cryo-EM map and de novo protein modeling by tracing the cryo-EM map.
Curr Opin Struct Biol
April 2023
Drug discovery aims to select proper targets and drug candidates to address unmet clinical needs. The end-to-end drug discovery process includes all stages of drug discovery from target identification to drug candidate selection. Recently, several artificial intelligence and machine learning (AI/ML)-based drug discovery companies have attempted to build data-driven platforms spanning the end-to-end drug discovery process.
The computational atomistic description of the folding reactions of the B1 domains, GB1 and LB1, of protein G and protein L, respectively, is an important challenge in current protein folding studies. Although the two proteins have overall very similar backbone structures (β-hairpin-α-helix-β-hairpin), their apparent folding behaviors observed experimentally were remarkably different. LB1 folds in a two-state manner with the single-exponential kinetics, whereas GB1 folds in a more complex manner with an early stage intermediate that may exist on the folding pathway.
The general theory of the construction of scale-consistent energy terms in the coarse-grained force fields presented in Paper I of this series has been applied to the revision of the UNRES force field for physics-based simulations of proteins. The potentials of mean force corresponding to backbone-local and backbone-correlation energy terms were calculated from the ab initio energy surfaces of terminally blocked glycine, alanine, and proline, and the respective analytical expressions, derived by using the scale-consistent formalism, were fitted to them. The parameters of all these potentials depend on single-residue types, thus reducing their number and preventing over-fitting.
Protein structure alignment is an important tool for studying evolutionary biology and protein modeling. A tool which intensively searches for the globally optimal non-sequential alignments is rarely found. We propose ALIGN-CSA which shows improvement in scores, such as DALI-score, SP-score, SO-score and TM-score over the benchmark set including 286 cases.
Characterizing glycans and glycoconjugates in the context of three-dimensional structures is important in understanding their biological roles and developing efficient therapeutic agents. Computational modeling and molecular simulation have become an essential tool complementary to experimental methods. Here, we present a computational tool, Glycan Modeler for in silico N-/O-glycosylation of the target protein and generation of carbohydrate-only systems.
Aucubin is a small compound naturally found in traditional medicinal herbs with primarily anti-inflammatory and protective effects. In the nervous system, aucubin is reported to be neuroprotective by enhancing neuronal survival and inhibiting apoptotic cell death in cultures and disease models. Our previous data, however, suggest that aucubin facilitates neurite elongation in cultured hippocampal neurons and axonal regrowth in regenerating sciatic nerves.
In CASP12, two types of data-assisted protein structure modeling were tested. Either SAXS experimental data or cross-linking experimental data were provided for a selected number of CASP12 targets, which CASP12 predictors could use for better protein structure modeling. We devised two separate energy terms, one for SAXS data and one for cross-linking data, to drive the model structures toward more native-like structures that satisfied the given experimental data as much as possible.
For protein structure modeling in the CASP12 experiment, we have developed a new protocol based on our previous CASP11 approach. The global optimization method of conformational space annealing (CSA) was applied to 3 stages of modeling: multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain re-modeling. For better template selection and model selection, we updated our model quality assessment (QA) method with the newly developed SVMQA (support vector machine for quality assessment).
Improving the quality of a given protein structure can serve as the ultimate solution for accurate protein structure prediction, and seeking such a method is currently a challenge in computational structural biology. To promote and encourage such much-needed efforts, CASP (Critical Assessment of Structure Prediction) has been providing an ideal computational experimental platform, where it was reported only recently (since CASP10) that systematic protein structure refinement is possible by carrying out extensive (approximately millisecond) MD simulations with proper restraints generated from the given structure. Using an explicit solvent model and much reduced positional and distance restraints compared with those previously used, we propose a refinement protocol that combines a series of short (5 ns) MD simulations with energy minimization procedures.
Global searching for reaction pathways is a long-standing challenge in computational chemistry and biology. Most existing approaches perform only local searches due to computational complexity. Here we present a computational approach, Action-CSA, to find multiple diverse reaction pathways connecting fixed initial and final states through global optimization of the Onsager-Machlup action using the conformational space annealing (CSA) method.
We have applied the conformational space annealing method to the contact-assisted protein structure modeling in CASP11. For Tp targets, where predicted residue-residue contact information was provided, the contact energy term in the form of the Lorentzian function was implemented together with the physical energy terms used in our template-free modeling of proteins. Although we observed some structural improvement of Tp models over the models predicted without the Tp information, the improvement was not substantial on average.
For the template-free modeling of human targets of CASP11, we utilized two of our modeling protocols, LEE and LEER. The LEE protocol took CASP11-released server models as the input and used some of them as templates for 3D (three-dimensional) modeling. The template selection procedure was based on the clustering of the server models aided by a community detection method of a server-model network.
We have carried out numerical experiments to investigate the applicability of the global optimization method of conformational space annealing (CSA) to the enhanced NMR protein structure determination over existing PDB structures. The NMR protein structure determination is driven by the optimization of collective multiple restraints arising from experimental data and the basic stereochemical properties of a protein-like molecule. By rigorous and straightforward application of CSA to the identical NMR experimental data used to generate existing PDB structures, we redetermined 56 recent PDB protein structures starting from fully randomized structures.
For the template-based modeling (TBM) of CASP11 targets, we have developed three new protein modeling protocols (nns for server prediction and LEE and LEER for human prediction) by improving upon our previous CASP protocols (CASP7 through CASP10). We applied the powerful global optimization method of conformational space annealing to three stages of optimization, including multiple sequence-structure alignment, three-dimensional (3D) chain building, and side-chain remodeling. For more successful fold recognition, a new alignment method called CRFalign was developed.
Background: In template-based modeling when using a single template, inter-atomic distances of an unknown protein structure are assumed to be distributed by Gaussian probability density functions, whose center peaks are located at the distances between corresponding atoms in the template structure. The width of the Gaussian distribution, the variability of a spatial restraint, is closely related to the reliability of the restraint information extracted from a template, and it should be accurately estimated for successful template-based protein structure modeling.
Results: To predict the variability of the spatial restraints in template-based modeling, we have devised a prediction model, Sigma-RF, by using the random forest (RF) algorithm.
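As a rough illustration of the Gaussian restraint described in the Background, the sketch below evaluates the probability density and the corresponding penalty for a model distance, given a template distance and a width sigma. The function names and example numbers are hypothetical and are not taken from Sigma-RF; the random forest step that predicts sigma is not shown.

```python
import math


def gaussian_restraint_pdf(d, d_template, sigma):
    """Gaussian density for a model distance d, centered on the template distance."""
    z = (d - d_template) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def restraint_penalty(d, d_template, sigma):
    """Negative log-density: a term quadratic in the deviation, scaled by
    1/sigma^2, plus a normalization term that depends only on sigma."""
    return -math.log(gaussian_restraint_pdf(d, d_template, sigma))


if __name__ == "__main__":
    # Same 1.0 deviation from a hypothetical 5.0 template distance,
    # evaluated with a narrow and a wide restraint.
    for sigma in (0.5, 2.0):
        print(sigma, restraint_penalty(6.0, 5.0, sigma))
```

A narrow sigma makes the quadratic term grow quickly as the model distance moves away from the template distance, which is why a poorly estimated width can over- or under-constrain the model.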
Using the dielectrically consistent reference interaction site model (DRISM) of molecular solvation, we have calculated structural and thermodynamic information of alkali-halide salts in aqueous solution, as a function of salt concentration. The impact of varying the closure relation used with DRISM is investigated using the partial series expansion of order-n (PSE-n) family of closures, which includes the commonly used hypernetted-chain equation (HNC) and Kovalenko-Hirata closures. Results are compared to explicit molecular dynamics (MD) simulations, using the same force fields, and to experiment.
Biochem Biophys Res Commun
February 2013
Mesenchymal stem cells (MSCs) are effective vectors in delivering a gene of interest into degenerating brain. In ex vivo gene therapy, viability of transplanted MSCs is correlated with the extent of functional recovery. It has been reported that BDNF facilitates survival of MSCs but dividing MSCs do not express the BDNF receptor, TrkB.
Molecular dynamics-based free energy calculations allow the determination of a variety of thermodynamic quantities from computer simulations of small molecules. Thermodynamic integration (TI) calculations can suffer from instabilities during the creation or annihilation of particles. This "singularity" problem can be addressed with "soft-core" potential functions which keep pairwise interaction energies finite for all configurations and provide smooth free energy curves.
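To make the "singularity" problem concrete, the sketch below compares a plain Lennard-Jones energy with one commonly used soft-core variant. The abstract does not say which functional form the authors used, so the expression, the function names, and the parameter values (`alpha`, `epsilon`, `sigma`) are assumptions chosen only for illustration.

```python
import math


def lj(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones energy; diverges as r approaches 0."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def softcore_lj(r, lam, epsilon=1.0, sigma=1.0, alpha=0.5):
    """One common soft-core form (assumed here, not taken from the paper).

    lam = 1 is the fully interacting particle, lam = 0 the annihilated one.
    The alpha * (1 - lam)**2 term keeps the denominator nonzero at r = 0
    whenever lam < 1.
    """
    s = alpha * (1.0 - lam) ** 2 + (r / sigma) ** 6
    return 4.0 * epsilon * lam * (1.0 / (s * s) - 1.0 / s)


if __name__ == "__main__":
    # Full overlap (r = 0) at an intermediate coupling stays finite.
    print(softcore_lj(0.0, 0.5))
    # At lam = 1 the soft-core form reduces to plain Lennard-Jones.
    print(softcore_lj(1.2, 1.0), lj(1.2))
```

In this form the energy at intermediate coupling values is bounded even when two particles overlap completely, which is the property the abstract attributes to soft-core potentials.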
Neuregulin 1 (NRG1) and epidermal growth factor receptor (ErbB) signaling pathways control Schwann cells during axonal regeneration in an injured peripheral nervous system. We investigated whether a persistent supply of recombinant NRG1 to the injury site could improve axonal growth and recovery of sensory and motor functions in rats during nerve regeneration. We generated a recombinant adenovirus expressing a secreted form of EGF-like domain from Heregulinβ (sHRGβE-Ad).
Coralyne is an alkaloid drug that binds homo-adenine DNA (and RNA) oligonucleotides more tightly than it does Watson-Crick DNA. Hud's laboratory has shown that poly(dA) in the presence of coralyne forms an anti-parallel duplex; however, attempts to determine the structure by NMR spectroscopy and X-ray crystallography have been unsuccessful. Assuming adenine-adenine hydrogen bonding between the two poly(dA) strands, we constructed 40 hypothetical homo-(dA) anti-parallel duplexes and docked coralyne into the six most favorable duplex structures.
The dynamic and energetic properties of the alkali and halide ions were calculated using molecular dynamics (MD) and free energy simulations with various different water and ion force fields including our recently developed water-model-specific ion parameters. The properties calculated were activity coefficients, diffusion coefficients, residence times of atomic pairs, association constants, and solubility. Through calculation of these properties, we can assess the validity and range of applicability of the simple pair potential models and better understand their limitations.