Publications by authors named "Doaa Altarawy"

Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development, which involves time-dependent interactions among thousands of transcription factors, signaling molecules, and effector genes, is one of the most challenging arenas for GRN prediction. In this work, we show that successful GRN predictions for a developmental network from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm.


Community efforts in the computational molecular sciences (CMS) are evolving toward modular, open, and interoperable interfaces that work with existing community codes to provide more functionality and composability than could be achieved with a single program. The Quantum Chemistry Common Driver and Databases (QCDB) project provides such capability through an application programming interface (API) that facilitates interoperability across multiple quantum chemistry software packages. In tandem with the Molecular Sciences Software Institute and their Quantum Chemistry Archive ecosystem, the unique functionalities of several CMS programs are integrated, including CFOUR, GAMESS, NWChem, OpenMM, Psi4, Qcore, TeraChem, and Turbomole, to provide common computational functions, i.


The Basis Set Exchange (BSE) has been a prominent fixture in the quantum chemistry community. First publicly available in 2007, it is recognized by both users and basis set creators as the de facto source for information related to basis sets. This popular resource has been rewritten, utilizing modern software design and best practices.


Adaptive quantum mechanics/molecular mechanics (QM/MM) approaches are able to treat systems with dynamic or nonlocalized active centers by allowing for on-the-fly reassignment of the QM region. Although these approaches have been in active development, the inaccessibility of current software has caused slow adoption and limited applications. Janus seeks to remedy the limitations of current software by providing a free and open-source Python library for adaptive methods that is modular and extensible.


We introduce a free and open-source software package (PES-Learn) which largely automates the process of producing high-quality machine learning models of molecular potential energy surfaces (PESs). PES-Learn incorporates a generalized framework for producing grid points across a PES that is compatible with most electronic structure theory software. The newly generated or externally supplied PES data can then be used to train and optimize neural network or Gaussian process models in a completely automated fashion.
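As a rough illustration of what producing grid points across a PES can look like, the sketch below builds a regular grid over a few internal coordinates. It is not PES-Learn's API: the function `make_grid`, the coordinate names, and the ranges are all hypothetical, chosen only to show the idea of a Cartesian product over coordinate axes.

```python
from itertools import product


def make_grid(ranges):
    """Build a regular grid of geometries.

    `ranges` maps a coordinate name to (start, stop, n_points).
    Returns a list of dicts, one per grid point.
    This is an illustrative helper, not part of any published package.
    """
    axes = {}
    for name, (start, stop, n) in ranges.items():
        # Evenly spaced values, endpoints included.
        step = (stop - start) / (n - 1) if n > 1 else 0.0
        axes[name] = [start + i * step for i in range(n)]
    names = list(axes)
    # Cartesian product of all axes gives every combination of coordinates.
    return [dict(zip(names, values)) for values in product(*(axes[k] for k in names))]


# Hypothetical water-like example: two bond lengths (angstrom) and one angle (degrees).
grid = make_grid({
    "r_OH1": (0.8, 1.2, 5),
    "r_OH2": (0.8, 1.2, 5),
    "angle_HOH": (90.0, 120.0, 4),
})
print(len(grid), grid[0])
```

Each grid point would then be passed to an electronic structure program to obtain an energy, and the resulting (geometry, energy) pairs become training data for a neural network or Gaussian process model.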


Comparing fragment-based molecular fingerprints of drug-like molecules is one of the most robust and frequently used approaches in computer-assisted drug discovery. Molprint2D, a popular atom environment (AE) descriptor, yielded the best enrichment of active compounds across a diverse set of targets in a recent large-scale study. We present here BCL::Mol2D descriptors that outperformed Molprint2D on nine PubChem datasets spanning a wide range of protein classes.
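To make the idea of comparing fragment-based fingerprints concrete, here is a minimal sketch that scores two molecules by the Tanimoto (Jaccard) coefficient of their fragment sets. The abstract does not say which similarity measure is used, so Tanimoto is an assumption here, and the fragment strings are invented placeholders rather than real Molprint2D or BCL::Mol2D output.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fragment identifiers."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical atom-environment fragments, written as plain strings for illustration.
mol_a = {"C;1-C;1-O", "C;1-N", "O;1-C"}
mol_b = {"C;1-C;1-O", "C;1-N", "N;1-C;2-C"}

similarity = tanimoto(mol_a, mol_b)
print(similarity)
```

Representing a fingerprint as a set of fragment identifiers keeps the comparison independent of any fixed bit-vector length; a hashed bit-vector representation would work the same way with bitwise operations.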


The field of computational molecular sciences (CMS) has made innumerable contributions to the understanding of the molecular phenomena that underlie and control chemical processes, as reflected in a large number of community software projects and codes. The CMS community is now poised to take the next transformative steps: better training in modern software design and engineering methods and tools; increased interoperability through more systematic adoption of agreed-upon standards and accepted best practices; less unnecessary redundancy in software effort, along with greater reproducibility; and broader deployment of new software onto hardware platforms ranging from in-house clusters to mid-range computing systems through to modern supercomputers. This in turn will shape the software that will be created to address grand challenge science, which we illustrate here: the formulation of diverse catalysts, descriptions of long-range charge and excitation transfer, and development of structural ensembles for intrinsically disordered proteins.


The increasing availability of chromatin immunoprecipitation sequencing (ChIP-Seq) data enables us to learn more about the action of transcription factors in the regulation of gene expression. Even though transcriptional regulation often involves the concerted action of more than one transcription factor, the format of each individual ChIP-Seq dataset usually represents the action of a single transcription factor. Therefore, a relational database in which available ChIP-Seq datasets are curated is essential.


With the abundance of biological data, computational prediction of gene regulatory networks (GRNs) from gene expression data has become more feasible. Although incorporating prior knowledge (PK) along with gene expression data greatly improves prediction accuracy, the overall accuracy is still low. PK in GRN inference can be categorized as either noisy or curated.
