Publications by authors named "Ramon Alain Miranda Quintana"

We are transforming Radial Threshold Clustering (RTC), an O(N²) algorithm, into Extended Quality Clustering, an O(N) algorithm with several novel features. Daura et al.'s RTC algorithm is a partitioning clustering algorithm that groups similar frames together based on their similarity to the seed configuration. Two current issues with RTC are that it scales as O(N²), making it inefficient at high frame counts, and that the clustering results depend on the order of the input frames.
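
The threshold-based grouping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes frames are plain feature vectors compared by Euclidean distance (a real trajectory would typically use RMSD after alignment), and it picks each seed as the frame with the most unassigned neighbors, as the Daura procedure is commonly described. The function name `radial_threshold_clustering` and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def radial_threshold_clustering(frames, cutoff):
    """Greedy threshold clustering on an (n_frames, n_features) array.

    Returns a list of (seed_index, member_indices) tuples.
    Assumption: Euclidean distance stands in for a structural metric.
    """
    frames = np.asarray(frames, dtype=float)
    # Full pairwise distance matrix: this is the quadratic-cost step.
    diff = frames[:, None, :] - frames[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    neighbors = dist <= cutoff  # boolean adjacency, includes self

    unassigned = np.ones(len(frames), dtype=bool)
    clusters = []
    while unassigned.any():
        # Count the still-unassigned neighbors of every frame.
        counts = (neighbors & unassigned).sum(axis=1)
        counts[~unassigned] = -1  # assigned frames cannot become seeds
        # argmax returns the first maximum, so ties follow input order.
        seed = int(np.argmax(counts))
        members = np.flatnonzero(neighbors[seed] & unassigned)
        clusters.append((seed, members.tolist()))
        unassigned[members] = False
    return clusters
```

In this sketch the tie-breaking in `np.argmax` is one place where the order of the input frames can change the result, and building `dist` is what makes both time and memory grow quadratically with the number of frames.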

The average and variance of the molecular similarities in a set are high-value information for cheminformatics tasks like chemical space exploration and subset selection. However, calculating the variance of the complete similarity matrix has quadratic complexity, O(N²). As the sizes of molecular libraries constantly increase, this pairwise approach becomes unfeasible.

Extended similarity indices (i.e., generalization of pairwise similarity) have recently gained importance because of their simplicity, fast computation and superiority in tasks like diversity picking.

The presence of Activity Cliffs (ACs) is known to represent a challenge for QSAR modeling. Because of their high data dependency, Machine Learning QSAR models are directly influenced by the activity landscape. We propose several extended similarity and extended SALI methods to study how the distribution of ACs across the training and test sets affects the models' errors.

PyCI is a free and open-source Python library for setting up and running arbitrary determinant-driven configuration interaction (CI) computations, as well as their generalizations to cases where the coefficients of the determinant are nonlinear functions of optimizable parameters. PyCI also includes functionality for computing the residual correlation energy, along with the ability to compute spin-polarized one- and two-electron (transition) reduced density matrices. PyCI was originally intended to replace the ab initio quantum chemistry functionality in the HORTON library but emerged as a standalone research tool, primarily intended to aid in method development, while maintaining high performance so that it is suitable for practical calculations.

The widespread use of Machine Learning (ML) techniques in chemical applications has come with the pressing need to analyze extremely large molecular libraries. In particular, clustering remains one of the most common tools to dissect the chemical space. Unfortunately, most current approaches present unfavorable time and memory scaling, which makes them unsuitable to handle million- and billion-sized sets.

Molecular dynamics (MD) simulations are ideally suited to describe conformational ensembles of biomolecules such as proteins and nucleic acids. Microsecond-long simulations are now routine, facilitated by the emergence of graphical processing units. Clustering, which groups objects based on structural similarity, is typically used to process ensembles, leading to different states, their populations, and the identification of representative structures.

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex data sets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner.
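
For readers unfamiliar with the seeding step mentioned above, the following is a minimal Python sketch of standard k-means++ initialization, in which each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. It illustrates the baseline only, not the alternative the authors propose; the function name `kmeans_pp_seeds` and its parameters are hypothetical, and the sketch assumes the data contain at least two distinct points.

```python
import numpy as np

def kmeans_pp_seeds(data, n_clusters, rng=None):
    """Return indices of n_clusters seeds chosen by k-means++ sampling.

    Assumption: data is an (n_points, n_features) array with at least
    two distinct points, so the sampling weights never sum to zero.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # The first seed is drawn uniformly at random.
    seeds = [int(rng.integers(n))]
    # Squared distance from every point to its nearest chosen seed.
    d2 = ((data - data[seeds[0]]) ** 2).sum(axis=1)
    for _ in range(1, n_clusters):
        # Empirical distribution: far-away points are more likely picks.
        probs = d2 / d2.sum()
        nxt = int(rng.choice(n, p=probs))
        seeds.append(nxt)
        # Update nearest-seed distances with the newly added seed.
        d2 = np.minimum(d2, ((data - data[nxt]) ** 2).sum(axis=1))
    return seeds
```

Because already-chosen points have zero weight, the seeds are distinct, but they still depend on the random draws, which is why repeated runs can give different partitions.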

The quantification of molecular similarity has been present since the beginning of cheminformatics. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to the calculation of molecular similarities of only two objects at a time. Hence, to obtain the average similarity of a set of molecules, all the pairwise comparisons need to be computed, which demands computational resources that scale quadratically with the number of molecules.
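
The pairwise baseline described above can be made concrete with a short Python sketch. It assumes binary fingerprints and the Tanimoto index purely for illustration (the text does not fix a representation or index), and the function names are hypothetical. The point is the loop over all N(N-1)/2 pairs, which is where the quadratic cost comes from.

```python
from itertools import combinations

import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity of two boolean fingerprint vectors."""
    both = np.logical_and(a, b).sum()
    either = np.logical_or(a, b).sum()
    # Assumption: two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0

def average_pairwise_similarity(fingerprints):
    """Mean Tanimoto similarity over all distinct pairs (needs N >= 2)."""
    fps = np.asarray(fingerprints, dtype=bool)
    # N(N-1)/2 pairs: the number of comparisons grows quadratically.
    pairs = list(combinations(range(len(fps)), 2))
    total = sum(tanimoto(fps[i], fps[j]) for i, j in pairs)
    return total / len(pairs)
```

For three molecules this requires three comparisons, but for a million molecules it requires roughly 5 × 10¹¹, which is the scaling problem the abstract refers to.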

We propose a new perturbation theory framework that can be used to help with the projective solution of the Schrödinger equation for arbitrary wave functions. This Flexible Ansatz for N-body Perturbation Theory (FANPT) is based on our previously proposed Flexible Ansatz for the N-body Configuration Interaction (FANCI). We derive recursive FANPT expressions, including arbitrary orders in the perturbation hierarchy.

Imaging mass spectrometry is a label-free imaging modality that allows for the spatial mapping of many compounds directly in tissues. In an imaging mass spectrometry experiment, a raster of the tissue surface produces a mass spectrum at each sampled x, y position, resulting in thousands of individual mass spectra, each comprising a pixel in the resulting ion images. However, efficient analysis of imaging mass spectrometry datasets can be challenging due to the hyperspectral characteristics of the data.

Electron pairs have an illustrious history in chemistry, from powerful concepts for understanding structural stability and reactive changes to the promise of serving as building blocks of quantitative descriptions of the electronic structure of complex molecules and materials. However, traditionally, two-electron wavefunctions (geminals) have not enjoyed the popularity and widespread use of the more standard single-particle methods. This has changed recently, with a renewed interest in the development of geminal wavefunctions as an alternative for describing strongly correlated phenomena.

One of the key challenges of k-means clustering is the seed selection or the initial centroid estimation since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids using an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulation, k-means++ fails to partition the data in an optimal manner.

We introduce certain concepts and expressions from conceptual density functional theory (DFT) to study the properties of the Hildebrand solubility parameter. The original form of the Hildebrand solubility parameter is used to qualitatively estimate solubilities for various apolar and aprotic substances and solvents and is based on the square root of the cohesive energy density. Our results show that a revised expression allows the replacement of cohesive energy densities by electrophilicity densities, which are numerically accessible by simple DFT calculations.

Visualization of the chemical space is useful in many aspects of chemistry, including compound library design, diversity analysis, and exploring structure-property relationships, to name a few. Examples of notable research areas where the visualization of chemical space has strong applications are drug discovery and natural product research. However, the sheer volume of even comparatively small sub-sections of chemical space implies that we need to use approximations when navigating through it.

Understanding structure-activity landscapes is essential in drug discovery. It has been shown that the presence of activity cliffs in compound data sets can not only have a substantial impact on design progress but also influence the predictive ability of machine learning models. With the continued expansion of the chemical space and the currently available large and ultra-large libraries, it is imperative to implement efficient tools to analyze the activity landscape of compound data sets rapidly.

The hard/soft acid/base (HSAB) principle is a cornerstone in our understanding of chemical reactivity preferences. Motivated by the success of the original ("global") version of this rule, a "local" counterpart was readily proposed to account for regioselectivity preferences, in particular, in ambident reactions. However, ample experimental evidence indicates that the local HSAB principle often fails to provide meaningful predictions.

We present a first-principles approach for the calculation of solvation energies and enthalpies with respect to different ion pair combinations in various solvents. The method relies on the conceptual density functional theory (DFT) of solvation, from which detailed expressions for the solvation energies can be derived. In addition to fast and straightforward gas phase calculations, we also study the influence of modified chemical reactivity descriptors in terms of electronic perturbations.

We report the main conclusions of the first Chemoinformatics and Artificial Intelligence Colloquium, Mexico City, June 15-17, 2022. Fifteen lectures were presented during a virtual public event with speakers from industry, academia, and not-for-profit organizations. Twelve hundred and ninety students and academics from more than 60 countries registered for the meeting.

Fanpy is a free and open-source Python library for developing and testing multideterminant wavefunctions and related ab initio methods in electronic structure theory. The main use of Fanpy is to quickly prototype new methods by making it easier to convert the mathematical formulation of a new wavefunction ansatz to a working implementation. Fanpy is designed based on our recently introduced Flexible Ansatz for N-electron Configuration Interaction (FANCI) framework, where multideterminant wavefunctions are represented by their overlaps with Slater determinants of orthonormal spin-orbitals.

We present explainable machine learning approaches for the accurate prediction and understanding of solvation free energies, enthalpies, and entropies for different salts in various protic and aprotic solvents. As key input features, we use fundamental contributions from the conceptual density functional theory (DFT) of solutions. The models with the highest prediction accuracy for the experimental validation data set are decision tree-based approaches such as extreme gradient boosting and extra trees, which highlight the non-linear influence of feature values on target predictions.

We show that the "|Δμ| big is good" principle holds at temperatures above absolute zero (the so-called "finite-T regime"). We also provide the first conditions hinting at the validity of this reactivity rule in cases where the chemical reactions involved have different signs in their chemical potential variations.

We demonstrate the utility of basic chemical principles like the "|Δμ| big is good" (DMB) rule for the study of solvation interactions between distinct solutes such as ions and solvents. The corresponding approach allows us to define relevant criteria for maximum solvation energies of ion pairs in different solvents in terms of electronegativities and chemical hardnesses. Our findings reveal that the DMB principle culminates in the strong and weak acids and bases concept as recently derived for specific ion effects in various solvents.

We present a new classification scheme for amino acids and nucleobases based on the electronic properties of the individual molecules. Using chemical reactivity indices such as electronegativity, electrophilicity, and chemical hardness, we can identify similarities and differences between each class of amino acids and nucleobases. Notable differences emerge in particular with regard to high, neutral or low electronegativity as well as different combinations of chemical hardness.
