J Chem Theory Comput
January 2025
While machine learning (ML) models have been able to achieve unprecedented accuracies across various prediction tasks in quantum chemistry, it is now apparent that accuracy on a test set alone is not a guarantee for robust chemical modeling such as stable molecular dynamics (MD). To go beyond accuracy, we use explainable artificial intelligence (XAI) techniques to develop a general analysis framework for atomic interactions and apply it to the SchNet and PaiNN neural network models. We compare these interactions with a set of fundamental chemical principles to understand how well the models have learned the underlying physicochemical concepts from the data.
When physical properties of molecules are being modeled with machine learning, it is desirable to incorporate (3)-covariance. While such models based on low body order features are not complete, we formulate and prove general completeness properties for higher order methods and show that 6 - 5 of these features are enough for up to atoms. We also find that the Clebsch-Gordan operations commonly used in these methods can be replaced by matrix multiplications without sacrificing completeness, lowering the scaling from () to () in the degree of the features.
Understanding the evolution and dissemination of human knowledge over time faces challenges due to the abundance of historical materials and limited specialist resources. However, the digitization of historical archives presents an opportunity for AI-supported analysis. This study advances historical analysis by using an atomization-recomposition method that relies on unsupervised machine learning and explainable AI techniques.
Introduction: Molecular profiling of lung cancer is essential to identify genetic alterations that predict response to targeted therapy. While deep learning shows promise for predicting oncogenic mutations from whole tissue images, existing studies often face challenges such as limited sample sizes, a focus on earlier stage patients, and insufficient analysis of robustness and generalizability.
Methods: This retrospective study evaluates factors influencing mutation prediction accuracy using the large Heidelberg Lung Adenocarcinoma Cohort (HLCC), a cohort of 2356 late-stage FFPE samples.
Recent years have seen vast progress in the development of machine learned force fields (MLFFs) based on ab-initio reference calculations. Despite achieving low test errors, the reliability of MLFFs in molecular dynamics (MD) simulations is facing growing scrutiny due to concerns about instability over extended simulation timescales. Our findings suggest a potential connection between robustness to cumulative inaccuracies and the use of equivariant representations in MLFFs, but the computational cost associated with these representations can limit this advantage in practice.
IEEE Trans Pattern Anal Mach Intell
November 2024
Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g.
The GEMS method enables molecular dynamics simulations of large heterogeneous systems at ab initio quality.
With the advancements in precision medicine, the demands on pathological diagnostics have increased, requiring standardized, quantitative, and integrated assessments of histomorphological and molecular pathological data. Great hopes are placed in artificial intelligence (AI) methods, which have demonstrated the ability to analyze complex clinical, histological, and molecular data for disease classification, biomarker quantification, and prognosis estimation. This paper provides an overview of the latest developments in pathology AI, discusses the limitations, particularly concerning the black box character of AI, and describes solutions to make decision processes more transparent using methods of so-called explainable AI (XAI).
Counterfactuals can explain classification decisions of neural networks in a human interpretable way. We propose a simple but effective method to generate such counterfactuals. More specifically, we perform a suitable diffeomorphic coordinate transformation and then perform gradient ascent in these coordinates to find counterfactuals which are classified with great confidence as a specified target class.
The rapid development of precision medicine in recent years has started to challenge diagnostic pathology with respect to its ability to analyze histological images and increasingly large molecular profiling data in a quantitative, integrative, and standardized way. Artificial intelligence (AI) and, more precisely, deep learning technologies have recently demonstrated the potential to facilitate complex data analysis tasks, including clinical, histological, and molecular data for disease classification; tissue biomarker quantification; and clinical outcome prediction. This review provides a general introduction to AI and describes recent developments with a focus on applications in diagnostic pathology and beyond.
In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge.
Domain shifts in the training data are common in practical applications of machine learning; they occur for instance when the data is coming from different sources. Ideally, a ML model should work well independently of these shifts, for example, by learning a domain-invariant representation. However, common ML losses do not give strong guarantees on how consistently the ML model performs for different domains, in particular, whether the model performs well on a domain at the expense of its performance on another domain.
Essential for understanding far-from-equilibrium processes, nonadiabatic (NA) molecular dynamics (MD) requires expensive calculations of the excitation energies and NA couplings. Machine learning (ML) can simplify computation; however, the NA Hamiltonian requires complex ML models due to its intricate relationship to atomic geometry. Working directly in the time domain, we employ bidirectional long short-term memory networks (Bi-LSTM) to interpolate the Hamiltonian.
J Phys Chem C Nanomater Interfaces
July 2023
A bold vision in nanofabrication is the assembly of functional molecular structures using a scanning probe microscope (SPM). This approach requires continuous monitoring of the molecular configuration during manipulation. Until now, this has been impossible because the SPM tip cannot simultaneously act as an actuator and an imaging probe.
Single-pulse electrical stimulation in the nervous system, often called cortico-cortical evoked potential (CCEP) measurement, is an important technique to understand how brain regions interact with one another. Voltages are measured from implanted electrodes in one brain area while stimulating another with brief current impulses separated by several seconds. Historically, researchers have tried to understand the significance of evoked voltage polyphasic deflections by visual inspection, but no general-purpose tool has emerged to understand their shapes or describe them mathematically.
Kernel machines have sustained continuous progress in the field of quantum chemistry. In particular, they have proven to be successful in the low-data regime of force field reconstruction. This is because many equivariances and invariances due to physical symmetries can be incorporated into the kernel function to compensate for much larger data sets.
In recent years, medical disciplines have moved closer together and rigid borders have been increasingly dissolved. The synergetic advantage of combining multiple disciplines is particularly important for radiology, nuclear medicine, and pathology to perform integrative diagnostics. In this review, we discuss how medical subdisciplines can be reintegrated in the future using state-of-the-art methods of digitization, data science, and machine learning.
Global machine learning force fields, with the capacity to capture collective interactions in molecular systems, now scale up to a few dozen atoms due to considerable growth of model complexity with system size. For larger molecules, locality assumptions are introduced, with the consequence that nonlocal interactions are not described. Here, we develop an exact iterative approach to train global symmetric gradient domain machine learning (sGDML) force fields (FFs) for several hundred atoms, without resorting to any potentially uncontrolled approximations.
The molecular heterogeneity of cancer cells contributes to the often partial response to targeted therapies and relapse of disease due to the escape of resistant cell populations. While single-cell sequencing has started to improve our understanding of this heterogeneity, it offers a mostly descriptive view on cellular types and states. To obtain more functional insights, we propose scGeneRAI, an explainable deep learning approach that uses layer-wise relevance propagation (LRP) to infer gene regulatory networks from static single-cell RNA sequencing data for individual cells.
Neuropathol Appl Neurobiol
February 2023
Aim: Analysis of cerebrospinal fluid (CSF) is essential for diagnostic workup of patients with neurological diseases and includes differential cell typing. The current gold standard is based on microscopic examination by specialised technicians and neuropathologists, which is time-consuming, labour-intensive and subjective.
Methods: We, therefore, developed an image analysis approach based on expert annotations of 123,181 digitised CSF objects from 78 patients corresponding to 15 clinically relevant categories and trained a multiclass convolutional neural network (CNN).
The diagnosis of sinonasal tumors is challenging due to a heterogeneous spectrum of various differential diagnoses as well as poorly defined, disputed entities such as sinonasal undifferentiated carcinomas (SNUCs). In this study, we apply a machine learning algorithm based on DNA methylation patterns to classify sinonasal tumors with clinical-grade reliability. We further show that sinonasal tumors with SNUC morphology are not as undifferentiated as their current terminology suggests but can instead be reassigned to four distinct molecular classes defined by epigenetic, mutational and proteomic profiles.
IEEE Trans Med Imaging
February 2024
We consider the reconstruction of brain activity from electroencephalography (EEG). This inverse problem can be formulated as a linear regression with independent Gaussian scale mixture priors for both the source and noise components. Crucial factors influencing the accuracy of the source estimation are not only the noise level but also its correlation structure, but existing approaches have not addressed the estimation of noise covariance matrices with full structure.
Histological sections of the lymphatic system are usually the basis of static (2D) morphological investigations. Here, we performed a dynamic (4D) analysis of human reactive lymphoid tissue using confocal fluorescent laser microscopy in combination with machine learning. Based on tracks for T-cells (CD3), B-cells (CD20), follicular T-helper cells (PD1) and optical flow of follicular dendritic cells (CD35), we put forward the first quantitative analysis of movement-related and morphological parameters within human lymphoid tissue.
Reconstructing force fields (FFs) from atomistic simulation data is a challenge since accurate data can be highly expensive. Here, machine learning (ML) models can help to be data economic as they can be successfully constrained using the underlying symmetry and conservation laws of physics. However, so far, every descriptor newly proposed for an ML model has required a cumbersome and mathematically tedious remodeling.