Treating Semiempirical Hamiltonians as Flexible Machine Learning Models Yields Accurate and Interpretable Results.

J Chem Theory Comput

Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.

Published: September 2023

Quantum chemistry provides chemists with invaluable information, but its high computational cost limits the size and type of systems that can be studied. Machine learning (ML) has emerged as a means to dramatically lower the cost while maintaining high accuracy. However, ML models often sacrifice interpretability by using components, such as the artificial neural networks of deep learning, that function as black boxes. These components impart the flexibility needed to learn from large volumes of data but make it difficult to gain insight into the physical or chemical basis for the predictions. Here, we demonstrate that semiempirical quantum chemical (SEQC) models can learn from large volumes of data without sacrificing interpretability. The SEQC model is that of density-functional-based tight binding (DFTB) with fixed atomic orbital energies and interactions that are one-dimensional functions of the interatomic distance. This model is trained on data in a manner analogous to that used to train deep learning models. Using benchmarks that reflect the accuracy of the training data, we show that the resulting model maintains a physically reasonable functional form while achieving an accuracy, relative to coupled cluster energies with a complete basis set extrapolation (CCSD(T)*/CBS), that is comparable to that of density functional theory (DFT). This suggests that trained SEQC models can achieve a low computational cost and high accuracy without sacrificing interpretability. Use of a physically motivated model form also substantially reduces the amount of data needed to train the model compared to that required for deep learning models.
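The abstract describes the model form only in words. As a minimal sketch, assuming a toy one-orbital-per-atom Hamiltonian rather than the authors' actual DFTB parametrization, the code below shows how fixed on-site energies plus interactions that depend only on interatomic distance give a model whose parameters can be fitted to reference energies by gradient descent. The exponential form of `hopping`, the `ONSITE` value, the synthetic dimer data, and every function name are hypothetical. The sketch also omits the repulsive pair potential and charge self-consistency, and it uses finite-difference gradients in place of the automatic differentiation a deep-learning-style training setup would normally use.

```python
import numpy as np

# Hypothetical fixed on-site (atomic orbital) energy in hartree; illustrative only.
ONSITE = -0.5


def hopping(r, params):
    """Hypothetical one-dimensional interaction t(r) = -a * exp(-b * r)."""
    a, b = params
    return -a * np.exp(-b * r)


def hamiltonian(coords, params):
    """Build a one-orbital-per-atom tight-binding Hamiltonian (symmetric matrix)."""
    n = len(coords)
    H = np.diag(np.full(n, ONSITE))  # fixed orbital energies on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            H[i, j] = H[j, i] = hopping(r, params)  # depends only on distance
    return H


def electronic_energy(coords, params, n_electrons):
    """Closed-shell band energy: twice the sum of the occupied eigenvalues."""
    eigvals = np.linalg.eigvalsh(hamiltonian(coords, params))  # ascending order
    return 2.0 * eigvals[: n_electrons // 2].sum()


def loss(params, dataset):
    """Mean squared error between model and reference energies."""
    errors = [electronic_energy(c, params, ne) - e_ref for c, ne, e_ref in dataset]
    return float(np.mean(np.square(errors)))


def numerical_grad(params, dataset, h=1e-6):
    """Central finite-difference gradient of the loss (stand-in for autodiff)."""
    grad = np.zeros_like(params)
    for k in range(len(params)):
        step = np.zeros_like(params)
        step[k] = h
        grad[k] = (loss(params + step, dataset) - loss(params - step, dataset)) / (2 * h)
    return grad


def train(params, dataset, lr=0.1, n_steps=500):
    """Plain gradient descent over the interaction parameters."""
    params = np.asarray(params, dtype=float)
    for _ in range(n_steps):
        params = params - lr * numerical_grad(params, dataset)
    return params


# Usage: fit the two parameters to synthetic dimer energies generated from known values.
TRUE_PARAMS = np.array([1.0, 1.2])


def dimer(r):
    """Two atoms on the x axis separated by r (hypothetical geometry helper)."""
    return np.array([[0.0, 0.0, 0.0], [r, 0.0, 0.0]])


dataset = [(dimer(r), 2, electronic_energy(dimer(r), TRUE_PARAMS, 2))
           for r in (0.8, 1.0, 1.4, 2.0)]
initial = np.array([0.6, 1.0])
fitted = train(initial, dataset)
print("initial loss:", loss(initial, dataset))
print("fitted loss:", loss(fitted, dataset), "params:", fitted)
```

Because every learned quantity in this toy model is a single curve t(r) attached to an atom pair, the fitted parameters can be plotted and inspected directly, which is the kind of interpretability the abstract contrasts with neural-network black boxes.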

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10536991
DOI: http://dx.doi.org/10.1021/acs.jctc.3c00491

Similar Publications

OSBPL3 modulates the immunosuppressive microenvironment and predicts therapeutic outcomes in pancreatic cancer.

Biol Direct

January 2025

School of Medicine, South China University of Technology, Guangzhou, 510006, China.

Qihui Sun Xiaoqi Zhu Qi Zou Yang Chen Tingting Wen

Background: Pancreatic cancer is characterized by a complex tumor microenvironment that hinders effective immunotherapy. Identifying key factors that regulate the immunosuppressive landscape is crucial for improving treatment strategies.

Methods: We constructed a prognostic and risk assessment model for pancreatic cancer using 101 machine learning algorithms, identifying OSBPL3 as a key gene associated with disease progression and prognosis.

Prediction of urinary tract infection using machine learning methods: a study for finding the most-informative variables.

BMC Med Inform Decis Mak

January 2025

Department of Pediatrics, School of Medicine, Ekbatan Hospital, Hamadan University of Medical Sciences, Hamadan, Iran.

Sajjad Farashi Hossein Emad Momtaz

Background: Urinary tract infection (UTI) is a frequent health-threatening condition. Early, reliable diagnosis of UTI helps prevent the misuse or overuse of antibiotics and hence antibiotic resistance. The gold standard for UTI diagnosis is urine culture, which is time-consuming and error-prone.

A machine learning model accurately identifies glycogen storage disease Ia patients based on plasma acylcarnitine profiles.

Orphanet J Rare Dis

January 2025

Laboratory of Metabolic Diseases, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen, Hanzeplein 1, Postbus 30001, 9700 RB Groningen, the Netherlands.

Joost Groen Bas M de Haan Ruben J Overduin Andrea B Haijer-Schreuder Terry Gj Derks

Background: Glycogen storage disease (GSD) Ia is an ultra-rare inherited disorder of carbohydrate metabolism. Patients often present in the first months of life with fasting hypoketotic hypoglycemia and hepatomegaly. The diagnosis of GSD Ia relies on a combination of different biomarkers, mostly routine clinical chemical markers and subsequent genetic confirmation.

Development and external validation of a machine learning model for brain injury in pediatric patients on extracorporeal membrane oxygenation.

Crit Care

January 2025

Department of Pediatric, West China Second University Hospital, Sichuan University, Chengdu, China.

Bixin Deng Zhe Zhao Tiechao Ruan Ruixi Zhou Chang'e Liu

Background: Patients supported by extracorporeal membrane oxygenation (ECMO) are at high risk of brain injury, which contributes to significant morbidity and mortality. This study aimed to employ machine learning (ML) techniques to predict brain injury in pediatric patients on ECMO and to identify key variables for future research.

Methods: Data from pediatric patients undergoing ECMO were collected from the Chinese Society of Extracorporeal Life Support (CSECLS) registry database and local hospitals.

Explainable unsupervised anomaly detection for healthcare insurance data.

BMC Med Inform Decis Mak

January 2025

Department of Electrical Engineering, ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium.

Hannes De Meulemeester Frank De Smet Johan van Dorst Elise Derroitte Bart De Moor

Background: Waste and fraud are important problems for health insurers to deal with. With the advent of big data, these insurers are looking more and more towards data mining and machine learning methods to help in detecting waste and fraud. However, labeled data is costly and difficult to acquire as it requires expert investigators and known care providers with atypical behavior.
