Quantum chemistry provides chemists with invaluable information, but the high computational cost limits the size and type of systems that can be studied. Machine learning (ML) has emerged as a means to dramatically lower the cost while maintaining high accuracy. However, ML models often sacrifice interpretability by using components such as the artificial neural networks of deep learning that function as black boxes. These components impart the flexibility needed to learn from large volumes of data but make it difficult to gain insight into the physical or chemical basis for the predictions. Here, we demonstrate that semiempirical quantum chemical (SEQC) models can learn from large volumes of data without sacrificing interpretability. The SEQC model is that of density-functional-based tight binding (DFTB) with fixed atomic orbital energies and interactions that are one-dimensional functions of the interatomic distance. This model is trained to data in a manner that is analogous to that used to train deep learning models. Using benchmarks that reflect the accuracy of the training data, we show that the resulting model maintains a physically reasonable functional form while achieving an accuracy, relative to coupled cluster energies with a complete basis set extrapolation (CCSD(T)*/CBS), that is comparable to that of density functional theory (DFT). This suggests that trained SEQC models can achieve a low computational cost and high accuracy without sacrificing interpretability. Use of a physically motivated model form also substantially reduces the amount of data needed to train the model compared to that required for deep learning models.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10536991 | PMC |
http://dx.doi.org/10.1021/acs.jctc.3c00491 | DOI Listing |
Biol Direct
January 2025
School of Medicine, South China University of Technology, Guangzhou, 510006, China.
Background: Pancreatic cancer is characterized by a complex tumor microenvironment that hinders effective immunotherapy. Identifying key factors that regulate the immunosuppressive landscape is crucial for improving treatment strategies.
Methods: We constructed a prognostic and risk assessment model for pancreatic cancer using 101 machine learning algorithms, identifying OSBPL3 as a key gene associated with disease progression and prognosis.
BMC Med Inform Decis Mak
January 2025
Department of Pediatrics, School of Medicine, Ekbatan Hospital, Hamadan University of Medical Sciences, Hamadan, Iran.
Background: Urinary tract infection (UTI) is a frequent health-threatening condition. Early reliable diagnosis of UTI helps to prevent misuse or overuse of antibiotics and hence prevent antibiotic resistance. The gold standard for UTI diagnosis is urine culture which is a time-consuming and also an error prone method.
View Article and Find Full Text PDFOrphanet J Rare Dis
January 2025
Laboratory of Metabolic Diseases, Department of Laboratory Medicine, University Medical Center Groningen, University of Groningen, Hanzeplein 1, Postbus, Groningen, 30001 - 9700 RB, the Netherlands.
Background: Glycogen storage disease (GSD) Ia is an ultra-rare inherited disorder of carbohydrate metabolism. Patients often present in the first months of life with fasting hypoketotic hypoglycemia and hepatomegaly. The diagnosis of GSD Ia relies on a combination of different biomarkers, mostly routine clinical chemical markers and subsequent genetic confirmation.
View Article and Find Full Text PDFCrit Care
January 2025
Department of Pediatric, West China Second University Hospital, Sichuan University, Chengdu, China.
Background: Patients supported by extracorporeal membrane oxygenation (ECMO) are at a high risk of brain injury, contributing to significant morbidity and mortality. This study aimed to employ machine learning (ML) techniques to predict brain injury in pediatric patients ECMO and identify key variables for future research.
Methods: Data from pediatric patients undergoing ECMO were collected from the Chinese Society of Extracorporeal Life Support (CSECLS) registry database and local hospitals.
BMC Med Inform Decis Mak
January 2025
Department of Electrical Engineering, ESAT-STADIUS, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium.
Background: Waste and fraud are important problems for health insurers to deal with. With the advent of big data, these insurers are looking more and more towards data mining and machine learning methods to help in detecting waste and fraud. However, labeled data is costly and difficult to acquire as it requires expert investigators and known care providers with atypical behavior.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!