Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters.

J Chem Theory Comput

School of Computing and Information Systems, Melbourne University, Melbourne, VIC 3052, Australia.

Published: December 2024

This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev and Martinez [2009, 5, 1004-1015] and Barca et al. [2021, 17, 7486-7503]. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry, a linear scaling exchange matrix assembly algorithm, and extended capabilities for Hartree-Fock calculations up to -type angular momentum functions. Opt-Brc excels for smaller systems and for highly contracted triple-ζ basis sets, while opt-UM is advantageous for large molecular systems. Performance benchmarks on NVIDIA A100 GPUs show that our algorithms in the EXtreme-scale Electronic Structure System (EXESS), when combined, outperform all current GPU and CPU Fock build implementations in TeraChem, QUICK, GPU4PySCF, LibIntX, ORCA, and Q-Chem. The implementations were benchmarked on linear and globular systems; average speedups of 1.4×, 8.4×, and 9.4× across three double-ζ basis sets were observed relative to TeraChem, QUICK, and GPU4PySCF, respectively. The average speedup over TeraChem increases to 2.1× when using four A100 GPUs. Strong scaling analysis reveals over 91% parallel efficiency on four GPUs for opt-Brc, making it typically faster for multi-GPU execution. Single-compute-node comparisons with CPU-based software such as ORCA and Q-Chem show speedups of up to 42× and 31×, respectively, with power efficiency improved by up to 18×.
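To make the terms "integral screening" and "symmetry" concrete, the following is a minimal CPU sketch in Python/NumPy of a generic, textbook Schwarz-screened Coulomb/exchange build. It is not opt-UM or opt-Brc and says nothing about how EXESS is implemented. The function names (`synthetic_eri`, `build_jk`, `closed_shell_fock`), the toy integral tensor, the zero core Hamiltonian, and the threshold `tau` are all assumptions made for illustration.

```python
import numpy as np


def synthetic_eri(n, naux, rng):
    """Toy integrals (ij|kl) = sum_P B[P,i,j] * B[P,k,l] (an assumption, not real ERIs).

    This factorized form has the usual 8-fold permutational symmetry and is
    positive semidefinite, so the Schwarz bound
    |(ij|kl)| <= sqrt((ij|ij)) * sqrt((kl|kl)) holds.
    """
    b = rng.normal(size=(naux, n, n))
    b = 0.5 * (b + b.transpose(0, 2, 1))  # symmetric under i <-> j
    idx = np.arange(n)
    # Make "distant" index pairs small so that screening has something to drop.
    b *= np.exp(-2.0 * np.abs(idx[:, None] - idx[None, :]))
    return np.einsum("pij,pkl->ijkl", b, b)


def build_jk(eri, dens, tau=0.0):
    """Coulomb (J) and exchange (K) matrices with simple Schwarz screening.

    A quartet is dropped when Q_ij * Q_kl * max|D| < tau, where
    Q_ij = sqrt((ij|ij)). Returns J, K and the fraction of quartets kept.
    """
    q = np.sqrt(np.einsum("ijij->ij", eri))
    dmax = np.abs(dens).max()
    keep = q[:, :, None, None] * q[None, None, :, :] * dmax >= tau
    eri_kept = np.where(keep, eri, 0.0)
    j = np.einsum("ijkl,kl->ij", eri_kept, dens)  # J_ij = sum_kl (ij|kl) D_kl
    k = np.einsum("ikjl,kl->ij", eri_kept, dens)  # K_ij = sum_kl (ik|jl) D_kl
    return j, k, keep.mean()


def closed_shell_fock(hcore, j, k):
    """Closed-shell Fock matrix F = h + 2J - K for D = C_occ @ C_occ.T."""
    return hcore + 2.0 * j - k


rng = np.random.default_rng(0)
n, nocc, tau = 8, 3, 1e-8
eri = synthetic_eri(n, naux=12, rng=rng)

# Orthonormal occupied orbitals give a projector-like density with |D_kl| <= 1.
c_occ, _ = np.linalg.qr(rng.normal(size=(n, nocc)))
dens = c_occ @ c_occ.T
hcore = np.zeros((n, n))  # placeholder core Hamiltonian (assumption)

j_ref, k_ref, _ = build_jk(eri, dens)                  # no screening
j_scr, k_scr, kept = build_jk(eri, dens, tau=tau)      # screened build
fock = closed_shell_fock(hcore, j_scr, k_scr)

print(f"fraction of quartets kept: {kept:.3f}")
print(f"max |J error|: {np.abs(j_scr - j_ref).max():.2e}")
print(f"max |K error|: {np.abs(k_scr - k_ref).max():.2e}")
```

Each dropped term contributes less than `tau` in magnitude to a matrix element, so the screening error per element is bounded by `n * n * tau` in this sketch. Production GPU codes avoid forming the full four-index tensor and screen with the specific density elements each quartet touches; the dense `einsum` form here is only meant to show the bound and the symmetry at small scale.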

Source
http://dx.doi.org/10.1021/acs.jctc.4c00994

Similar Publications

Direct Givens rotation method based on error back-propagation algorithm for self-consistent field solution.

J Chem Phys

January 2025

Department of Chemistry and Biochemistry, School of Advanced Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan.

The self-consistent field (SCF) procedure is the standard technique for solving the Hartree-Fock and Kohn-Sham density functional theory equations, although its convergence is not theoretically guaranteed. Direct minimization methods, such as the augmented Lagrangian method (ALM) and second-order SCF (SOSCF), obtain the SCF solution by minimizing the Lagrangian using its gradient. In SOSCF, molecular orbitals are optimized by truncating the Taylor expansion of a unitary matrix represented in exponential form to ensure the orthonormality condition.
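The role of the exponential form can be illustrated with generic linear algebra. The sketch below (Python/NumPy, independent of the method in this abstract) builds a small antisymmetric matrix `kappa`, truncates the Taylor series of exp(kappa) at several orders, and measures how far the result is from orthonormal. The helper names, matrix size, and scaling factor are assumptions chosen for illustration.

```python
import numpy as np


def truncated_exp(kappa, order):
    """Taylor series of exp(kappa), truncated after the kappa**order term."""
    term = np.eye(kappa.shape[0])
    total = term.copy()
    for m in range(1, order + 1):
        term = term @ kappa / m  # term now holds kappa**m / m!
        total += term
    return total


def orthonormality_error(u):
    """Largest absolute deviation of u.T @ u from the identity."""
    return np.abs(u.T @ u - np.eye(u.shape[0])).max()


rng = np.random.default_rng(0)
a = rng.normal(size=(6, 6))
kappa = 0.05 * (a - a.T)  # antisymmetric generator: kappa.T == -kappa

orders = (1, 2, 4, 8)
errors = [orthonormality_error(truncated_exp(kappa, order)) for order in orders]
for order, err in zip(orders, errors):
    print(f"order {order}: max |U^T U - I| = {err:.2e}")
```

The full exponential of an antisymmetric matrix is exactly orthogonal; a truncated series is orthogonal only up to the truncation order, which is why the deviation shrinks as more terms are kept.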

An automatically code-generated C++/HIP/CUDA implementation of the (auxiliary) Fock, or Kohn-Sham, matrix construction for execution in GPU-accelerated hardware environments is presented. The module is developed as part of the quantum chemistry software package VeloxChem, employing localized Gaussian atomic orbitals. The performance and scaling characteristics are analyzed in view of the specific requirements for self-consistent field optimization and response theory calculations.

Machine Learning Mapping Approach for Computing Spin Relaxation Dynamics.

J Phys Chem Lett

December 2024

Department of Chemistry, University at Buffalo, The State University of New York, Buffalo, New York 14260, United States.

In this work, a machine learning mapping approach for predicting the properties of atomistic systems is reported. Within this approach, the atomic orbital overlap, density, or Kohn-Sham (KS) Fock matrix elements obtained at a low level of theory such as extended tight-binding have been used as input features to predict the electric field gradient (EFG) tensors at a higher level of theory such as those obtained with hybrid functionals. It is shown that the machine-learning-predicted EFG tensors can be used to compute spin relaxation rates of several ions in aqueous solutions.

The abundant demand for deep learning compute resources has created a renaissance in low-precision hardware. Going forward, it will be essential for simulation software to run on this new generation of machines without sacrificing scientific fidelity. In this paper, we examine the precision requirements of a representative kernel from quantum chemistry calculations: the calculation of the single-particle density matrix from a given mean-field Hamiltonian (i.

Thermal quasiparticle theory.

J Chem Phys

December 2024

Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA.

The widely used thermal Hartree-Fock (HF) theory is generalized to include the effect of electron correlation while maintaining its quasi-independent-particle framework. An electron-correlated internal energy (or grand potential) is postulated in consultation with the second-order finite-temperature many-body perturbation theory (MBPT), which then dictates the corresponding thermal orbital (quasiparticle) energies in such a way that all fundamental thermodynamic relations are obeyed. The associated density matrix is of a one-electron type, whose diagonal elements take the form of the Fermi-Dirac distribution functions, when the grand potential is minimized.
