Accelerated linear algebra for large scale DFT calculations of materials on CPU/GPU architectures with CRYSTAL.

Giacomo Ambrogio, Lorenzo Donà, Jacques K Desmarais, Chiara Ribaldone, Silvia Casassa, Filippo Spiga, Bartolomeo Civalleri, Alessandro Erba

J Chem Phys

Dipartimento di Chimica, Università di Torino, via Giuria 5, 10125 Torino, Italy.

Published: February 2025

We discuss the implementation strategy, numerical accuracy, and computational performance of the acceleration of linear algebra operations through graphics processing units (GPUs) for the self-consistent field driver of the CRYSTAL electronic structure package for solid-state density functional theory simulations. Accelerated tasks include matrix multiplication, diagonalization, and inversion, as well as Cholesky decomposition. The scaling of the implemented strategy over multiple accelerating devices is assessed in the range of 1-8 GPUs per node and found to be remarkably regular. Tests are performed on three systems: α-quartz, a microporous zeolitic imidazolate framework (ZIF-8), and a giant mesoporous metal-organic framework (bio-MOF). Scaling with system size is investigated via supercells of increasing size of both α-quartz and ZIF-8 (up to 648 and 2208 atoms per cell, respectively). The bio-MOF model structure has 2808 atoms per cell, with 33 672 basis functions. We test the performance of the accelerated code with both generalized gradient approximation (GGA) and hybrid GGA exchange-correlation functionals. The efficiency of the new accelerated code is compared to the previous central processing unit (CPU)-only parallelization strategies based on MPI or MPI/OpenMP within either replicated or distributed memory (i.e., massively parallel) approaches. Such a comparison highlights how the new GPU-accelerated code enables calculations on large systems at a significantly reduced computational cost relative to CPU-only strategies. For instance, we find that for the bio-MOF system, the computing time of the linear algebra tasks from a single GPU is comparable to that from the reference approach in the range of 512-1024 CPU cores and 4-8 nodes.
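
The abstract names four accelerated tasks (matrix multiplication, diagonalization, inversion, and Cholesky decomposition) but does not say how they are combined. As an illustration only, the sketch below shows one textbook way these operations fit together in a self-consistent field step: solving the generalized eigenproblem F C = S C E by Cholesky-factorizing the overlap matrix S. This is an assumption about a typical workflow, not the CRYSTAL implementation; the function names, the tiny 2x2 matrices, and the use of a Jacobi eigensolver in plain Python are all hypothetical choices made to keep the example self-contained.

```python
import math

def matmul(a, b):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def cholesky(s):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(s)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = math.sqrt(acc) if i == j else acc / l[j][j]
    return l

def invert_lower(l):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(l)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(col, n):
            rhs = 1.0 if i == col else 0.0
            acc = rhs - sum(l[i][k] * inv[k][col] for k in range(col, i))
            inv[i][col] = acc / l[i][i]
    return inv

def jacobi_eigh(a, tol=1e-24, max_sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix,
    obtained with cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                d = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) element.
                theta = (0.5 * math.atan(2.0 * a[p][q] / d) if d != 0.0
                         else math.copysign(math.pi / 4, a[p][q]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def solve_generalized(f, s):
    """Solve F C = S C E for symmetric F and positive-definite S."""
    l_inv = invert_lower(cholesky(s))                      # Cholesky + inversion
    f_orth = matmul(matmul(l_inv, f), transpose(l_inv))    # multiplication
    energies, c_orth = jacobi_eigh(f_orth)                 # diagonalization
    c = matmul(transpose(l_inv), c_orth)                   # back-transform
    order = sorted(range(len(energies)), key=energies.__getitem__)
    return [energies[i] for i in order], [[row[i] for i in order] for row in c]

if __name__ == "__main__":
    # Hypothetical 2x2 Fock-like and overlap matrices, for illustration only.
    F = [[-1.0, -0.5], [-0.5, -0.5]]
    S = [[1.0, 0.4], [0.4, 1.0]]
    energies, coeffs = solve_generalized(F, S)
    print(energies)
    print(coeffs)
```

In a production code each of these dense kernels would be delegated to an optimized CPU or GPU library rather than written by hand; the point of the sketch is only the order in which the four operations appear.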

DOI: http://dx.doi.org/10.1063/5.0250793

Similar Publications

General rogue waves and modulation instability of the generalized coupled nonlinear Schrödinger system in optical pulses.

Chaos

March 2025

School of Mathematical Sciences, Zhejiang University of Technology, Hangzhou 310023, People's Republic of China.

Haifang Song, Bo Ren

We focus on rogue waves and modulation instability (MI) of the generalized coupled nonlinear Schrödinger (GCNLS) system in optical pulses. Through the Kadomtsev-Petviashvili hierarchy reduction method, general high-order rogue wave solutions in Gram determinant form at p=p0 are constructed, which contain derivative operators with respect to parameters p and q. We reduce solutions to purely algebraic expressions with the aid of the elementary Schur polynomials.

On the Upper Bounds of Number of Linear Regions and Generalization Error of Deep Convolutional Neural Networks.

IEEE Trans Pattern Anal Mach Intell

March 2025

Degang Chen, Jiayu Liu, Xiaoya Che

Understanding the effect of the hyperparameters of the network structure on the performance of Convolutional Neural Networks (CNNs) remains the most fundamental and urgent issue in deep learning, and we attempt to address this issue in this paper based on the piecewise linear (PWL) nature of CNNs. First, the operations of convolution, ReLU, and max pooling in a CNN are represented as the multiplication of multiple matrices for a fixed sample in order to obtain an algebraic expression of CNNs; this expression clearly suggests that CNNs are PWL functions. Although such a representation has high time complexity, it provides a more convenient and intuitive way to study the mathematical properties of CNNs.

Efficient Signed Graph Sampling via Balancing & Gershgorin Disc Perfect Alignment.

IEEE Trans Pattern Anal Mach Intell

April 2025

Chinthaka Dinesh, Gene Cheung, Saghar Bagheri, Ivan V Bajic

A basic premise in graph signal processing (GSP) is that a graph encoding pairwise (anti-)correlations of the targeted signal as edge weights is leveraged for graph filtering. Existing fast graph sampling schemes are designed and tested only for positive graphs describing positive correlations. However, there are many real-world datasets exhibiting strong anti-correlations, and thus a suitable model is a signed graph, containing both positive and negative edge weights.

Mechanical Power in Pressure-Controlled Ventilation: A Simple and Reliable Bedside Method.

Crit Care Explor

March 2025

All authors: Department of Intensive Care, Leiden University Medical Center, Leiden, The Netherlands.

Jacob W M Snoep, Petra J Rietveld, Franciska van der Velde-Quist, Evert de Jonge, Abraham Schoe

Background: Mechanical power (MP) represents the amount of energy applied by the ventilator to the respiratory system over time. There are two main methods to calculate MP in mechanical ventilation. The first is the geometric method, which directly measures the dynamic inspiratory area of the pressure-volume loop during the respiratory cycle.

Cubic non-polynomial spline on piecewise mesh for singularly perturbed reaction differential equations with robin type boundary conditions.

BMC Res Notes

February 2025

Department of Mathematics, Jimma University, Jimma, Ethiopia.

Bethelhem Esayas Ayele, Tesfaye Aga Bullo, Gemechis File Duressa

Objective: The main purpose of this work is to present a cubic non-polynomial spline approximation method for solving Robin-type singularly perturbed reaction-diffusion problems.

Results: The solution domain is first discretized using a piecewise mesh. The process begins by defining the cubic non-polynomial spline function and calculating its derivatives.
