We discuss the implementation strategy, numerical accuracy, and computational performance of graphics processing unit (GPU) acceleration of the linear algebra operations in the self-consistent field (SCF) driver of the Crystal electronic structure package for solid-state density functional theory (DFT) simulations. The accelerated tasks are matrix multiplication, diagonalization, inversion, and Cholesky decomposition. The scaling of the implemented strategy over multiple accelerating devices is assessed in the range of 1-8 GPUs per node and found to be remarkably regular. Tests are performed on three systems: α-quartz, a microporous zeolitic imidazolate framework (ZIF-8), and a giant mesoporous metal-organic framework (bio-MOF). Scaling with system size is investigated via supercells of α-quartz and ZIF-8 of increasing size (up to 648 and 2208 atoms per cell, respectively). The bio-MOF model structure has 2808 atoms per cell and 33 672 basis functions. We test the performance of the accelerated code with both generalized gradient approximation (GGA) and hybrid GGA exchange-correlation functionals. The efficiency of the new accelerated code is compared with that of the previous central processing unit (CPU)-only parallelization strategies, based on MPI or MPI/OpenMP within either replicated or distributed memory (i.e., massively parallel) approaches. This comparison shows that the new GPU-accelerated code enables calculations on large systems at a significantly reduced computational cost relative to CPU-only strategies. For instance, for the bio-MOF system, the computing time of the linear algebra tasks on a single GPU is comparable to that of the reference CPU-only approach running on 512-1024 CPU cores (4-8 nodes).
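
In an SCF cycle with a non-orthogonal basis such as the Gaussian-type orbitals used by Crystal, the four accelerated operations typically appear together when solving the generalized eigenvalue problem F C = S C E (Kohn-Sham matrix F, overlap matrix S). The abstract does not spell out Crystal's exact sequence of calls, so the following is only a minimal CPU-side NumPy sketch of the textbook Cholesky-based reduction; the function name `solve_generalized_eigenproblem` is hypothetical. On a GPU the same steps would map onto dense BLAS/LAPACK-style library routines (for example, CuPy exposes a NumPy-compatible `cupy.linalg`), but that mapping is an illustrative assumption rather than a description of the authors' implementation.

```python
import numpy as np


def solve_generalized_eigenproblem(F, S):
    """Solve F C = S C E for Hermitian F and Hermitian positive-definite S.

    Hypothetical helper illustrating how Cholesky decomposition, inversion,
    matrix multiplication, and diagonalization combine in one SCF step.
    """
    # 1. Cholesky decomposition of the overlap matrix: S = L L^H
    L = np.linalg.cholesky(S)
    # 2. Inversion of the lower-triangular Cholesky factor
    L_inv = np.linalg.inv(L)
    # 3. Matrix multiplications: F' = L^-1 F L^-H (orthonormalized basis)
    F_ortho = L_inv @ F @ L_inv.conj().T
    # Symmetrize to remove round-off asymmetry before the Hermitian solver
    F_ortho = 0.5 * (F_ortho + F_ortho.conj().T)
    # 4. Diagonalization of the standard Hermitian problem F' C' = C' E
    eps, C_ortho = np.linalg.eigh(F_ortho)
    # Back-transform eigenvectors to the original basis: C = L^-H C'
    C = L_inv.conj().T @ C_ortho
    return eps, C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n))
    S = A @ A.T + n * np.eye(n)  # symmetric positive-definite "overlap"
    B = rng.standard_normal((n, n))
    F = 0.5 * (B + B.T)  # symmetric model "Kohn-Sham" matrix
    eps, C = solve_generalized_eigenproblem(F, S)
    print(np.allclose(F @ C, S @ C @ np.diag(eps)))  # True
    print(np.allclose(C.conj().T @ S @ C, np.eye(n)))  # True
```

The returned eigenvectors are S-orthonormal (C^H S C = I), which is the normalization an SCF driver needs to build the density matrix. For periodic systems this solve is repeated for each k-point, which is one reason the dense linear algebra dominates the cost for large cells such as the bio-MOF.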

Source: http://dx.doi.org/10.1063/5.0250793