Large-scale distributed linear algebra with tensor processing units.

Adam G M Lewis, Jackson Beall, Martin Ganahl, Markus Hauru, Shrestha Basu Mallick, Guifre Vidal

Proc Natl Acad Sci U S A

Sandbox Alphabet X, The Moonshot Factory, Mountain View, CA 94043.

Published: August 2022

We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs' fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size [Formula: see text] in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization.
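To make the idea of computing a matrix function by polynomial iteration concrete, the following is a minimal sketch of a polar factorization built only from matrix multiplications. It uses the textbook Newton-Schulz iteration, chosen here for illustration; the abstract does not say which iteration the authors use, so this should not be read as their method. The helper names (`matmul`, `transpose`, `frobenius_norm`, `polar_newton_schulz`) are hypothetical, the code runs on a single process in pure Python, and it assumes a square, nonsingular, reasonably well-conditioned input.

```python
"""Polar factorization A = U H via the Newton-Schulz polynomial iteration.

Illustrative sketch only: a textbook iteration on one process, not the
distributed TPU algorithm described in the paper.
"""


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_norm(a):
    """Square root of the sum of squared entries."""
    return sum(x * x for row in a for x in row) ** 0.5


def polar_newton_schulz(a, max_iters=100, tol=1e-12):
    """Return (u, h) with a ~= u h, u orthogonal, h symmetric.

    Assumption: `a` is square and nonsingular. Dividing by the Frobenius
    norm puts every singular value in (0, 1], inside the interval
    (0, sqrt(3)) where the iteration converges.
    """
    scale = frobenius_norm(a)
    x = [[v / scale for v in row] for row in a]
    for _ in range(max_iters):
        # One step: X <- 1.5 X - 0.5 X X^T X (matrix multiplications only).
        xxtx = matmul(x, matmul(transpose(x), x))
        x_next = [[1.5 * p - 0.5 * q for p, q in zip(x_row, c_row)]
                  for x_row, c_row in zip(x, xxtx)]
        # Stop once successive iterates no longer change appreciably.
        step = frobenius_norm([[p - q for p, q in zip(new_row, old_row)]
                               for new_row, old_row in zip(x_next, x)])
        x = x_next
        if step < tol:
            break
    # With U converged, the symmetric factor is H = U^T A.
    h = matmul(transpose(x), a)
    return x, h


if __name__ == "__main__":
    a = [[4.0, 1.0], [2.0, 3.0]]
    u, h = polar_newton_schulz(a)
    print("U =", u)
    print("H =", h)
```

Each step costs a fixed number of matrix multiplications and nothing else, which is why an iteration of this kind fits hardware whose runtime is dominated by matrix-multiply units.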

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9388123
DOI: http://dx.doi.org/10.1073/pnas.2122762119

Similar Publications

A Secure and Efficient White-Box Implementation of SM4.

Entropy (Basel)

December 2024

School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Xiaobo Hu, Yanyan Yu, Yinzi Tu, Jing Wang, Shi Chen

Differential Computation Analysis (DCA) leverages memory traces to extract secret keys, bypassing countermeasures employed in white-box designs, such as encodings. Although researchers have made great efforts to enhance security against DCA, most solutions considerably decrease algorithmic efficiency. In our approach, the Feistel cipher SM4 is implemented by a series of table-lookup operations, and the input and output of each table are protected by affine transformations and nonlinear encodings generated randomly.

Programmable wave-based analog computing machine: a metastructure that designs metastructures.

Nat Commun

January 2025

Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA.

Dimitrios C Tzarouchis, Brian Edwards, Nader Engheta

The ability to perform mathematical computations using metastructures is an emergent paradigm that carries the potential of wave-based analog computing to the realm of near-speed-of-light, low-loss, compact devices. We theoretically introduce and experimentally verify the concept of a reconfigurable metastructure that performs analog complex mathematical computations using electromagnetic waves. Reconfigurable, RF-based components endow our device with the ability to perform stationary and non-stationary iterative algorithms.

Implementing the discontinuous-Galerkin finite element method using graph neural networks with application to diffusion equations.

Neural Netw

December 2024

Department of Earth Science and Engineering, Imperial College London, Prince Consort Road, London SW7 2BP, UK; Centre for AI-Physics Modelling, Imperial-X, White City Campus, Imperial College London, W12 7SL, UK.

Linfeng Li, Jiansheng Xiang, Boyang Chen, Claire E Heaney, Steven Dargaville

Machine learning (ML) has benefited from both software and hardware advancements, leading to increasing interest in capitalising on ML throughout academia and industry. There have been efforts in the scientific computing community to leverage this development via implementing conventional partial differential equation (PDE) solvers with machine learning packages, most of which rely on structured spatial discretisation and fast convolution algorithms. However, unstructured meshes are favoured in problems with complex geometries.

A combinatory approach of non-chain ring and henon map for image encryption application.

Sci Rep

January 2025

Department of Mathematics, College of Science, King Khalid University, Abha, 61413, Saudi Arabia.

Salman Mohi Ud Din, Tariq Shah, Fahad Alblehai, Sameer Nooh

Algebraic structures play a vital role in securing important data. These structures are utilized to construct the non-linear components of block ciphers. Since constructing non-linear components through algebraic structures is crucial for the confusion aspects of encryption schemes, relying solely on these structures can result in limited key spaces.

Expansion of stereotactic work envelope using transformation matrices and geometric algebra for neurosurgery.

Biomed Eng Lett

January 2025

NaviNetics, Inc, Rochester, MN USA.

Basel Sharaf, Seth Lewis, David Choung, Abhinav Goyal, Kristen M Scheitler

Stereotactic systems have traditionally used Cartesian coordinates combined with linear algebraic mathematical models to navigate the brain. Previously, the development of a novel stereotactic system allowed for improved patient comfort, reduced size, and a simplified interface for surgeons. The system was designed with a work envelope and trajectory range optimized for deep brain stimulation applications only.
