Arithmetic with language models: From memorization to computation.

Neural Networks

Department of Computer Science and Engineering, University of Bologna, Italy.

Published: November 2024

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities, making smooth input interpolation ineffective for novel data. We successfully trained a lightweight language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine, where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
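To make the task framing concrete, the following is a minimal sketch, not the authors' code: it shows one way fixed-width binary addition and multiplication can be serialized into character sequences for next-token prediction over a five-symbol vocabulary. The operand width N_BITS and the helper make_example are illustrative assumptions; the paper's exact setup may differ.

import random

N_BITS = 4  # hypothetical operand width; the paper's exact choice may differ

def make_example(op: str = "+") -> str:
    """Serialize one binary arithmetic problem as a training string."""
    a = random.randrange(2 ** N_BITS)
    b = random.randrange(2 ** N_BITS)
    result = a + b if op == "+" else a * b
    # Fixed widths keep the vocabulary at {0, 1, +, *, =} and lengths uniform;
    # a sum of two N-bit numbers fits in N+1 bits, a product in 2*N bits.
    out_width = N_BITS + 1 if op == "+" else 2 * N_BITS
    return f"{a:0{N_BITS}b}{op}{b:0{N_BITS}b}={result:0{out_width}b}"

if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        print(make_example("+"))  # e.g. 0101+0011=01000
    print(make_example("*"))     # e.g. 1010*0110=00111100

Because carries and partial products make the output bits depend discontinuously on the input bits, a model cannot solve unseen operand pairs by smoothly interpolating between memorized strings, which is what makes this task a useful probe of genuine computation.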

Source
http://dx.doi.org/10.1016/j.neunet.2024.106550
