Large language models predict human sensory judgments across six modalities.

Raja Marjieh Ilia Sucholutsky Pol van Rijn Nori Jacoby Thomas L Griffiths

Sci Rep

Department of Psychology, Princeton University, Princeton, USA.

Published: September 2024

Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality, and provides highly correlated predictions with human data irrespective of whether direct visual input is provided or purely textual descriptors. To study the impact of specific languages, we also apply the models to a multilingual color-naming task. We find that GPT-4 replicates cross-linguistic variation in English and Russian, illuminating the interaction of language and perception.
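As a rough illustration of what comparing model and human pairwise similarity judgments can look like, the sketch below correlates the off-diagonal entries of two similarity matrices. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not name the correlation statistic, so Pearson correlation is assumed here, and the three color stimuli, the rating values, and the helper names `upper_triangle` and `pearson` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity ratings on a 0-1 scale for three stimuli
# (red, orange, blue). These numbers are made up for illustration only.
human = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
model = [
    [1.0, 0.7, 0.2],
    [0.7, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]

# Each unordered pair is counted once; self-similarities are excluded
# because they are trivially maximal and would inflate the correlation.
r = pearson(upper_triangle(human), upper_triangle(model))
print(f"model-human correlation: {r:.3f}")
```

Using only the upper triangle keeps each stimulus pair from being counted twice when the matrices are symmetric.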

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11399123
DOI: http://dx.doi.org/10.1038/s41598-024-72071-1

Similar Publications

Swallowing, speech and voice impairments in head and neck cancer patients treated at a multidisciplinary integrated patient unit.

Int J Lang Commun Disord

December 2024

Hearing, Speech & Language Center, Sheba Medical Center, Tel Hashomer, Israel.

Osnat Kandelshine-Waldman Omer Levy-Kardash Anat Hamburger Eran Alon Yael Henkin

Background: Head and neck cancer (HNC) is amongst the 10 most common cancers worldwide and has a major effect on patients' quality of life. Given the complexity of this unique group of patients, a multidisciplinary team approach is preferable. Amongst the debilitating sequels of HNC and/or its treatment, swallowing, speech and voice impairments are prevalent and require the involvement of speech-language pathologists (SLPs).

Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.

Behav Res Methods

December 2024

ETSI de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense, 30, 28040, Madrid, Spain.

Marc Brysbaert Gonzalo Martínez Pedro Reviriego

This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures.

The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words.

Behav Res Methods

December 2024

Department of Psychology, University of Milano-Bicocca, P.zza dell'Ateneo Nuovo, 1, 20126, Milano, Italy.

Simona Amenta Andrea Gregor de Varda Pawel Mandera Emmanuel Keuleers Marc Brysbaert

Despite being largely spoken and studied by language and cognitive scientists, Italian lacks large resources of language processing data. The Italian Crowdsourcing Project (ICP) is a dataset of word recognition times and accuracy including responses to 130,465 words, which makes it the largest dataset of its kind item-wise. The data were collected in an online word knowledge task in which over 156,000 native speakers of Italian took part.

Evaluating ChatGPT's Multilingual Performance in Clinical Nutrition Advice Using Synthetic Medical Text: Insights from Central Asia.

J Nutr

December 2024

Department of Biomedical Sciences, School of Medicine, Nazarbayev University, Astana, 010000, Kazakhstan.

Gulnoza Adilmetova Ruslan Nassyrov Aizhan Meyerbekova Aknur Karabay Huseyin Atakan Varol

Background: While large language models like ChatGPT-4 have demonstrated competency in English, their performance for minority groups speaking underrepresented languages, as well as their ability to adapt to specific socio-cultural nuances and regional cuisines, such as those in Central Asia (e.g., Kazakhstan), still requires further investigation.

Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.

Am J Emerg Med

December 2024

Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

B Arslan C Nuhoglu M O Satici E Altinbilek

Background: The number of emergency department (ED) visits has been steadily increasing globally. Artificial Intelligence (AI) technologies, including Large Language Model (LLM)-based generative AI models, have shown promise in improving triage accuracy. This study evaluates the performance of ChatGPT and Copilot in triage at a high-volume urban hospital, hypothesizing that these tools can match trained physicians' accuracy and reduce human bias amidst ED crowding challenges.
