This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words against existing human familiarity ratings and found strong correlations. LLM familiarity estimates predicted lexical decision and naming performance in megastudies even better than the best available word frequency measures. We then applied LLM estimates to MWEs and found that they are also effective in measuring the familiarity of these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. We also provide a cleaned-up list of nearly 150,000 entries that excludes lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates: they outperform traditional word frequency measures (particularly in predicting word recognition accuracy), they generalize to MWEs, they are available for large word lists, and new estimates are easy to obtain for all types of stimuli.
DOI: http://dx.doi.org/10.3758/s13428-024-02561-7
Behav Res Methods
December 2024
ETSI de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense, 30, 28040, Madrid, Spain.