AI Article Synopsis

  • The study examines seven large language models (LLMs), assessing the temporal stability of, and inter-rater agreement on, their responses to personality instruments at two time points.
  • Some models (e.g. Llama3 and GPT-4o) showed higher agreement than others (e.g. GPT-4 and Gemini), and agreement also varied by instrument and by domain or trait, indicating variable robustness in simulating stable personality characteristics.
  • On scales with at least fair agreement, the LLMs displayed mostly socially desirable, prosocial profiles, with higher agreeableness and conscientiousness and lower Machiavellianism; the variability in their responses matters for AI safety and societal impact.

Article Abstract

As large language models (LLMs) continue to gain popularity due to their human-like traits and the intimacy they offer to users, their societal impact inevitably expands. This creates a growing need for comprehensive studies that characterise LLMs and reveal their potential opportunities, drawbacks and overall societal impact. With that in mind, this research investigated seven LLMs, aiming to assess the temporal stability of, and inter-rater agreement on, their responses to personality instruments at two time points. In addition, the LLMs' personality profiles were analysed and compared with human normative data. The findings revealed varying levels of inter-rater agreement in the LLMs' responses over a short time, with some LLMs showing higher agreement (e.g. Llama3 and GPT-4o) than others (e.g. GPT-4 and Gemini). Furthermore, agreement depended on the instrument used as well as on the domain or trait. This implies variable robustness in LLMs' ability to reliably simulate stable personality characteristics. On scales that showed at least fair agreement, LLMs displayed a mostly socially desirable profile in both agentic and communal domains, as well as a prosocial personality profile reflected in higher agreeableness and conscientiousness and lower Machiavellianism. Exhibiting temporal stability and coherent responses on personality traits is crucial for AI systems due to their societal impact and AI safety concerns.
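To make the notion of agreement between two time points concrete, the sketch below computes an unweighted Cohen's kappa for one model's item responses at two administrations of the same instrument. This is an illustration only: the abstract does not state which agreement statistic the study used, the response vectors are made-up values, and the function names `cohen_kappa` and `landis_koch_label` are hypothetical. The verbal labels follow the commonly cited Landis and Koch bands, in which "fair" corresponds to kappa between 0.21 and 0.40.

```python
from collections import Counter


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of categorical ratings."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(first)
    # Proportion of items that received the same rating on both occasions.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance, from the marginal category frequencies.
    counts_first = Counter(first)
    counts_second = Counter(second)
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists contain one identical category throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch verbal bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical 5-point Likert responses to the same eight items at two time points.
time_1 = [1, 2, 3, 4, 5, 3, 2, 4]
time_2 = [1, 2, 3, 4, 4, 3, 2, 5]

kappa = cohen_kappa(time_1, time_2)
label = landis_koch_label(kappa)
print(f"kappa = {kappa:.2f} ({label})")
```

Because Likert responses are ordinal, a weighted kappa or an intraclass correlation may be a better fit in practice; the unweighted form is used here only because it is the simplest to read.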

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461045
DOI: http://dx.doi.org/10.1098/rsos.240180

Publication Analysis

Top Keywords

temporal stability: 12
societal impact: 12
large language: 8
language models: 8
inter-rater agreement: 8
responses personality: 8
personality profile: 8
personality: 6
llms: 5
agreement: 5
