Using large language models to detect outcomes in qualitative studies of adolescent depression.

Alison W Xin, Dylan M Nielson, Karolin Rose Krause, Guilherme Fiorini, Nick Midgley, Francisco Pereira, Juan Antonio Lossio-Ventura

J Am Med Inform Assoc

Machine Learning Core, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, United States.

Published: December 2024

Objective: We aim to use large language models (LLMs) to detect mentions of more nuanced psychotherapeutic outcomes and impacts than previously considered in transcripts of interviews with adolescents with depression. Our clinical authors previously created a novel coding framework containing fine-grained therapy outcomes that go beyond binary classification (eg, depression vs control), based on a qualitative analysis embedded within a clinical study of depression. Moreover, we seek to demonstrate that embeddings from LLMs are informative enough to accurately label these experiences.
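The abstract does not say which classifier sits on top of the LLM embeddings. One common way to test whether fixed embeddings are "informative enough" is a linear probe: a logistic-regression classifier trained separately for each outcome label. The sketch below is a minimal, dependency-free version that assumes the embeddings have already been extracted as lists of floats; the function names and the toy two-dimensional vectors are hypothetical illustrations, not the authors' method.

```python
import math

def train_logistic_probe(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression probe for ONE outcome label (hypothetical helper).

    X: list of embedding vectors (lists of floats); y: list of 0/1 labels.
    Uses plain batch gradient descent on the mean log-loss.
    Returns (weights, bias).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # derivative of the log-loss with respect to z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradients over the batch and take one descent step.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Probability that embedding x carries the outcome label."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy, made-up "embeddings": label 1 when the first dimension dominates.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]
w, b = train_logistic_probe(X, y)
print(predict_proba(w, b, [0.85, 0.15]) > 0.5)  # True
```

With real LLM embeddings (hundreds or thousands of dimensions) one would normally use a library implementation with regularization and a held-out split; the point here is only that the probe itself is linear, so good performance suggests the outcome information is already present in the embedding.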

Materials And Methods: Data were drawn from interviews in which text segments were annotated with different outcome labels. Five different open-source LLMs were evaluated on classifying outcomes from the coding framework. Classification experiments were carried out on the original interview transcripts. Furthermore, we repeated those experiments on versions of the data produced by breaking the segments into conversation turns or by keeping the non-interviewer utterances (monologues).
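The two alternative segmentations can be illustrated with a small sketch. It assumes, hypothetically, that each annotated segment is available as an ordered list of `(speaker, text)` pairs and that the interviewer is tagged `"Interviewer"`; the actual transcript format, and how outcome labels are carried over to the new units, are not described in the abstract.

```python
from itertools import groupby

INTERVIEWER = "Interviewer"  # hypothetical speaker tag for the interviewer

def split_into_turns(segment):
    """Break a segment into conversation turns.

    Consecutive utterances by the same speaker are merged into one turn.
    Returns a list of (speaker, text) pairs.
    """
    turns = []
    for speaker, group in groupby(segment, key=lambda utterance: utterance[0]):
        text = " ".join(t for _, t in group)
        turns.append((speaker, text))
    return turns

def extract_monologue(segment):
    """Keep only the non-interviewer speech from a segment, joined in order."""
    return " ".join(text for speaker, text in segment if speaker != INTERVIEWER)

# Made-up example segment (not study data).
segment = [
    ("Interviewer", "How is school going?"),
    ("Participant", "Better now."),
    ("Participant", "I'm back in class."),
    ("Interviewer", "And friends?"),
    ("Participant", "I see them more."),
]
print(len(split_into_turns(segment)))  # 4
print(extract_monologue(segment))  # Better now. I'm back in class. I see them more.
```

Both transformations produce shorter, more focused text units than the original segments, which is one plausible reason the turn and monologue versions could be easier to classify.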

Results: We used classification models to predict 31 outcomes and 8 derived labels for 3 different text segmentations. Area under the ROC curve (AUC) scores ranged from 0.6 to 0.9 for the original segmentation and from 0.7 to 1.0 for the monologues and turns.
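Because each outcome is scored separately, an AUC is obtained per label. As a reference for what the metric measures, the sketch below computes AUC with its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The helper names and toy scores are hypothetical; the abstract does not state how the per-label AUCs were computed or aggregated.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def per_label_auc(label_matrix, score_matrix, label_names):
    """One AUC per outcome column of a multi-label problem (hypothetical helper)."""
    return {
        name: roc_auc([row[j] for row in label_matrix], [row[j] for row in score_matrix])
        for j, name in enumerate(label_names)
    }

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.4, 0.35, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for an illustration; a sort-based rank computation gives the same value more efficiently on large datasets.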

Discussion: LLM-based classification models could identify outcomes important to adolescents, such as friendships or academic and vocational functioning, in text transcripts of patient interviews. By using clinical data, we also aim to generalize better to clinical settings than studies based on public social media data.

Conclusion: Our results demonstrate that fine-grained therapy outcome coding in psychotherapeutic text is feasible and can be used to support the quantification of important outcomes for downstream uses.

DOI: http://dx.doi.org/10.1093/jamia/ocae298

Similar Publications

Swallowing, speech and voice impairments in head and neck cancer patients treated at a multidisciplinary integrated patient unit.

Int J Lang Commun Disord

December 2024

Hearing, Speech & Language Center, Sheba Medical Center, Tel Hashomer, Israel.

Osnat Kandelshine-Waldman, Omer Levy-Kardash, Anat Hamburger, Eran Alon, Yael Henkin

Background: Head and neck cancer (HNC) is amongst the 10 most common cancers worldwide and has a major effect on patients' quality of life. Given the complexity of this unique group of patients, a multidisciplinary team approach is preferable. Amongst the debilitating sequelae of HNC and/or its treatment, swallowing, speech and voice impairments are prevalent and require the involvement of speech-language pathologists (SLPs).

Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge.

Behav Res Methods

December 2024

ETSI de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense, 30, 28040, Madrid, Spain.

Marc Brysbaert, Gonzalo Martínez, Pedro Reviriego

This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures.

The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words.

Behav Res Methods

December 2024

Department of Psychology, University of Milano-Bicocca, P.zza dell'Ateneo Nuovo, 1, 20126, Milano, Italy.

Simona Amenta, Andrea Gregor de Varda, Pawel Mandera, Emmanuel Keuleers, Marc Brysbaert

Despite being widely spoken and studied by language and cognitive scientists, Italian lacks large resources of language processing data. The Italian Crowdsourcing Project (ICP) is a dataset of word recognition times and accuracy including responses to 130,465 words, which makes it the largest dataset of its kind item-wise. The data were collected in an online word knowledge task in which over 156,000 native speakers of Italian took part.

Evaluating ChatGPT's Multilingual Performance in Clinical Nutrition Advice Using Synthetic Medical Text: Insights from Central Asia.

J Nutr

December 2024

Department of Biomedical Sciences, School of Medicine, Nazarbayev University, Astana, 010000, Kazakhstan. Electronic address:

Gulnoza Adilmetova, Ruslan Nassyrov, Aizhan Meyerbekova, Aknur Karabay, Huseyin Atakan Varol

Background: While large language models like ChatGPT-4 have demonstrated competency in English, their performance for minority groups speaking underrepresented languages, as well as their ability to adapt to specific socio-cultural nuances and regional cuisines, such as those in Central Asia (e.g., Kazakhstan), still requires further investigation.

Evaluating LLM-based generative AI tools in emergency triage: A comparative study of ChatGPT Plus, Copilot Pro, and triage nurses.

Am J Emerg Med

December 2024

Department of Emergency Medicine, Sisli Hamidiye Etfal Training and Research Hospital, Istanbul, Turkey.

B Arslan, C Nuhoglu, M O Satici, E Altinbilek

Background: The number of emergency department (ED) visits has been steadily increasing globally. Artificial intelligence (AI) technologies, including large language model (LLM)-based generative AI models, have shown promise in improving triage accuracy. This study evaluates the performance of ChatGPT and Copilot in triage at a high-volume urban hospital, hypothesizing that these tools can match trained physicians' accuracy and reduce human bias amidst ED crowding challenges.
