AI Article Synopsis

  • Qualitative methods play a crucial role in disseminating digital health interventions, but they can be time-consuming; emerging generative AI technologies like ChatGPT and Bard may offer quicker analysis options, although their effectiveness is still under investigation.
  • The study compared the consistency of themes, the reliability of coding, and the time required for inductive and deductive thematic analyses by human coders and GenAI tools (ChatGPT and Bard), using 40 SMS text message reminders designed to promote antiretroviral medication adherence among people with HIV who use methamphetamine.
  • GenAI reproduced 71% (5/7) of the human-identified themes in inductive analysis and 50% to 58% in deductive analysis, but its coding agreement with human coders was only 36% to 47%, suggesting both potential and limitations for AI in qualitative research.

Article Abstract

Background: Qualitative methods are highly beneficial to the dissemination and implementation of new digital health interventions; however, these methods can be time intensive and can slow down dissemination when timely knowledge from the data sources is needed in ever-changing health systems. Recent advancements in generative artificial intelligence (GenAI) and its underlying large language models (LLMs) may provide a promising opportunity to expedite the qualitative analysis of textual data, but their efficacy and reliability remain unknown.

Objective: The primary objectives of our study were to evaluate the consistency in themes, reliability of coding, and time needed for inductive and deductive thematic analyses between GenAI (ie, ChatGPT and Bard) and human coders.

Methods: The qualitative data for this study consisted of 40 brief SMS text message reminder prompts used in a digital health intervention for promoting antiretroviral medication adherence among people with HIV who use methamphetamine. Inductive and deductive thematic analyses of these SMS text messages were conducted by 2 independent teams of human coders. An independent human analyst conducted analyses following both approaches using ChatGPT and Bard. The consistency in themes (or the extent to which the themes were the same) and reliability (or agreement in coding of themes) between methods were compared.

Results: The themes generated by GenAI (both ChatGPT and Bard) were consistent with 71% (5/7) of the themes identified by human analysts following inductive thematic analysis. The consistency in themes was lower between humans and GenAI following a deductive thematic analysis procedure (ChatGPT: 6/12, 50%; Bard: 7/12, 58%). The percentage agreement (or intercoder reliability) for these congruent themes between human coders and GenAI ranged from fair to moderate (ChatGPT, inductive: 31/66, 47%; ChatGPT, deductive: 22/59, 37%; Bard, inductive: 20/54, 37%; Bard, deductive: 21/58, 36%). In general, ChatGPT and Bard performed similarly to each other across both types of qualitative analyses in terms of consistency of themes (inductive: 6/6, 100%; deductive: 5/6, 83%) and reliability of coding (inductive: 23/62, 37%; deductive: 22/47, 47%). On average, GenAI required significantly less overall time than human coders when conducting qualitative analysis (mean 20, SD 3.5 min vs mean 567, SD 106.5 min).
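The percentages in the Results can be checked directly from the reported counts. The following is a minimal sketch, not the authors' analysis code: the function name, the dictionary layout, and the reading of each denominator as the number of coding decisions compared are assumptions made for illustration, since the abstract does not define the denominators.

```python
# Sketch: recompute the percentage agreement figures quoted in the Results.
# Counts are copied from the abstract; names and structure are hypothetical.

def percent_agreement(agreed: int, total: int) -> int:
    """Return simple percentage agreement, rounded to a whole percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * agreed / total)

# (comparison, analysis type) -> (agreements, total), as reported.
REPORTED_COUNTS = {
    ("ChatGPT vs human", "inductive"): (31, 66),
    ("ChatGPT vs human", "deductive"): (22, 59),
    ("Bard vs human", "inductive"): (20, 54),
    ("Bard vs human", "deductive"): (21, 58),
    ("ChatGPT vs Bard", "inductive"): (23, 62),
    ("ChatGPT vs Bard", "deductive"): (22, 47),
}

def agreement_table() -> dict:
    """Map each comparison to its recomputed percentage agreement."""
    return {
        key: percent_agreement(agreed, total)
        for key, (agreed, total) in REPORTED_COUNTS.items()
    }

# Ratio of the reported mean analysis times (human 567 min, GenAI 20 min).
TIME_RATIO = 567 / 20

if __name__ == "__main__":
    for (comparison, analysis), pct in agreement_table().items():
        print(f"{comparison}, {analysis}: {pct}%")
    print(f"Human coders took about {TIME_RATIO:.0f} times as long as GenAI")
```

Simple percentage agreement does not correct for agreement expected by chance, which is one reason values in this range are described only as fair to moderate.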

Conclusions: The consistency in the themes generated by human coders and GenAI suggests that these technologies hold promise for reducing the resource intensiveness of qualitative thematic analysis; however, the lower reliability in coding between them suggests that hybrid approaches are necessary. Human coders appeared to be better than GenAI at identifying nuanced and interpretative themes. Future studies should consider how these technologies can best be used in collaboration with human coders to improve the efficiency of qualitative research while also mitigating the potential ethical risks that they may pose.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11329846
DOI: http://dx.doi.org/10.2196/54482

