Background: This study evaluates the use of large language models (LLMs) to analyze free-text responses from large-scale global health surveys, using data from the Enquête de Couverture Vaccinale (ECV) household coverage surveys from 2020, 2021, 2022 and 2023 as a case study.

Methods: We tested several LLM approaches, namely zero-shot prompting, few-shot prompting and fine-tuning, alongside a natural language processing approach using semantic embeddings, to analyze free-text responses on why caregivers did not vaccinate their children.
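
To make the difference between zero-shot and few-shot prompting concrete, here is a minimal sketch in Python. It only builds the prompt text and does not call any model. The category labels, the example responses and the `build_prompt` helper are hypothetical illustrations, not the study's coding scheme or prompts.

```python
from typing import Sequence, Tuple

# Hypothetical category labels; the study's actual categories are not listed here.
CATEGORIES = ["access barrier", "lack of information", "fear of side effects", "other"]


def build_prompt(response: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a classification prompt for one free-text survey response.

    With no examples the prompt is zero-shot; with labeled examples
    prepended it becomes a few-shot prompt.
    """
    lines = [
        "Classify the caregiver's reason for not vaccinating their child.",
        "Answer with exactly one of: " + "; ".join(CATEGORIES) + ".",
        "",
    ]
    # Each labeled example is shown in the same layout as the final query.
    for text, label in examples:
        lines += [f"Response: {text}", f"Category: {label}", ""]
    # The prompt ends with an open "Category:" slot for the model to fill in.
    lines += [f"Response: {response}", "Category:"]
    return "\n".join(lines)


# Invented examples for illustration only (not survey data).
EXAMPLES = [
    ("The health centre is too far away", "access barrier"),
    ("I did not know the child needed another dose", "lack of information"),
]

zero_shot = build_prompt("The vaccinator was absent")
few_shot = build_prompt("The vaccinator was absent", EXAMPLES)
```

The resulting string would then be sent to whichever model is being evaluated. Fine-tuning, by contrast, would use such labeled pairs as training data rather than as prompt context.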

Results: Accuracy ranged from 61.5% to 96% when tested against a curated benchmark dataset drawn from the ECV surveys, and improved when LLMs were fine-tuned or given examples for few-shot learning. We show that with as few as 20 to 100 examples, LLMs can achieve high accuracy in categorizing free-text responses.
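
As a rough illustration of how accuracy against a benchmark can be scored, the sketch below compares predicted categories with reference labels. It assumes simple exact-match accuracy; the labels and the `accuracy` helper are invented for the example and are not taken from the study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Share of predictions that exactly match the benchmark labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("benchmark must contain at least one label")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented labels for illustration only (not survey data).
gold = ["access barrier", "lack of information", "other", "access barrier"]
pred = ["access barrier", "other", "other", "access barrier"]

score = accuracy(pred, gold)  # 3 of the 4 labels match
print(f"Accuracy: {score:.1%}")
```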

Conclusions: This approach offers significant opportunities for reanalyzing existing datasets and designing surveys with more open-ended questions, providing a scalable, cost-effective solution for global health organizations. Despite challenges with closed-source models and computational costs, the study underscores LLMs' potential to enhance data analysis and inform global health policy.

Source: http://dx.doi.org/10.1093/inthealth/ihaf015
