Publications by Juan M Banda

Towards automated phenotype definition extraction using large language models.

Ramya Tekumalla, Juan M Banda

Genomics Inform

October 2024

Article Synopsis

Electronic phenotyping uses various data analysis methods, including machine learning and natural language processing, to define patient characteristics, but the current process is slow and labor-intensive.
Large language models could automate phenotype definition extraction but have reliability issues and potential risks of generating misleading information.
The study aims to create a standard evaluation set for assessing large language models' outputs and to test different prompting methods, showing promising results that still need human validation to ensure accuracy and efficiency in phenotype extraction.

Standardizing Multi-site Clinical Note Titles to LOINC Document Ontology: A Transformer-based Approach.

Xu Zuo, Yujia Zhou, Jon Duke, George Hripcsak, Nigam Shah, Juan M Banda

AMIA Annu Symp Proc

January 2024

Article Synopsis

The diversity of clinical notes in electronic health records (EHRs) highlights the need for standardization to improve data retrieval and integration. The LOINC Document Ontology (DO) was designed specifically for naming clinical documents.
This study evaluated LOINC DO by mapping clinical note titles from five institutions, categorizing the titles into three classes based on their similarity to LOINC DO codes, and developed an automated mapping pipeline that does not require access to note content.
The automated mapping system, powered by various language models, achieved an accuracy of 0.90. The researchers compared its results with manual mappings to assess LOINC DO's effectiveness and to identify opportunities for expanding it.

Overview of the 8th Social Media Mining for Health Applications (#SMM4H) shared tasks at the AMIA 2023 Annual Symposium.

Ari Z Klein, Juan M Banda, Yuting Guo, Ana Lucia Schmidt, Dongfang Xu

J Am Med Inform Assoc

April 2024

Objective: The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results.

Methods: The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of 5 tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events).

Overview of the 8th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at the AMIA 2023 Annual Symposium.

Ari Z Klein, Juan M Banda, Yuting Guo, Ana Lucia Schmidt, Dongfang Xu

medRxiv

November 2023

Article Synopsis

* The latest iteration included five tasks across platforms like Twitter and Reddit, covering topics such as COVID-19, therapies, and drug-related events in both English and Spanish, with 29 teams participating from 18 countries.
* The top-performing systems used advanced deep learning techniques, particularly pre-trained transformer models, and a dataset of over 61,000 social media posts will be made available for future research.

Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases.

Juan M Banda, Nigam H Shah, Vyjeyanthi S Periyakoil

JAMIA Open

July 2023

Objective: Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults.

Materials And Methods: We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions allowing us to identify which algorithms may have differential performance, by how much, and under what conditions.

Ontologizing health systems data at scale: making translational discovery a reality.

Tiffany J Callahan, Adrianne L Stefanski, Jordan M Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M Banda

NPJ Digit Med

May 2023

Article Synopsis

Common data models standardize electronic health record (EHR) data but struggle to fully integrate the necessary resources for deep phenotyping.
The OMOP2OBO algorithm automates the mapping of Observational Medical Outcomes Partnership (OMOP) vocabularies to Open Biological and Biomedical Ontology (OBO) ontologies, significantly reducing the need for manual curation.
With OMOP2OBO, mappings for a large number of conditions, drugs, and measurements were created, facilitating the identification of undiagnosed patients in rare diseases and enhancing opportunities for EHR-based deep phenotyping.

Representing and utilizing clinical textual data for real world studies: An OHDSI approach.

Vipina K Keloth, Juan M Banda, Michael Gurley, Paul M Heider, Georgina Kennedy

J Biomed Inform

June 2023

Article Synopsis

* The OHDSI consortium's NLP Working Group created methods and tools to improve the use of textual data in observational studies, detailing a framework for integrating this information into the OMOP Common Data Model (CDM).
* The authors also highlight the workflow for extracting and transforming data from clinical notes, share current applications of the NLP solution, and discuss challenges and lessons learned to aid other researchers in implementing NLP in their studies.

Reproducible variability: assessing investigator discordance across 9 research teams attempting to reproduce the same observational study.

Anna Ostropolets, Yasser Albogami, Mitchell Conover, Juan M Banda, William A Baumgartner

J Am Med Inform Assoc

April 2023

Article Synopsis

This study investigates how different interpretations of an observational study's design can affect the results when independent researchers attempt to reproduce it.
The researchers found that, across ten criteria for including patients, teams agreed on average on only 4 of the 10, leading to significant variability in the size and characteristics of the resulting patient cohorts.
The study concludes that providing open analytical code and a standardized data model can improve reproduction accuracy and consistency in observational research.

Automatic Extraction of Medication Mentions from Tweets-Overview of the BioCreative VII Shared Task 3 Competition.

Davy Weissenbacher, Karen O'Connor, Siddharth Rawal, Yu Zhang, Richard Tzong-Han Tsai, Juan M Banda

Database (Oxford)

February 2023

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health.

Unraveling COVID-19: A Large-Scale Characterization of 4.5 Million COVID-19 Cases Using CHARYBDIS.

Kristin Kostka, Talita Duarte-Salles, Albert Prats-Uribe, Anthony G Sena, Andrea Pistillo, Juan M Banda

Clin Epidemiol

March 2022

Article Synopsis

The study emphasizes the importance of real-world data (RWD) for understanding and responding to the COVID-19 pandemic using a standardized approach through the CHARYBDIS framework.
Researchers conducted a retrospective database study across multiple countries, including the US and parts of Europe and Asia, involving over 4.5 million individuals and focusing on their clinical characteristics and outcomes.
Findings reveal more diagnoses among women but more hospitalizations among men, common comorbidities such as diabetes and heart disease, and key symptoms such as cough and fever; these data help identify trends in COVID-19 across different populations and time periods.

An investigation of spatial-temporal patterns and predictions of the coronavirus 2019 pandemic in Colombia, 2020-2021.

Amna Tariq, Tsira Chakhaia, Sushma Dahal, Alexander Ewing, Xinyi Hua, Juan M Banda

PLoS Negl Trop Dis

March 2022

Colombia announced its first case of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection on March 6, 2020. Since then, the country has reported a total of 5,002,387 cases and 127,258 deaths as of October 31, 2021. The aggressive transmission dynamics of SARS-CoV-2 motivate an investigation of COVID-19 at the national and regional levels in Colombia.

Using weak supervision to generate training datasets from social media data: a proof of concept to identify drug mentions.

Ramya Tekumalla, Juan M Banda

Neural Comput Appl

October 2021

Twitter has been a remarkable resource for research in pharmacovigilance in the last decade. Traditionally, rule- or lexicon-based methods have been used to automatically extract drug-related tweets for human annotation. The process of human annotation to create labeled sets for machine learning models is laborious, time-consuming, and not scalable.

Negative Perception of the COVID-19 Pandemic Is Dropping: Evidence From Twitter Posts.

Alessandro N Vargas, Alexander Maier, Marcos B R Vallim, Juan M Banda, Victor M Preciado

Front Psychol

September 2021

The COVID-19 pandemic hit society hard, strongly affecting people's emotions and wellbeing. It is difficult to measure how the pandemic has affected people's sentiment, let alone how people responded to the dramatic events that took place during the pandemic. This study contributes to this discussion by showing that people's negative perception of the COVID-19 pandemic is dropping.

A biomedically oriented automatically annotated Twitter COVID-19 dataset.

Luis Alberto Robles Hernandez, Tiffany J Callahan, Juan M Banda

Genomics Inform

September 2021

The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the coronavirus disease 2019 (COVID-19) pandemic, researchers have turned to more non-traditional sources of clinical data to characterize the disease in near-real time, study the societal implications of interventions, and study the sequelae that recovered COVID-19 cases present. However, manually curated social media datasets are difficult to come by due to the high cost of manual annotation and the effort needed to identify the correct texts.

A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research-An International Collaboration.

Juan M Banda, Ramya Tekumalla, Guanyu Wang, Jingyuan Yu, Tuo Liu

Epidemiologia (Basel)

August 2021

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses.

A Biomedically oriented automatically annotated Twitter COVID-19 Dataset.

Luis Alberto Robles Hernandez, Tiffany J Callahan, Juan M Banda

ArXiv

July 2021

The use of social media data, like Twitter, for biomedical research has been gradually increasing over the years. With the COVID-19 pandemic, researchers have turned to more nontraditional sources of clinical data to characterize the disease in near real-time, study the societal implications of interventions, and study the sequelae that recovered COVID-19 cases present (Long-COVID). However, manually curated social media datasets are difficult to come by due to the high cost of manual annotation and the effort needed to identify the correct texts.

Transmission dynamics and forecasts of the COVID-19 pandemic in Mexico, March-December 2020.

Amna Tariq, Juan M Banda, Pavel Skums, Sushma Dahal, Carlos Castillo-Garsow

PLoS One

July 2021

Mexico has experienced one of the highest COVID-19 mortality rates in the world. A delayed implementation of social distancing interventions in late March 2020 and a phased reopening of the country in June 2020 have facilitated sustained disease transmission in the region. In this study, we systematically generate and compare 30-day-ahead forecasts using previously validated growth models based on mortality trends from the Institute for Health Metrics and Evaluation for Mexico and Mexico City in near real-time.

Changes in Public Response Associated With Various COVID-19 Restrictions in Ontario, Canada: Observational Infoveillance Study Using Social Media Time Series Data.

Antony Chum, Andrew Nielsen, Zachary Bellows, Eddie Farrell, Pierre-Nicolas Durette, Juan M Banda

J Med Internet Res

August 2021

Background: News media coverage of antimask protests, COVID-19 conspiracies, and pandemic politicization has overemphasized extreme views but has done little to represent views of the general public. Investigating the public's response to various pandemic restrictions can provide a more balanced assessment of current views, allowing policy makers to craft better public health messages in anticipation of poor reactions to controversial restrictions.

Objective: Using data from social media, this infoveillance study aims to understand the changes in public opinion associated with the implementation of COVID-19 restrictions (eg, business and school closures, regional lockdown differences, and additional public health restrictions, such as social distancing and masking).

Pulse of the pandemic: Iterative topic filtering for clinical information extraction from social media.

Julia Wu, Venkatesh Sivaraman, Dheekshita Kumar, Juan M Banda, David Sontag

J Biomed Inform

August 2021

The rapid evolution of the COVID-19 pandemic has underscored the need to quickly disseminate the latest clinical knowledge during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically relevant content in social media without manual labeling is a challenge because of the sheer volume of irrelevant data.

Characterizing all-cause excess mortality patterns during COVID-19 pandemic in Mexico.

Sushma Dahal, Juan M Banda, Ana I Bento, Kenji Mizumoto, Gerardo Chowell

BMC Infect Dis

May 2021

Background: Low testing rates and delays in reporting hinder the estimation of the mortality burden associated with the COVID-19 pandemic. During a public health emergency, estimating all-cause excess deaths above an expected level of death can provide a more reliable picture of the mortality burden. Here, we aim to estimate the absolute and relative mortality impact of the COVID-19 pandemic in Mexico.

Normalizing Clinical Document Titles to LOINC Document Ontology: an Initial Study.

Xu Zuo, Jianfu Li, Bo Zhao, Yujia Zhou, Xiao Dong, Juan M Banda

AMIA Annu Symp Proc

July 2021

The normalization of clinical documents is essential for health information management, given the enormous amount of clinical documentation generated each year. The LOINC Document Ontology (DO) is a universal clinical document standard with a hierarchical structure. The objective of this study is to investigate the feasibility and generalizability of LOINC DO by mapping clinical note titles from five institutions to five DO axes.

A Minimal Information Model for Potential Drug-Drug Interactions.

Harry Hochheiser, Xia Jing, Elizabeth A Garcia, Serkan Ayvaz, Ratnesh Sahay, Juan M Banda

Front Pharmacol

March 2021

Despite the significant health impacts of adverse events associated with drug-drug interactions, no standard models exist for managing and sharing evidence describing potential interactions between medications. Minimal information models have been used in other communities to establish community consensus around simple models capable of communicating useful information. This paper reports on a new minimal information model for describing potential drug-drug interactions.

ACE: the Advanced Cohort Engine for searching longitudinal patient records.

Alison Callahan, Vladimir Polony, José D Posada, Juan M Banda, Saurabh Gombar

J Am Med Inform Assoc

July 2021

Objective: To propose a paradigm for a scalable time-aware clinical data search, and to describe the design, implementation and use of a search engine realizing this paradigm.

Materials And Methods: The Advanced Cohort Engine (ACE) uses a temporal query language and an in-memory datastore of patient objects to provide fast, scalable, and expressive time-aware search. ACE accepts data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model and is configurable to balance performance against compute cost.

Characterization of Anonymous Physician Perspectives on COVID-19 Using Social Media Data.

Katherine J Sullivan, Marisha Burden, Angela Keniston, Juan M Banda, Lawrence E Hunter

Pac Symp Biocomput

March 2021

Physicians' beliefs and attitudes about COVID-19 are important to ascertain because of their central role in providing care to patients during the pandemic. Identifying topics and sentiments discussed by physicians and other healthcare workers can lead to identification of gaps relating to the COVID-19 pandemic response within the healthcare system. To better understand physicians' perspectives on the COVID-19 response, we extracted Twitter data from a specific user group that allows physicians to stay anonymous while expressing their perspectives about the COVID-19 pandemic.

Unraveling COVID-19: a large-scale characterization of 4.5 million COVID-19 cases using CHARYBDIS.

Daniel Prieto-Alhambra, Kristin Kostka, Talita Duarte-Salles, Albert Prats-Uribe, Anthony Sena, Juan M Banda

Res Sq

March 2021

Article Synopsis

Routinely collected real-world data (RWD) is essential for understanding and responding to the COVID-19 pandemic, as demonstrated by the CHARYBDIS framework for standardizing and analyzing this data.
A descriptive cohort study involving over 4.5 million individuals was conducted across the U.S., Europe, and Asia to examine COVID-19-related health risks and outcomes, with detailed information available on an interactive website.
The findings from the CHARYBDIS study serve as benchmarks to enhance our knowledge of COVID-19's progression and management, facilitating timely evaluations of new preventative and therapeutic strategies.
