Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance.

PLoS Comput Biol

Boston Children's Hospital Informatics Program, Boston, Massachusetts, United States of America; Harvard Medical School, Boston, Massachusetts, United States of America.

Published: October 2015

We present a machine learning-based methodology capable of providing real-time ("nowcast") and forecast estimates of influenza activity in the US by leveraging data from multiple sources: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illness (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI using machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC's ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013-2014 (retrospective) and 2014-2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method's predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of Google Flu Trends' (GFT) real-time estimates with comparable accuracy, and (3) our two- and three-week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd-sourced data, into influenza predictions in all time horizons.
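The combination step described in the abstract can be illustrated with a small linear stacking model: each data source supplies its own weekly ILI estimate, and a second-stage regression learns how to weight them against the reported ILI. This is only a sketch under stated assumptions, not the authors' implementation; the abstract does not say which ensemble learners were used, so ordinary least squares stands in here. The function names (`fit_stacking_weights`, `predict_ensemble`, `rmse`) are hypothetical, and the numbers in the demo are synthetic, generated only to exercise the code.

```python
import numpy as np


def fit_stacking_weights(source_estimates, observed_ili):
    """Fit an intercept plus one weight per data source by ordinary least squares.

    Assumption: a linear combiner; the paper's actual ensemble learners may differ.
    """
    X = np.asarray(source_estimates, dtype=float)   # shape (weeks, sources)
    y = np.asarray(observed_ili, dtype=float)       # shape (weeks,)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef                                     # [intercept, w_1, ..., w_k]


def predict_ensemble(coef, source_estimates):
    """Combine per-source estimates into a single ILI prediction per week."""
    X = np.atleast_2d(np.asarray(source_estimates, dtype=float))
    return coef[0] + X @ coef[1:]


def rmse(pred, truth):
    """Root-mean-square error between predictions and reference values."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


if __name__ == "__main__":
    # Synthetic stand-ins for four per-source ILI estimates (NOT real data).
    rng = np.random.default_rng(0)
    weeks = np.arange(104)
    true_ili = 2.0 + 1.5 * np.sin(2 * np.pi * weeks / 52.0) ** 2
    noise_sd = [0.2, 0.4, 0.3, 0.5]  # hypothetical per-source noise levels
    sources = np.column_stack(
        [true_ili + rng.normal(scale=s, size=weeks.size) for s in noise_sd]
    )

    # Train the combiner on earlier weeks, evaluate on later weeks.
    train, test = slice(0, 78), slice(78, 104)
    coef = fit_stacking_weights(sources[train], true_ili[train])

    print("ensemble RMSE:", rmse(predict_ensemble(coef, sources[test]), true_ili[test]))
    for j in range(sources.shape[1]):
        print(f"source {j} RMSE:", rmse(sources[test, j], true_ili[test]))
```

On the training weeks, a least-squares combiner with an intercept can never have a larger error than any single source, because using one source alone is a special case of the weights it searches over. Whether that advantage holds on later, unseen weeks is an empirical question, which is what the retrospective and live evaluations in the abstract address.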

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4626021 (PMC)
http://dx.doi.org/10.1371/journal.pcbi.1004513 (DOI)

Publication Analysis

Top Keywords (keyword: count)

data source: 12
data: 9
social media: 8
data sources: 8
forecast estimates: 8
ensemble approach: 8
time horizons: 8
estimates comparable: 8
comparable accuracy: 8
predictions: 5

Similar Publications

Integrated analysis of the prevalence and influencing factors of poststroke dysphagia.

Eur J Med Res

January 2025

Clinical Research and Big Data Center, South China Research Center for Acupuncture and Moxibustion, Medical College of Acu-Moxi and Rehabilitation, Guangzhou University of Chinese Medicine, Guangzhou, China.

Objectives: Poststroke dysphagia (PSD) is a common complication after stroke but there is limited information on its global prevalence and influencing factors, such as spatial, temporal, demographic characteristics, and stroke-related factors. Our study seeks to fill this knowledge gap by exploring the overall prevalence of PSD and its influencing factors.

Methods: A search of English-language literature from database inception from 2005 until May 2022 was performed using PubMed, Embase, Web of Science, Cochrane Library, and Scopus.

Background: Environmental exposures such as airborne pollutant exposures and socio-economic indicators are increasingly recognized as important to consider when conducting clinical research using electronic health record (EHR) data or other sources of clinical data such as survey data. While numerous public sources of geospatial and spatiotemporal data are available to support such research, the data are challenging to work with due to inconsistencies in file formats and spatiotemporal resolutions, computational challenges with large file sizes, and a lack of tools for patient- or subject-level data integration.

Results: We developed FHIR PIT (HL7® Fast Healthcare Interoperability Resources Patient data Integration Tool) as an open-source, modular, data-integration software pipeline that consumes EHR data in FHIR® format and integrates the data at the level of the patient or subject with environmental exposures data of varying spatiotemporal resolutions and file formats.

Background: Innovative health technologies have increasingly emerged as a promising solution for patients with untreatable or challenging conditions. However, these technologies often come with high costs and limited evidence at the time of launch. This study assessed how these high-priced drugs with limited evidence were appraised and introduced in South Korea, England, Australia, and Canada, where cost-effectiveness analysis (CEA) generally plays a central role in pricing and reimbursement decisions.

Background: Virtual care (VC) for dementia in primary care settings is an important aspect of healthcare delivery in Canada. However, the evidence informing optimal and sustainable provision of VC for persons living with dementia (PLWD) and their care partners is scarce. The objectives of this study were to (1) describe the frequency of VC use, (2) identify characteristics of PLWD, care partners, and family physicians (FPs) that are associated with the use of VC, and (3) explore FPs' perceptions of barriers and facilitators to providing VC for PLWD and their care partners.

This study presents an integrated framework that combines spatial clustering techniques and multi-source geospatial data to comprehensively assess and understand geological hazards in Hunan Province, China. The research integrates self-organizing map (SOM) and geo-self-organizing map (Geo-SOM) to explore the relationships between environmental factors and the occurrence of various geological hazards, including landslides, slope failures, collapses, ground subsidence, and debris flows. The key findings reveal that annual average precipitation (Pre), profile curvature (Pro_cur), and slope (Slo) are the primary factors influencing the composite geological hazard index (GI) across the province.
