With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. What makes this frustrating is that private companies hold potentially useful data, but it is not accessible to the people who could use it to track poverty, reduce disease, or build urban infrastructure. This project set out to test whether an openly available dataset (Twitter) can be transformed into a resource for urban planning and development. We test our hypothesis by creating road traffic crash location data, which is scarce in most resource-poor environments but essential for addressing the number one cause of mortality for children over five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to capture the occurrence of a crash, and developed an improved geoparsing algorithm to identify its location. We geolocate 32,991 crash reports on Twitter for 2012-2020 and cluster them into 22,872 unique crashes during this period. For a subset of crashes reported on Twitter, a motorcycle delivery service was dispatched in real time to verify the crash and its location; the results show 92% accuracy. To our knowledge, this is the first geolocated dataset of crashes for the city, and it allowed us to produce the first crash map for Nairobi. Using a spatial clustering algorithm, we are able to locate the portions of the road network (<1%) where 50% of the identified crashes occurred. Even with limitations in the representativeness of the data, the results can give urban planners useful information for targeting road safety improvements where resources are limited. The work shows how Twitter data might be used to create other types of essential data for urban planning in resource-poor environments.
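The step of clustering geolocated crash reports into unique crashes can be illustrated with a minimal sketch. The abstract does not state the rule the authors used, so everything here is an assumption for illustration: the function names, the 500 m distance threshold, the 4 hour time window, and the choice of single-linkage grouping via union-find. Two reports are merged when they are close in both space and time.

```python
import math
from datetime import datetime, timedelta


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_reports(reports, max_metres=500.0, max_gap=timedelta(hours=4)):
    """Group crash reports into crashes by space-time proximity.

    reports: list of (timestamp, lat, lon) tuples.
    max_metres, max_gap: hypothetical thresholds, not taken from the paper.
    Returns one integer cluster label per report, numbered from 0
    in order of first appearance.
    """
    parent = list(range(len(reports)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; fine for a sketch, not for large data.
    for i in range(len(reports)):
        ti, lati, loni = reports[i]
        for j in range(i + 1, len(reports)):
            tj, latj, lonj = reports[j]
            close_in_time = abs(ti - tj) <= max_gap
            if close_in_time and haversine_m(lati, loni, latj, lonj) <= max_metres:
                parent[find(i)] = find(j)  # merge the two groups

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(reports))]


# Made-up example reports near central Nairobi (not data from the study).
example_reports = [
    (datetime(2019, 1, 1, 8, 0), -1.2921, 36.8219),
    (datetime(2019, 1, 1, 8, 30), -1.2925, 36.8222),  # same place, 30 min later
    (datetime(2019, 1, 1, 8, 10), -1.2600, 36.8000),  # several km away
    (datetime(2019, 1, 2, 8, 0), -1.2921, 36.8219),   # same place, next day
]
example_labels = cluster_reports(example_reports)
```

With these thresholds the first two example reports fall into one crash and the other two form separate crashes. Single linkage can chain distant reports together through intermediate ones, so a real pipeline would likely need a stricter rule or a cap on cluster extent.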


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7857609
PLOS: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244317


