Detecting Novel and Emerging Drug Terms Using Natural Language Processing: A Social Media Corpus Study.

JMIR Public Health Surveill

Center for Advanced Study of Language, University of Maryland, College Park, MD, United States.

Published: January 2018

Background: With the rapid development of new psychoactive substances (NPS) and changes in the use of more traditional drugs, it is increasingly difficult for researchers and public health practitioners to keep up with emerging drugs and drug terms. Substance use surveys and diagnostic tools need to be able to ask about substances using the terms that drug users themselves are likely to be using. Analyses of social media may offer new ways for researchers to uncover and track changes in drug terms in near real time. This study describes the initial results from an innovative collaboration between substance use epidemiologists and linguistic scientists employing techniques from the field of natural language processing to examine drug-related terms in a sample of tweets from the United States.

Objective: The objective of this study was to assess the feasibility of using distributed word-vector embeddings trained on social media data to uncover previously unknown (to researchers) drug terms.

Methods: In this pilot study, we trained a continuous bag of words (CBOW) model of distributed word-vector embeddings on a Twitter dataset collected during July 2016 (roughly 884.2 million tokens). We queried the trained word embeddings for terms with high cosine similarity (a proxy for semantic relatedness) to well-known slang terms for marijuana to produce a list of candidate terms likely to function as slang terms for this substance. This candidate list was then compared with an expert-generated list of marijuana terms to assess the accuracy and efficacy of using word-vector embeddings to search for novel drug terminology.
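The query step can be illustrated with a minimal sketch. It skips CBOW training and assumes word vectors already exist. The vocabulary and the 3-dimensional vectors in `toy` are invented for illustration, and real embeddings have far more dimensions. The abstract does not say how results for several seed terms were combined or what cutoff was used. Here each seed's top `k` neighbours are taken and merged, keeping each word's best score; this strategy and the helper names `cosine` and `candidate_terms` are hypothetical. Embedding libraries usually provide this lookup directly, for example gensim's `KeyedVectors.most_similar`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_terms(embeddings, seeds, k=3):
    """Hypothetical candidate generation: for each seed term, take the k
    vocabulary words with the highest cosine similarity to it (excluding
    the seeds themselves), merge the results keeping each word's best
    score, and return the words ranked by that score."""
    best = {}
    for seed in seeds:
        if seed not in embeddings:
            continue  # seed never seen in the training corpus
        seed_vec = embeddings[seed]
        scored = [
            (cosine(seed_vec, vec), word)
            for word, vec in embeddings.items()
            if word not in seeds
        ]
        scored.sort(reverse=True)
        for score, word in scored[:k]:
            best[word] = max(score, best.get(word, -1.0))
    return sorted(best, key=lambda w: best[w], reverse=True)


# Invented toy vectors, NOT taken from the study.
toy = {
    "weed": [0.9, 0.1, 0.0],
    "kush": [0.85, 0.15, 0.05],
    "dank": [0.8, 0.2, 0.1],
    "bong": [0.7, 0.3, 0.0],
    "pizza": [0.0, 0.2, 0.9],
    "homework": [0.1, 0.9, 0.2],
}

print(candidate_terms(toy, seeds={"weed"}, k=3))  # → ['kush', 'dank', 'bong']
```

The ranked output is only a candidate list. As the study notes, it still has to be reviewed by a person, because high cosine similarity also captures paraphernalia and loosely related words.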

Results: The method described here produced a list of 200 candidate terms for the target substance (marijuana). Of these 200 candidates, 115 were confirmed to relate to marijuana (65 terms for the substance itself and 50 terms for paraphernalia). These included 30 terms that were used to refer to the target substance in the corpus but did not appear on the expert-generated list; these were therefore considered successful cases of uncovering novel drug terminology. Several of these novel terms appear to have been introduced as recently as 1 or 2 months before the corpus time slice used to train the word embeddings.
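The evaluation arithmetic can be sketched as follows. This sketch assumes precision is the share of candidates judged related to the target substance, counting paraphernalia as related. That definition matches the 115 of 200 reported above, but the full paper may define it differently. The function `evaluate` and the toy inputs are hypothetical. "Novel" follows the abstract: a term used for the substance itself that is absent from the expert list.

```python
def evaluate(candidates, substance, paraphernalia, expert_terms):
    """Hypothetical scoring of a candidate list against human judgments.

    candidates    -- ranked list produced by the embedding query
    substance     -- candidates a reviewer judged to name the substance
    paraphernalia -- candidates a reviewer judged to name paraphernalia
    expert_terms  -- terms on the expert-generated reference list
    """
    related = [t for t in candidates if t in substance or t in paraphernalia]
    precision = len(related) / len(candidates) if candidates else 0.0
    # Novel = substance terms the expert list did not already contain.
    novel = [t for t in candidates if t in substance and t not in expert_terms]
    return precision, novel


# Toy example with invented judgments.
p, novel = evaluate(
    ["kush", "dank", "bong", "pizza"],
    substance={"kush", "dank"},
    paraphernalia={"bong"},
    expert_terms={"kush"},
)
print(p, novel)  # → 0.75 ['dank']

# The counts reported in the abstract, under the same assumed definition.
reported_precision = (65 + 50) / 200
print(reported_precision)  # → 0.575
```

Under this reading, roughly 42.5% of the candidates were unrelated, which is why the authors still call for human review of any generated list.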

Conclusions: Although the precision of the method described here is low enough to still require human review of any candidate term list generated this way, the fact that the process detected 30 novel terms for the target substance from only one month of Twitter data is highly promising. We see this pilot study as an important proof of concept and a first step toward a fully automated drug term discovery system capable of tracking emerging NPS terms in real time.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5838358
DOI: http://dx.doi.org/10.2196/publichealth.7726
