A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text.

Comput Intell Neurosci

Machine Learning and Computer Vision Laboratory, Faculty of Computer Science, Universitas Indonesia, Depok, West Java 16424, Indonesia.

Published: February 2017

Drug name recognition is an essential task in information extraction from medical corpora. Compared with text from other domains, medical text poses more challenges for mining: the text is less structured, new terms are added rapidly, the same drug appears under a wide range of name variants, labeled datasets and external knowledge sources are scarce, and a single drug name may span multiple tokens. Although many approaches have been proposed for the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation to overcome some of those challenges. We propose three data representation techniques based on the characteristics of the word distribution and on the word similarities that result from word embedding training. The first technique is evaluated with a standard NN model, namely MLP. The second technique involves two deep network classifiers, namely DBN and SAE. The third technique represents the sentence as a sequence and is evaluated with a recurrent NN model, namely LSTM. In extracting drug name entities, the third technique gives the best F-score performance compared to the state of the art, with an average F-score of 0.8645.
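For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a token-level F-score can be computed from gold and predicted labels. It is not the paper's evaluation script: the function name `f_score`, the label strings `"DRUG"` and `"O"`, and the toy label sequences are all assumptions made for illustration, and the abstract does not state whether the authors score at token or entity level.

```python
def f_score(gold, predicted, positive="DRUG", beta=1.0):
    """Return (precision, recall, F-beta) for one positive label.

    Hypothetical helper: labels are compared token by token.
    """
    pairs = list(zip(gold, predicted))
    # True positives: tokens labeled as drug names in both sequences.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted as drug name, but not in the gold labels.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: gold drug-name tokens that the prediction missed.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Toy example (made-up labels, not data from the paper).
gold = ["O", "DRUG", "DRUG", "O", "DRUG", "O"]
pred = ["O", "DRUG", "O", "DRUG", "DRUG", "O"]
precision, recall, f1 = f_score(gold, pred)
```

With `beta=1.0` this is the usual F1, the harmonic mean of precision and recall, so a system must both find most drug names and avoid false alarms to score well.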


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5098107
DOI: http://dx.doi.org/10.1155/2016/3483528

Publication Analysis

Top Keywords

data representation: 12
medical text: 8
F-score performance: 8
representation techniques: 8
third technique: 8
drug: 5
data: 4
representation based: 4
based training: 4
training data: 4

Similar Publications

In shallow water, reverberation complicates the detection of low-intensity, variable-echo moving targets, such as divers. Traditional methods often fail to distinguish these targets from reverberation, and data-driven methods are constrained by the limited data on intruding targets. This paper introduces the online robust principal component analysis and multimodal anomaly detection (ORMAD) method to address these challenges.


Using a Mobile Health App (ColonClean) to Enhance the Effectiveness of Bowel Preparation: Development and Usability Study.

JMIR Hum Factors

January 2025

School of Nursing, National Taipei University of Nursing and Health Sciences, Room B631, No. 365, Ming-te Road, Peitou District, Taipei City, 11219, Taiwan, 886 2 28227101 ext 3186.

Background: Colonoscopy is the standard diagnostic method for colorectal cancer. Patients usually receive written and verbal instructions for bowel preparation (BP) before the procedure. Failure to understand the importance of BP can lead to inadequate BP in 25%-30% of patients.


Machine Learning Algorithm-Based Prediction of Diabetes Among Female Population Using PIMA Dataset.

Healthcare (Basel)

December 2024

Department of Computer Science, School of Arts, Humanities and Social Sciences, University of Roehampton, London SW15 5PH, UK.

Diabetes is a metabolic disorder characterized by increased blood sugar levels. Early detection of diabetes could help individuals to manage and delay the progression of this disorder effectively. Machine learning (ML) methods are important in forecasting the progression and diagnosis of different medical problems with better accuracy.


CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning.

Med Image Comput Comput Assist Interv

October 2024

Department of Biomedical Engineering, Yale University, New Haven, CT, USA.

Recent advancements in Contrastive Language-Image Pre-training (CLIP) [21] have demonstrated notable success in self-supervised representation learning across various tasks. However, the existing CLIP-like approaches often demand extensive GPU resources and prolonged training times due to the considerable size of the model and dataset, making them poorly suited to medical applications, in which large datasets are not always available. Meanwhile, the language model prompts are mainly manually derived from labels tied to images, potentially overlooking the richness of information within training samples.


Objectives: Despite progress in promoting gender equality, gender bias remains a significant obstacle for women and hinders their academic advancement. We aim to survey and critically analyze women's representation in conferences and changes over time in various regions of Asian countries.

Methods: An international survey was conducted with representatives from East Asia (Hong Kong, China, and Japan), South Asia (India and Pakistan), and Southeast Asia (Vietnam and Thailand).

