Large-scale Vietnamese point-of-interest classification using weak labeling.

Front Artif Intell

Center of Multidisciplinary Integrated Technologies for Field Monitoring, Vietnam National University of Engineering and Technology, Hanoi, Vietnam.

Published: December 2022

Points of interest (POIs) represent geographic locations in different categories (e.g., tourist attractions, amenities, or shops) and play a prominent role in many location-based applications. However, most POI category labels are crowd-sourced by the community and are therefore often of low quality. In this paper, we introduce the first annotated dataset for the POI categorical classification task in Vietnamese. A total of 750,000 POIs were collected from WeMap, a Vietnamese digital map. Because large-scale hand-labeling is inherently time-consuming and labor-intensive, we propose a new approach based on weak labeling. The resulting dataset covers 15 categories, with 275,000 weakly labeled POIs for training and 30,000 gold-standard POIs for testing, making it the largest Vietnamese POI dataset compared with existing ones. We conduct POI categorical classification experiments on our dataset using a strong baseline (BERT-based fine-tuning) and find that our approach is efficient and applicable at large scale. The proposed baseline achieves an F1 score of 90% on the test set and improves the accuracy of WeMap POI data by 37 percentage points (from 56% to 93%).
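The abstract does not describe the labeling rules themselves, so the following is only a generic sketch of how weak labeling of POI names can work, not the authors' method. It assumes keyword-based labeling functions that either vote for a category or abstain, combined by a simple majority vote. The category names, the keyword lists, and the helper names (`make_keyword_lf`, `weak_label`) are all hypothetical.

```python
import unicodedata
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical keyword rules; the paper's real rules and its 15 categories
# are not given in the abstract.
KEYWORDS = {
    "food": ["nhà hàng", "quán ăn", "phở"],
    "cafe": ["cà phê", "cafe", "coffee"],
    "health": ["bệnh viện", "nhà thuốc", "phòng khám"],
}


def normalize(text: str) -> str:
    """Lowercase and NFC-normalize so Vietnamese diacritics compare reliably."""
    return unicodedata.normalize("NFC", text).lower()


def make_keyword_lf(category: str, keywords: List[str]) -> Callable[[str], Optional[str]]:
    """Build a labeling function that votes for `category` or abstains (None)."""
    normalized = [normalize(k) for k in keywords]

    def lf(name: str) -> Optional[str]:
        lowered = normalize(name)
        return category if any(k in lowered for k in normalized) else None

    return lf


LABELING_FUNCTIONS = [make_keyword_lf(c, k) for c, k in KEYWORDS.items()]


def weak_label(name: str) -> Optional[str]:
    """Majority vote over labeling functions; None when no rule fires or on a tie."""
    votes = [v for lf in LABELING_FUNCTIONS if (v := lf(name)) is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # conflicting rules: leave the POI unlabeled
    return ranked[0][0]
```

POIs for which `weak_label` returns `None` would simply be left out of the weakly labeled training set under this scheme, which is one plausible reason a weakly labeled subset can be smaller than the full collection.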

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780588
DOI: http://dx.doi.org/10.3389/frai.2022.1020532
