As a statistical model, n-gram syllabification commonly gives a high syllable error rate (SER) for Bahasa Indonesia, a low-resource language, because it fails when the out-of-vocabulary (OOV) rate is high. Two previous models, bigram-syllabification with flipping onsets (BFO) and a combination of bigram with backoff smoothing based on phonological similarity (CBSPS), use augmentation methods to reduce the OOV rate. However, both BFO and CBSPS have two problems. First, they apply the n-gram at the syllable level instead of the grapheme level, so they suffer from n-gram sparsity. Second, they rely on a procedure to detect the positions of vowels and diphthongs. Together, these problems leave them unable to distinguish diphthongs from derivative words or to syllabify named-entities, which have many ambiguities related to vowels and semi-vowels. In this paper, a syllabification based on an n-gram tagger, which is applied at the grapheme level and relies on neither vowel nor diphthong detection, is developed to solve both problems. In addition, three data augmentation methods are exploited to enrich the dataset. Five-fold cross-validation (5-FCV) on datasets of 50 k words and 15 k named-entities shows that the proposed augmented-syllabification of n-gram tagger (ASnGT) model is significantly better than both BFO and CBSPS. It is also significantly better than the fuzzy k-nearest neighbor in every class (FkNNC)-based model for formal words and named-entities. However, it struggles with derivative words, which it cannot easily distinguish from absorption words and foreign-language terms. It also has difficulty with some foreign named-entities.
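To make the idea of grapheme-level tagging concrete, the following is a minimal sketch, not the ASnGT model itself. It assumes a two-tag scheme (`B` for a grapheme that starts a syllable, `I` for any other grapheme) and a symmetric grapheme window with backoff to narrower windows. The tag scheme, the window-based backoff, the names `to_tags`, `contexts`, and `NGramTagger`, and the four-word toy training set are all illustrative assumptions rather than details taken from the paper.

```python
from collections import Counter, defaultdict


def to_tags(syllables):
    """Convert ['ma', 'kan'] to per-grapheme tags 'BIBII' (B = syllable start)."""
    return "".join("B" + "I" * (len(s) - 1) for s in syllables)


def contexts(word, i, max_width):
    """Yield grapheme windows centred on position i, widest first (for backoff)."""
    padded = "#" * max_width + word + "#" * max_width
    j = i + max_width
    for w in range(max_width, -1, -1):
        yield padded[j - w : j + w + 1]


class NGramTagger:
    """Hypothetical grapheme-level tagger: most frequent tag per context window."""

    def __init__(self, max_width=2):
        self.max_width = max_width
        self.counts = defaultdict(Counter)  # window -> Counter of tags

    def fit(self, examples):
        # Each example is a word given as a list of its syllables.
        for syllables in examples:
            word = "".join(syllables)
            for i, tag in enumerate(to_tags(syllables)):
                for ctx in contexts(word, i, self.max_width):
                    self.counts[ctx][tag] += 1
        return self

    def tag(self, word):
        tags = []
        for i in range(len(word)):
            tag = "I"  # default when no window, however narrow, was seen
            for ctx in contexts(word, i, self.max_width):
                if ctx in self.counts:  # back off until a seen window is found
                    tag = self.counts[ctx].most_common(1)[0][0]
                    break
            tags.append("B" if i == 0 else tag)  # a word always opens a syllable
        return "".join(tags)

    def syllabify(self, word):
        syllables = []
        for ch, tag in zip(word, self.tag(word)):
            if tag == "B":
                syllables.append(ch)
            else:
                syllables[-1] += ch
        return syllables


# Toy training data (an assumption for illustration, not the paper's dataset).
train = [["ma", "kan"], ["mi", "num"], ["ba", "ca"], ["pan", "tai"]]
tagger = NGramTagger(max_width=2).fit(train)
print(tagger.syllabify("makan"))  # seen in training
print(tagger.syllabify("baka"))   # unseen word, resolved through backoff
```

Because the tagger only looks at grapheme windows, it needs no separate step for locating vowels or diphthongs. In this sketch a diphthong such as the "ai" in "pantai" is handled simply because the training tags keep it inside one syllable.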
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9708824 | PMC |
| http://dx.doi.org/10.1016/j.heliyon.2022.e11922 | DOI Listing |
Heliyon
November 2022
School of Computing, Telkom University, Bandung, Indonesia.
J Imaging
October 2022
Department of Informatics, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha No.10, Bandung 40132, Indonesia.
Captioning is the process of assembling a description for an image. Previous research on captioning has usually focused on foreground objects. In captioning concepts, there are two main objects for discussion: background object and foreground object.
Annu Int Conf IEEE Eng Med Biol Soc
July 2017
Zika virus has caught the world's attention and has led people to share their opinions and concerns on social media such as Twitter. Using text-based features extracted with the help of part-of-speech (POS) taggers and n-grams, a classifier was built to detect Zika-related tweets. With a simple logistic classifier, the system detected Zika-related tweets with 92% accuracy.
Nature
March 2016
Max-Planck-Institut für Radioastronomie, Auf dem Hügel 69, 53121 Bonn, Germany.
Cosmic rays are the highest-energy particles found in nature. Measurements of the mass composition of cosmic rays with energies of 10^17 to 10^18 electronvolts are essential to understanding whether they have galactic or extragalactic sources. It has also been proposed that the astrophysical neutrino signal comes from accelerators capable of producing cosmic rays of these energies.
Oral Surg Oral Med Oral Pathol Oral Radiol Endod
May 2010
Postgraduate Student, Department of Endodontics, Bauru Dental School, University of São Paulo, Bauru, Brazil.
Objective: In this study, the presence of dentin infection in root canals obturated with 4 techniques and submitted to the bacterial leakage test was evaluated using histologic methods.
Study Design: The canals of palatal roots of 160 molars were instrumented and divided into different groups, according to the obturation technique used (lateral condensation, MicroSeal system, Touch 'n Heat + Ultrafil, and Tagger's hybrid technique) and extent of the remaining obturation material (5 mm and 10 mm). Ten additional roots were used as control samples.