Publications by authors named "Sabrina Tiun"

Background: The Automatic Essay Score (AES) prediction system is essential in education applications. The AES system uses various textural and grammatical features to investigate the exact score value for AES. The derived features are processed by various linear regressions and classifiers that require the learning pattern to improve the overall score.

View Article and Find Full Text PDF

Clustering texts together is an essential task in data mining and information retrieval, whose aim is to group unlabeled texts into meaningful clusters that facilitate extracting and understanding useful information from large volumes of textual data. However, clustering short texts (STC) is complex because they typically contain sparse, ambiguous, noisy, and lacking information. One of the challenges for STC is finding a proper representation for short text documents to generate cohesive clusters.

View Article and Find Full Text PDF

As social media booms, abusive online practices such as hate speech have unfortunately increased as well. As letters are often repeated in words used to construct social media messages, these types of words should be eliminated or reduced in number to enhance the efficacy of hate speech detection. Although multiple models have attempted to normalize out-of-vocabulary (OOV) words with repeated letters, they often fail to determine whether the in-vocabulary (IV) replacement words are correct or incorrect.

View Article and Find Full Text PDF

The use of machine learning (ML) and data mining algorithms in the diagnosis of breast cancer (BC) has recently received a lot of attention. The majority of these efforts, however, still require improvement since either they were not statistically evaluated or they were evaluated using insufficient assessment metrics, or both. One of the most recent and effective ML algorithms, fast learning network (FLN), may be seen as a reputable and efficient approach for classifying data; however, it has not been applied to the problem of BC diagnosis.

View Article and Find Full Text PDF

COVID-19 (coronavirus disease 2019) is an ongoing global pandemic caused by severe acute respiratory syndrome coronavirus 2. Recently, it has been demonstrated that the voice data of the respiratory system (i.e.

View Article and Find Full Text PDF

Many works have employed Machine Learning (ML) techniques in the detection of Diabetic Retinopathy (DR), a disease that affects the human eye. However, the accuracy of most DR detection methods still need improvement. Gray Wolf Optimization-Extreme Learning Machine (GWO-ELM) is one of the most popular ML algorithms, and can be considered as an accurate algorithm in the process of classification, but has not been used in solving DR detection.

View Article and Find Full Text PDF

Existing text clustering methods utilize only one representation at a time (single view), whereas multiple views can represent documents. The multiview multirepresentation method enhances clustering quality. Moreover, existing clustering methods that utilize more than one representation at a time (multiview) use representation with the same nature.

View Article and Find Full Text PDF

The coronavirus disease (COVID-19), is an ongoing global pandemic caused by severe acute respiratory syndrome. Chest Computed Tomography (CT) is an effective method for detecting lung illnesses, including COVID-19. However, the CT scan is expensive and time-consuming.

View Article and Find Full Text PDF

Word sense disambiguation (WSD) is the process of identifying an appropriate sense for an ambiguous word. With the complexity of human languages in which a single word could yield different meanings, WSD has been utilized by several domains of interests such as search engines and machine translations. The literature shows a vast number of techniques used for the process of WSD.

View Article and Find Full Text PDF

Spoken Language Identification (LID) is the process of determining and classifying natural language from a given content and dataset. Typically, data must be processed to extract useful features to perform LID. The extracting features for LID, based on literature, is a mature process where the standard features for LID have already been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC), the Gaussian Mixture Model (GMM) and ending with the i-vector based framework.

View Article and Find Full Text PDF

Word Sense Disambiguation (WSD) is the task of determining which sense of an ambiguous word (word with multiple meanings) is chosen in a particular use of that word, by considering its context. A sentence is considered ambiguous if it contains ambiguous word(s). Practically, any sentence that has been classified as ambiguous usually has multiple interpretations, but just one of them presents the correct interpretation.

View Article and Find Full Text PDF