MGEL: Multigrained Representation Analysis and Ensemble Learning for Text Moderation.

Fei Tan Changwei Hu Yifan Hu Kevin Yen Zhi Wei Aasish Pappu Serim Park Keqian Li

IEEE Trans Neural Netw Learn Syst

Published: October 2023

In this work, we describe our efforts to address two typical challenges that arise when popular text classification methods are applied to text moderation: the representation of multibyte characters and word obfuscations. Specifically, a multihot byte-level scheme is developed to significantly reduce the dimension of one-hot character-level encoding, which is inflated by the multiplicity of instance-scarce non-ASCII characters. In addition, we introduce a simple yet effective weighting approach for fusing n-gram features to empower classical logistic regression. Surprisingly, it substantially outperforms well-tuned representative neural networks. As a continuing effort toward text moderation, we analyze the current state-of-the-art (SOTA) algorithm, bidirectional encoder representations from transformers (BERT), which works well in context understanding but performs poorly on intentional word obfuscations. To resolve this crux, we develop an enhanced variant that remedies this drawback by integrating byte and character decomposition. As demonstrated by our comprehensive experiments, it advances the SOTA performance on the largest abusive language datasets. Our work offers a feasible and effective framework to tackle word obfuscations.
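To make the dimensionality argument concrete, the following is a minimal sketch of one way a multihot byte-level encoding could work. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the scheme, so the function names `char_to_multihot` and `encode_text` are hypothetical, and the sketch assumes UTF-8 as the byte encoding. Each character is mapped to a fixed 256-dimensional vector with a 1 at every byte value that occurs in its UTF-8 encoding, so the vector width no longer grows with the number of distinct non-ASCII characters.

```python
from typing import List

BYTE_DIM = 256  # number of possible byte values


def char_to_multihot(ch: str) -> List[int]:
    """Return a 256-dim multi-hot vector for one character.

    Assumption (not from the paper): the character is encoded as UTF-8
    and every byte value that occurs sets one position to 1.
    """
    vec = [0] * BYTE_DIM
    for b in ch.encode("utf-8"):
        vec[b] = 1
    return vec


def encode_text(text: str) -> List[List[int]]:
    """Encode a string as a sequence of per-character multi-hot vectors."""
    return [char_to_multihot(ch) for ch in text]


if __name__ == "__main__":
    for ch in ["a", "\u00e9", "\u4e2d"]:
        active = [i for i, v in enumerate(char_to_multihot(ch)) if v]
        print(repr(ch), "-> active byte indices:", active)
```

An ASCII character activates a single position, while a multibyte character activates up to four. This simple variant discards byte order and collapses repeated byte values within a character, so distinct characters can share a vector; a practical scheme would need to account for that.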

DOI: http://dx.doi.org/10.1109/TNNLS.2021.3137045

Similar Publications

Low-cost interventions to increase uptake of cervical cancer screening among emergency department patients: Results of a randomized clinical trial.

Acad Emerg Med

January 2025

Department of Emergency Medicine, University of Rochester, Rochester, New York, USA.

David Adler Nancy Wood Kevin Fiscella Karen Mustian Ellen Tourtelot

Background: Cervical cancer (CC) is preventable. CC screening decreases CC mortality. Emergency department (ED) patients are at disproportionately high risk for nonadherence with CC screening recommendations.

Extracting Housing and Food Insecurity Information From Clinical Notes Using cTAKES.

Health Serv Res

January 2025

Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA.

Min Hee Kim Silvia Miramontes Shivani Mehta Gabriel L Schwartz Ye Ji Kim

Objective: To assess the utility and challenges of using natural language processing (NLP) in electronic health records (EHRs) to ascertain health-related social needs (HRSNs) among older adults.

Study Setting And Design: We extracted HRSN information using the NLP system Clinical Text Analysis and Knowledge Extraction System (cTAKES), combined with Concept Unique Identifiers and Systematized Nomenclature for Medicine codes. We validated cTAKES performance, via manual chart review, on two HRSNs: food insecurity, which was included in the healthcare system's HRSN screening tool, and housing insecurity, which was not.

Intellectual capital, digital transformation and firms' financial performance: Evidence from ecological protection and environmental governance industry in China.

PLoS One

January 2025

School of Economics and Management, Qingdao Agricultural University, Qingdao, China.

Jian Yin Jian Xu

As the pace of enterprise digital transformation accelerates, intellectual capital (IC) has become a core driving force of gaining market competitive advantages and enhancing value creation capabilities. The paper aims to investigate the impact of IC and its components on financial performance of Chinese ecological protection and environmental governance companies during 2018-2021. In addition, the moderating effect of digital transformation between them is examined.

Differences in the effectiveness of individual-level smoking cessation interventions by socioeconomic status.

Cochrane Database Syst Rev

January 2025

Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK.

Annika Theodoulou Thomas R Fanshawe Eleanor Leavens Effie Theodoulou Angela Difeng Wu

Background: People from lower socioeconomic groups are more likely to smoke and less likely to succeed in achieving abstinence, making tobacco smoking a leading driver of health inequalities. Contextual factors affecting subpopulations may moderate the efficacy of individual-level smoking cessation interventions. It is not known whether any intervention performs differently across socioeconomically-diverse populations and contexts.

Accuracy of ChatGPT 3.5, 4.0, 4o and Gemini in diagnosing oral potentially malignant lesions based on clinical case reports and image recognition.

Med Oral Patol Oral Cir Bucal

January 2025

15, Trauma Centre, District Hospital Neemuch Madhya Pradesh - 458441, India

P Pradhan

Background: The accurate and timely diagnosis of oral potentially malignant lesions (OPMLs) is crucial for effective management and prevention of oral cancer. Recent advancements in artificial intelligence technologies indicate their potential to assist in clinical decision-making. Hence, this study was carried out with the aim to evaluate and compare the diagnostic accuracy of ChatGPT 3.
