In this work, we address two typical challenges that popular text classification methods face when applied to text moderation: the representation of multibyte characters and word obfuscations. Specifically, we develop a multihot byte-level scheme that significantly reduces the dimensionality of one-hot character-level encoding, which is otherwise inflated by the large number of instance-scarce non-ASCII characters. In addition, we introduce a simple yet effective weighting approach for fusing n-gram features to strengthen classical logistic regression. Surprisingly, it outperforms well-tuned representative neural networks by a large margin. Continuing this effort toward text moderation, we analyze the current state-of-the-art (SOTA) algorithm, bidirectional encoder representations from transformers (BERT), which works well in context understanding but performs poorly on intentional word obfuscations. To remedy this drawback, we develop an enhanced variant that integrates byte and character decomposition. As demonstrated by our comprehensive experiments, it advances the SOTA performance on the largest abusive language datasets. Our work offers a feasible and effective framework for tackling word obfuscations.
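To make the multihot byte-level idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each character is mapped to a fixed 256-dimensional vector with a 1 at every byte value occurring in its UTF-8 encoding; the function names `multihot_bytes` and `encode_text` are hypothetical. The abstract does not specify the exact scheme, so details such as byte order handling may differ in the paper.

```python
# Hypothetical sketch: multihot byte-level encoding of characters.
# Assumption: a character is represented by the set of byte values
# in its UTF-8 encoding, marked in a 256-dimensional 0/1 vector.

NUM_BYTE_VALUES = 256  # a byte can take values 0..255


def multihot_bytes(ch: str) -> list[int]:
    """Return a 256-dim 0/1 vector marking the UTF-8 bytes of one character."""
    vec = [0] * NUM_BYTE_VALUES
    for b in ch.encode("utf-8"):
        vec[b] = 1
    return vec


def encode_text(text: str) -> list[list[int]]:
    """Encode a string as one multihot row per character."""
    return [multihot_bytes(ch) for ch in text]


if __name__ == "__main__":
    rows = encode_text("aé中")
    # Number of active positions per character: ASCII uses one byte,
    # while non-ASCII characters use several.
    print([sum(row) for row in rows])
```

Under this assumption the input width stays at 256 regardless of how many distinct non-ASCII characters appear, whereas a one-hot character vocabulary grows with every new character. Note that this simple version discards byte order and repeated bytes within a character, which a full scheme would likely need to handle.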
DOI: http://dx.doi.org/10.1109/TNNLS.2021.3137045
Acad Emerg Med
January 2025
Department of Emergency Medicine, University of Rochester, Rochester, New York, USA.
Background: Cervical cancer (CC) is preventable. CC screening decreases CC mortality. Emergency department (ED) patients are at disproportionately high risk for nonadherence with CC screening recommendations.
Health Serv Res
January 2025
Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA.
Objective: To assess the utility and challenges of using natural language processing (NLP) in electronic health records (EHRs) to ascertain health-related social needs (HRSNs) among older adults.
Study Setting And Design: We extracted HRSN information using the NLP system Clinical Text Analysis and Knowledge Extraction System (cTAKES), combined with Concept Unique Identifiers and Systematized Nomenclature for Medicine codes. We validated cTAKES performance, via manual chart review, on two HRSNs: food insecurity, which was included in the healthcare system's HRSN screening tool, and housing insecurity, which was not.
PLoS One
January 2025
School of Economics and Management, Qingdao Agricultural University, Qingdao, China.
As the pace of enterprise digital transformation accelerates, intellectual capital (IC) has become a core driving force of gaining market competitive advantages and enhancing value creation capabilities. The paper aims to investigate the impact of IC and its components on financial performance of Chinese ecological protection and environmental governance companies during 2018-2021. In addition, the moderating effect of digital transformation between them is examined.
Cochrane Database Syst Rev
January 2025
Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, UK.
Background: People from lower socioeconomic groups are more likely to smoke and less likely to succeed in achieving abstinence, making tobacco smoking a leading driver of health inequalities. Contextual factors affecting subpopulations may moderate the efficacy of individual-level smoking cessation interventions. It is not known whether any intervention performs differently across socioeconomically-diverse populations and contexts.
Med Oral Patol Oral Cir Bucal
January 2025
15, Trauma Centre, District Hospital Neemuch Madhya Pradesh - 458441, India
Background: The accurate and timely diagnosis of oral potentially malignant lesions (OPMLs) is crucial for effective management and prevention of oral cancer. Recent advancements in artificial intelligence technologies indicate their potential to assist in clinical decision-making. Hence, this study was carried out with the aim to evaluate and compare the diagnostic accuracy of ChatGPT 3.