ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations

Yunze Xiao; Yujia Hu; Kenny Tsu Wei Choo; Roy Ka-wei Lee

Conference proceeding

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, pp. 6012-6025

01/01/2024

Abstract

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Information Systems

Computer Science, Interdisciplinary Applications

Linguistics

Science & Technology

Social Sciences

Technology

Detecting hate speech and offensive language is essential for maintaining a safe and respectful digital environment. This study examines the limitations of state-of-the-art large language models (LLMs) in identifying offensive content within systematically perturbed data, with a focus on Chinese, a language particularly susceptible to such perturbations. We introduce ToxiCloakCN, an enhanced dataset derived from ToxiCN, augmented with homophonic substitutions and emoji transformations, to test the robustness of LLMs against these cloaking perturbations. Our findings reveal that existing models significantly underperform in detecting offensive content when these perturbations are applied. We provide an in-depth analysis of how different types of offensive content are affected by these perturbations and explore the alignment between human and model explanations of offensiveness. Our work highlights the urgent need for more advanced techniques in offensive language detection to combat the evolving tactics used to evade detection mechanisms.
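
To make the two perturbation types named in the abstract concrete, the following is a minimal Python sketch of homophonic substitution and emoji transformation. It is an illustration only, not the authors' pipeline: the function names (`cloak`, `homophone_cloak`, `emoji_cloak`) and both substitution tables are hypothetical, and the table entries are mild everyday examples rather than items from ToxiCloakCN or ToxiCN.

```python
from typing import Mapping

# Hypothetical toy tables, NOT taken from ToxiCloakCN.
# Homophone: "垃圾" (trash) is often written with the similar-sounding "辣鸡".
HOMOPHONES = {"垃圾": "辣鸡"}
# Emoji: a word is replaced by an emoji that depicts it.
EMOJI = {"猪": "🐷", "狗": "🐶"}


def cloak(text: str, table: Mapping[str, str]) -> str:
    """Replace every key of `table` that occurs in `text` with its value."""
    # Process longer keys first so multi-character words are matched
    # before any single character they contain.
    for source in sorted(table, key=len, reverse=True):
        text = text.replace(source, table[source])
    return text


def homophone_cloak(text: str) -> str:
    """Apply the toy homophonic substitution table."""
    return cloak(text, HOMOPHONES)


def emoji_cloak(text: str) -> str:
    """Apply the toy emoji transformation table."""
    return cloak(text, EMOJI)


if __name__ == "__main__":
    sample = "这只狗在翻垃圾"  # "This dog is digging through the trash"
    print(homophone_cloak(sample))
    print(emoji_cloak(sample))
```

A keyword-based detector that matches only the original characters would miss both cloaked variants, which is the kind of evasion the abstract describes. A realistic perturbation procedure would need a much larger lexicon and pronunciation data; this sketch assumes a fixed lookup table for brevity.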

Details

Title: ToxiCloakCN: Evaluating Robustness of Offensive Language Detection in Chinese with Cloaking Perturbations
Creators: Yunze Xiao - Carnegie Mellon University Qatar
Yujia Hu - Singapore University of Technology and Design, Singapore
Kenny Tsu Wei Choo - Singapore University of Technology and Design, Singapore
Roy Ka-wei Lee - Singapore University of Technology and Design, Singapore
Contributors: Y Al-Onaizan
M Bansal
Y N Chen
Publication Details: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, pp. 6012-6025
Publisher: Association for Computational Linguistics (ACL)
Number of pages: 14
Grant note: MOE-T2EP20222-0010 / Ministry of Education, Singapore, under its Academic Research Fund Tier 2
Identifiers: 9912755309846
Academic Unit: ISTD Pillar
Language: English
Resource Type: Conference proceeding
