Two-Level Classifier for Detecting Categories of Offensive Language

Segun Taofeek Aroyehun*, Alexander Gelbukh, Grigori Sidorov

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

We explore the task of offensive content classification on the HASOC 2021 shared task dataset. Our approach is based on a two-level classification scheme (corresponding to Subtasks 1A and 1B, respectively). The first level is a binary classification (offensive or not). The second level further classifies only the instances labeled offensive at the first level. The classifier at each level is a fine-tuned transformer model. On the English dataset, our model achieves overall best macro F1 scores of 0.831 and 0.666 on Subtasks 1A and 1B, respectively. The model performance on Hindi and Marathi is competitive: macro F1 scores of 0.778 for Hindi Subtask 1A and 0.553 for Hindi Subtask 1B (fourth place on the leaderboard), and a macro F1 score of 0.847 for Marathi Subtask 1A (13th on the leaderboard).
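The two-level scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the label names (`HOF`, `NOT`, `NONE`, `PRFN`, `OFFN`), the function `two_level_predict`, and the keyword-based toy classifiers are assumptions made for illustration only. In the paper each level is a fine-tuned transformer model, which any callable mapping a text to a label could stand in for here.

```python
from typing import Callable, List, Tuple

# Assumed label names; the abstract does not list them.
OFFENSIVE = "HOF"
NOT_OFFENSIVE = "NOT"
NO_FINE_LABEL = "NONE"

Classifier = Callable[[str], str]


def two_level_predict(
    texts: List[str], level1: Classifier, level2: Classifier
) -> List[Tuple[str, str]]:
    """Return (coarse, fine) labels; level2 runs only on offensive texts."""
    results = []
    for text in texts:
        coarse = level1(text)
        if coarse == OFFENSIVE:
            # Second level: refine only instances flagged at the first level.
            fine = level2(text)
        else:
            # Non-offensive instances never reach the second classifier.
            fine = NO_FINE_LABEL
        results.append((coarse, fine))
    return results


# Hypothetical keyword-based stand-ins for the two fine-tuned models.
def toy_level1(text: str) -> str:
    lowered = text.lower()
    flagged = "idiot" in lowered or "damn" in lowered
    return OFFENSIVE if flagged else NOT_OFFENSIVE


def toy_level2(text: str) -> str:
    return "PRFN" if "damn" in text.lower() else "OFFN"


if __name__ == "__main__":
    samples = ["Have a nice day", "damn traffic", "you idiot"]
    for text, labels in zip(samples, two_level_predict(samples, toy_level1, toy_level2)):
        print(text, labels)
```

One consequence of this design is that an error at the first level propagates: a text wrongly labeled as not offensive is never seen by the second classifier.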

Original language: English
Title of host publication: Recent Developments and the New Directions of Research, Foundations, and Applications
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 225-232
Number of pages: 8
ISBN (Electronic): 978-3-031-23476-7
ISBN (Print): 978-3-031-23475-0
Publication status: Published - 2023

Publication series

Name: Studies in Fuzziness and Soft Computing
Volume: 423
ISSN (Print): 1434-9922
ISSN (Electronic): 1860-0808

Keywords

  • Deep learning
  • Multilingual
  • Offensive content identification
  • Sentiment analysis
  • Social media
  • Text classification

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computational Mathematics
