Natural language content evaluation system for multiclass detection of hate speech in tweets using transformers

Authors:
Marrugo-Tobón, Duván Andres
Martinez-Santos, Juan Carlos
Puertas, Edwin
Resource type:
Publication date:
2023
Institution:
Universidad Tecnológica de Bolívar
Repository:
Repositorio Institucional UTB
Language:
eng
OAI Identifier:
oai:repositorio.utb.edu.co:20.500.12585/12581
Online access:
https://hdl.handle.net/20.500.12585/12581
https://ceur-ws.org/Vol-3496/homomex-paper4.pdf
Keywords:
BERT
DistilBERT
Feature extraction
Hate speech detection
Natural language processing
Transformers
Tweet categorization
LEMB
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description
Summary: In natural language processing, accurate categorization of tweets, including the detection of hate speech, plays a pivotal role in organizing and analyzing information efficiently. This paper presents a Natural Language Content Evaluation System tailored for multi-class tweet categorization, with a focus on hate speech detection. The system improves classification accuracy and efficiency by using the Transformer models BERT and DistilBERT. Feature extraction techniques capture the pertinent information in each tweet, enabling practical analysis, categorization, and identification of hate speech instances. During training, we also address class imbalance in the corpus, using techniques that ensure fair representation of the different tweet categories, including hate speech. After extensive training, the system reaches 95% accuracy on the training data, showing that Transformers are effective at understanding and categorizing tweets, including identifying hate speech. On the test data, the system achieves 83% accuracy, indicating that the trained models remain robust and generalize to unseen tweets for hate speech detection. This system contributes to advancing automated tweet categorization, specifically hate speech detection, and provides a reliable and efficient solution for organizing and analyzing diverse tweet datasets.
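The summary does not specify which techniques are used to handle the imbalanced corpus. One common option is to weight the training loss by inverse class frequency, so that rare categories count as much as frequent ones. The sketch below illustrates that option only; it is an assumption, not the authors' documented method, and the class labels (`hate`, `offensive`, `neutral`) and the helper names `inverse_frequency_weights` and `weighted_nll` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1, so
    every class contributes the same total weight to the training set.
    (Illustrative assumption; not necessarily the paper's technique.)
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_nll(true_class_probs: Sequence[float], labels: Sequence[str],
                 weights: Dict[str, float]) -> float:
    """Class-weighted negative log-likelihood, averaged over the total weight.

    true_class_probs[i] is the model's predicted probability for labels[i].
    Dividing by the sum of the per-sample weights (not the sample count)
    mirrors the 'mean' reduction of PyTorch's weighted CrossEntropyLoss.
    """
    total = sum(weights[y] * -math.log(p)
                for p, y in zip(true_class_probs, labels))
    return total / sum(weights[y] for y in labels)


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced label set (6 / 2 / 1 tweets).
    labels = ["neutral"] * 6 + ["hate"] * 2 + ["offensive"]
    weights = inverse_frequency_weights(labels)
    print(weights)  # {'neutral': 0.5, 'hate': 1.5, 'offensive': 3.0}

    # Hypothetical model probabilities for the true class of each tweet.
    probs = [0.9] * 6 + [0.4, 0.5, 0.2]
    print(round(weighted_nll(probs, labels, weights), 4))
```

In a PyTorch training loop, the same weights could be arranged into a tensor ordered by class index and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`; oversampling the minority classes is an alternative that leaves the loss unchanged.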