Natural language content evaluation system for multiclass detection of hate speech in tweets using transformers

Authors: Marrugo-Tobón, Duván Andres
Martinez-Santos, Juan Carlos
Puertas, Edwin

Resource type:

Publication date: 2023

Institution: Universidad Tecnológica de Bolívar

Repository: Repositorio Institucional UTB

Language: eng

Description
Summary: In natural language processing, accurate categorization of tweets, including detection of hate speech, plays a pivotal role in organizing and analyzing information efficiently. This paper presents a Natural Language Content Evaluation System tailored for multi-class tweet categorization, with a focus on hate speech detection. The system uses Transformer models, namely BERT and DistilBERT, to improve classification accuracy and efficiency. Feature extraction techniques capture the pertinent information in each tweet, supporting analysis, categorization, and identification of hate speech instances. During training, we also address imbalanced corpora by applying techniques that ensure fair representation of the different tweet categories, including hate speech. The system reaches 95% accuracy during training, which shows that Transformers are effective at comprehending and categorizing tweets, including identifying hate speech. It maintains 83% accuracy during testing, which indicates that the trained models are robust and generalize to unseen data for hate speech detection. This system contributes to automated tweet categorization, specifically hate speech detection, by providing a reliable and efficient way to organize and analyze diverse tweet datasets.
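
The summary does not say which technique was used to handle the imbalanced corpus. One common option, shown here only as an illustrative sketch and not as the authors' method, is to weight each class inversely to its frequency so that rare categories such as hate speech contribute as much to the training loss as frequent ones. The function name, the label names, and the label counts below are all hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    This is the widely used "balanced" heuristic; rarer classes get
    larger weights. It is an assumption here, not the paper's method.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution, not taken from the paper.
labels = ["neutral"] * 6 + ["offensive"] * 3 + ["hate"] * 1
weights = inverse_frequency_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls}: {weight:.3f}")
```

Weights of this kind can be passed to a weighted loss function when fine-tuning a classifier, so that errors on the minority class are penalized more heavily than errors on the majority class.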
