Natural language contents evaluation system for multi-class news categorization using machine learning and transformers

Authors:
Marrugo, Duván A
Martinez-Santos, Juan Carlos
Puertas, Edwin
Resource type:
Publication date:
2023
Institution:
Universidad Tecnológica de Bolívar
Repository:
Repositorio Institucional UTB
Language:
eng
OAI Identifier:
oai:repositorio.utb.edu.co:20.500.12585/12578
Online access:
https://hdl.handle.net/20.500.12585/12578
Keywords:
Text Classification
Automatic Classification
News Classification
Transformer
Machine Learning
Deep Learning
LEMB
Rights
openAccess
License
http://purl.org/coar/access_right/c_abf2
Description
Summary: The exponential growth of digital documents has been accompanied by rapid progress in text classification techniques in recent years. This paper presents text classification models and analyzes the steps of news classification, implementing machine learning algorithms such as Logistic Regression, Support Vector Machine, and Random Forest. It also applies Transformers as classification models to the same problem, proposing BERT and DistilBERT as candidate solutions to compare for the automatic classification of news articles belonging to four categories (World, Sports, Business, and Science/Technology). On the machine learning side, we obtained the highest accuracy, 88%, using Support Vector Machine with Word2Vec. With the Transformer DistilBERT, however, we obtained a model that is efficient in terms of performance and reaches 91.7% accuracy for classifying news.
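
As an illustration of the terms used in the summary, the following is a minimal sketch of how word vectors might be pooled into a document feature and how accuracy is computed. It assumes mean pooling of word vectors, which the summary does not state; the toy two-dimensional vectors and the names `TOY_VECTORS`, `document_vector`, and `accuracy` are hypothetical and are not taken from the paper. A real pipeline would load trained Word2Vec embeddings and pass the pooled vectors to a classifier such as a Support Vector Machine.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embeddings (assumption): stand-ins for trained Word2Vec vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "match": [1.0, 0.0],
    "goal": [0.8, 0.2],
    "stocks": [0.0, 1.0],
    "market": [0.2, 0.8],
}


def document_vector(
    tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Mean-pool the vectors of known tokens (assumed pooling strategy).

    Tokens without a vector are skipped; a document with no known
    tokens maps to the zero vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true category labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Unknown tokens such as "tonight" are ignored by the pooling step.
    print(document_vector(["match", "goal", "tonight"], TOY_VECTORS, dim=2))
    print(accuracy(
        ["Sports", "Business", "World", "Sports"],
        ["Sports", "Business", "Sports", "Sports"],
    ))
```

Accuracy, as computed here, is the metric behind the 88% and 91.7% figures quoted in the summary: the share of articles assigned to the correct one of the four categories.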