Predictive models assessment based on CRISP-DM methodology for students performance in Colombia - Saber 11 Test

Authors:
Acosta-Solano, Jairo
Lancheros Cuesta, Diana Janeth
Umaña Ibáñez, Samir F.
Coronado-Hernandez, Jairo R.
Resource type:
Journal article
Publication date:
2022
Institution:
Corporación Universidad de la Costa
Repository:
REDICUC - Repositorio CUC
Language:
English
OAI Identifier:
oai:repositorio.cuc.edu.co:11323/9221
Online access:
https://hdl.handle.net/11323/9221
https://doi.org/10.1016/j.procs.2021.12.278
https://repositorio.cuc.edu.co/
Keywords:
CRISP-DM methodology
Education
Learning models
National education system
Predictive models
Rights
Open access
License
© 2021 The Authors. Published by Elsevier B.V.
Description
Summary: The purpose of this paper is to evaluate several machine learning models under the CRISP-DM methodology in order to determine, through their metrics, the best model for predicting the performance of high school students from the Colombian Caribbean region on the Saber 11º test. The paper also proposes a new methodology for evaluating test results by region, so that the socioeconomic particularities of each region are taken into account. CRISP-DM is taken as the basis because of its maturity: it supports the extraction of business and data knowledge and offers guidance for data preparation, modeling, and model validation. The proposed methodology is expected to be implemented by the Colombian Institute for the Promotion of Higher Education (ICFES), departmental education secretariats, and educational institutions. A variety of techniques and tools were used to develop ETL processes that yield a data set with the most relevant attributes, which was then used to evaluate four machine learning models built with the J48 (C4.5), LMT, PART, and Multilayer Perceptron algorithms. The best data set was obtained with the InfoGain attribute selection method, and the best learning model with the LMT decision tree algorithm. This project will therefore help stakeholders in the National Education System make decisions that benefit students and the quality of education in the country, especially in the Caribbean region.
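InfoGain attribute selection ranks each attribute A by how much it reduces uncertainty about the class: InfoGain(Class, A) = H(Class) - H(Class | A). The following is a minimal sketch of that ranking for categorical attributes only, written for illustration. The attribute names (`parent_education`, `school_shift`) and the four toy records are hypothetical and are not data from the paper. The algorithm names in the summary (J48, LMT, PART) suggest Weka, whose InfoGainAttributeEval evaluator ranks attributes this way (discretizing numeric attributes first), but that tooling is an assumption not stated in this record.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    # p * log2(1/p) summed over classes; avoids a negative zero for pure groups
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def info_gain(values, labels):
    """InfoGain(Class, A) = H(Class) - H(Class | A) for one categorical attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: weighted average entropy of the class within each value
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    scores = {a: info_gain([r[a] for r in rows], labels) for a in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy records (not from the paper): two attributes, binary outcome
rows = [
    {"parent_education": "university", "school_shift": "morning"},
    {"parent_education": "university", "school_shift": "afternoon"},
    {"parent_education": "secondary", "school_shift": "morning"},
    {"parent_education": "secondary", "school_shift": "afternoon"},
]
labels = ["high", "high", "low", "low"]

print(rank_attributes(rows, labels))
# → [('parent_education', 1.0), ('school_shift', 0.0)]
```

In this toy set `parent_education` separates the classes perfectly (gain of 1 bit), while `school_shift` carries no information (gain of 0), so a selection step that keeps only the top-ranked attributes would retain the former before the data is passed to a classifier such as LMT.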