Predicción y selección de variables con bosques aleatorios en presencia de variables correlacionadas


Authors:
Cardona Alzate, Néstor Iván
Resource type:
Work document
Publication date:
2019
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/75561
Online access:
https://repositorio.unal.edu.co/handle/unal/75561
Keywords:
Mathematics::Probabilities and applied mathematics
Prediction
Predictor variables
Regression analysis
Simulation methods
Rights
openAccess
License
Attribution-NonCommercial-NoDerivatives 4.0 International
Description
Summary: This thesis addresses the problem of variable selection using the random forest method when the underlying model for the response variable is linear. To this end, simulated data sets with different characteristics are configured; the methodology is then applied, and the prediction error is measured each time a variable is eliminated. This procedure is used to evaluate the selection algorithm, which is found to be efficient when data sets contain groups of predictor variables of size less than 8. It is also used to evaluate the random forest method itself, showing that the total number of predictor variables is the factor that most strongly impacts its performance.
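
Illustrative sketch (not taken from the thesis): the kind of simulated data set the summary describes, with groups of correlated predictor variables and a linear model for the response, might be generated roughly as follows. Everything here is an assumption made for illustration: the function names, the shared-latent-factor construction of within-group correlation, and all parameter values (group sizes, correlation `rho`, coefficients, noise level). The random forest fit and the variable-elimination step are not shown.

```python
import math
import random


def simulate_grouped_linear_data(n_samples, group_sizes, rho, coefs, noise_sd, seed=0):
    """Simulate predictors in correlated groups and a linear response.

    Predictors in the same group share one latent factor, so any two of them
    have correlation `rho`; predictors in different groups are independent.
    All names and parameter choices are hypothetical.
    """
    rng = random.Random(seed)
    if len(coefs) != sum(group_sizes):
        raise ValueError("coefs must have one entry per predictor")
    X, y = [], []
    for _ in range(n_samples):
        row = []
        for size in group_sizes:
            shared = rng.gauss(0.0, 1.0)  # latent factor common to the group
            for _j in range(size):
                own = rng.gauss(0.0, 1.0)  # predictor-specific part
                row.append(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own)
        # Linear model for the response: y = sum_j beta_j * x_j + noise
        response = sum(b * x for b, x in zip(coefs, row)) + rng.gauss(0.0, noise_sd)
        X.append(row)
        y.append(response)
    return X, y


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Two groups of 3 predictors; only the first predictor of each group enters the model.
X, y = simulate_grouped_linear_data(
    n_samples=2000, group_sizes=[3, 3], rho=0.8,
    coefs=[2.0, 0.0, 0.0, -1.0, 0.0, 0.0], noise_sd=0.5, seed=42,
)
columns = list(zip(*X))
within = pearson(columns[0], columns[1])   # same group: close to rho
between = pearson(columns[0], columns[3])  # different groups: close to 0
```

In a study of the kind summarized above, data generated this way would then be passed to a random forest implementation, with the prediction error recorded after each variable is removed; varying `group_sizes` and the total number of predictors corresponds to the "different characteristics" mentioned in the summary.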