Predicción y selección de variables con bosques aleatorios en presencia de variables correlacionadas [Prediction and variable selection with random forests in the presence of correlated variables]
This thesis addresses the problem of variable selection using the random forest method when the underlying model for the response variable is linear. To this end, simulated data sets with different characteristics are configured and then, the methodology applied, and the prediction error measured each...
- Authors:
- Cardona Alzate, Néstor Iván
- Resource type:
- Work document
- Publication date:
- 2019
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/75561
- Online access:
- https://repositorio.unal.edu.co/handle/unal/75561
- Keywords:
- Mathematics::Probabilities and applied mathematics
Prediction
Predictor variables
Regression analysis
Simulation methods
- Rights
- openAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
Summary: This thesis addresses the problem of variable selection using the random forest method when the underlying model for the response variable is linear. To this end, simulated data sets with different characteristics are configured, the methodology is applied to each, and the prediction error is measured each time a variable is eliminated. This evaluation shows that the selection algorithm is efficient when data sets contain groups of predictor variables of size less than 8. It also shows that the total number of predictor variables is the factor that most strongly affects the performance of the random forest method.