Classification and features selection method for obesity level prediction
- Authors:
- Molina Estren, Diego
De la Hoz Manotas, Alexis Kevin
Mendoza Palechor, Fabio
- Resource type:
- Journal article
- Publication date:
- 2021
- Institution:
- Corporación Universidad de la Costa
- Repository:
- REDICUC - Repositorio CUC
- Language:
- eng
- OAI Identifier:
- oai:repositorio.cuc.edu.co:11323/8417
- Online access:
- https://hdl.handle.net/11323/8417
https://repositorio.cuc.edu.co/
- Keywords:
- Data mining
Dataset
Obesity
Decision Trees
Support Vector Machines
- Rights
- openAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
Summary: Obesity has become one of the world’s largest health issues: rich and poor countries alike see the population with this condition grow each year. The World Health Organization (WHO) defines overweight and obesity as abnormal or excessive fat accumulation that may impair health, and obesity has nearly tripled since 1975. Data mining techniques have become a strong scientific field for analyzing huge data sources and providing new information about patterns and behaviors in the population. This study uses data mining techniques to build a model for obesity prediction, using a dataset based on a survey of college students in several countries. After cleaning and transforming the data, a set of classification methods was implemented (Logistic Model Tree - LMT, RandomForest - RF, Multi-Layer Perceptron - MLP and Support Vector Machines - SVM), together with the feature selection methods InfoGain, GainRatio, Chi-Square and Relief; finally, cross-validation was performed for the training and testing processes. The results showed that LMT had the best performance in precision, obtaining 96.65%, compared to RandomForest (95.62%), MLP (94.41%) and SVM, reported as SMO (83.89%), so this study shows that LMT can be used with confidence to analyze obesity and similar data.
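As a rough illustration of one of the feature selection methods named in the summary, information gain (InfoGain) ranks a feature by how much knowing its value reduces the entropy of the class label. The sketch below is not the authors' implementation; the feature names (`high_calorie_food`, `smokes`), the class labels and the four toy records are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(feature_values, labels):
    """Information gain of a categorical feature: H(class) - H(class | feature)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data, NOT taken from the study's dataset.
labels = ["obese", "obese", "normal", "normal"]
features = {
    "high_calorie_food": ["yes", "yes", "no", "no"],  # separates the classes
    "smokes": ["yes", "no", "yes", "no"],             # carries no class information
}

# Rank features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: info_gain(features[name], labels),
    reverse=True,
)
print(ranking)
```

In a full pipeline, the top-ranked features would then be passed to a classifier and evaluated with cross-validation, as the summary describes.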