Diseño de modelo para predicción de precio y avalúo de inmuebles en la ciudad de Barranquilla (Model design for real estate price prediction and appraisal in the city of Barranquilla)

Authors:
Álvarez, Ricardo
López, Alejandro
Piñeres, Augusto
Selman, Isabella
Resource type:
Publication date:
2022
Institution:
Universidad del Norte
Repository:
Repositorio Uninorte
Language:
Spanish (spa)
OAI Identifier:
oai:manglar.uninorte.edu.co:10584/11251
Online access:
http://hdl.handle.net/10584/11251
Keywords:
CRISP-DM
Prediction model
Appraisal
Barranquilla
Price prediction
Real estate appraisal
Rights and license:
Universidad del Norte
Description
Summary: TruData is a company that provides data analytics and business intelligence services to companies in diverse industries in Colombia. It acquired a new customer whose business is buying and selling real estate at a fair price within a short period of time. TruData was tasked with analyzing and interpreting market behavior and trends so that the customer could enter this market while satisfying its clients, both sellers and buyers of real estate. The main objective of this project is to apply analytics methods to a database provided by TruData in order to design a robust model that predicts the market price of a property with minimal error.

The project followed the six-phase CRISP-DM methodology to analyze the market context and the supplied data before modeling them. This process was challenging, as it required an extensive review of data analytics models, including successful modeling cases in other businesses and industries, to determine which models fit the data best. Based on that review, the models selected were GLMBoost, Random Forest, multiple linear regression, XGBoost and ridge regression.

The performance of the selected models was then compared using RMSE and MAPE. RMSE (root mean squared error) measures the typical deviation of the predictions from the observed values and is expressed in the same unit as the response variable (COP). MAPE (mean absolute percentage error) is the average absolute error expressed as a percentage of the observed values. For both metrics, lower values indicate better performance. The best-performing model was the one built with Random Forest, with an RMSE of approximately 84,000,000 COP. This model makes it possible to predict a property's value by manually entering the property's independent variables.
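The two comparison metrics described in the summary can be sketched in Python as follows. This is an illustration, not the authors' code: the function names (`rmse`, `mape`, `best_model`) and the toy prices are hypothetical, and breaking RMSE ties with MAPE is an assumption, since the summary only states that both metrics were considered.

```python
import math
from typing import Mapping, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2)), in the target's unit (e.g. COP)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error: 100 * mean(|(y - y_hat) / y|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def best_model(predictions: Mapping[str, Sequence[float]], y_true: Sequence[float]) -> str:
    """Hypothetical helper: name of the model with the lowest RMSE (ties broken by MAPE)."""
    return min(
        predictions,
        key=lambda name: (rmse(y_true, predictions[name]), mape(y_true, predictions[name])),
    )


if __name__ == "__main__":
    # Hypothetical toy prices in COP; these are NOT the thesis data or results.
    actual = [250_000_000, 400_000_000, 180_000_000]
    preds = {
        "random_forest": [260_000_000, 390_000_000, 185_000_000],
        "ridge": [300_000_000, 350_000_000, 150_000_000],
    }
    for name, p in preds.items():
        print(f"{name}: RMSE={rmse(actual, p):,.0f} COP, MAPE={mape(actual, p):.2f}%")
    print("best:", best_model(preds, actual))
```

Because RMSE squares each error before averaging, it penalizes a few large mispricings more heavily than MAPE does, which is one reason to report both.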