Automatic cluster analysis (Análisis de clúster automático)
illustrations, graphs, tables
- Authors:
- Correa Henao, Marisol
- Resource type:
- Master's thesis
- Publication date:
- 2021
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa (Spanish)
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/80784
- Keywords:
- 000 - Computer science, information and general works::004 - Data processing, computer science
Cluster analysis
Analysis
Cluster
Library
Automatic machine learning
Software
Python
- Rights
- openAccess
- License
- Attribution-ShareAlike 4.0 International
id | UNACIONAL2_1865ee8239627b093ca1103406512643 |
---|---|
oai_identifier_str | oai:repositorio.unal.edu.co:unal/80784 |
network_acronym_str | UNACIONAL2 |
network_name_str | Universidad Nacional de Colombia |
repository_id_str | |
dc.title.spa.fl_str_mv | Análisis de clúster automático |
dc.title.translated.eng.fl_str_mv | Automatic cluster analysis |
dc.creator.fl_str_mv | Correa Henao, Marisol |
dc.contributor.advisor.none.fl_str_mv | Velasquez Henao, Juan David |
dc.contributor.author.none.fl_str_mv | Correa Henao, Marisol |
dc.subject.ddc.spa.fl_str_mv | 000 - Computer science, information and general works::004 - Data processing, computer science |
dc.subject.lemb.none.fl_str_mv | Cluster analysis |
dc.subject.proposal.spa.fl_str_mv | Analysis, Cluster, Library, Automatic machine learning |
dc.subject.proposal.eng.fl_str_mv | Software, Python, Analysis, Cluster, Library, Automatic machine learning |
description | illustrations, graphs, tables |
publishDate | 2021 |
dc.date.accessioned.none.fl_str_mv | 2021-12-15T15:30:20Z |
dc.date.available.none.fl_str_mv | 2021-12-15T15:30:20Z |
dc.date.issued.none.fl_str_mv | 2021-12-08 |
dc.type.spa.fl_str_mv | Degree project - Master's |
dc.type.driver.spa.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.type.version.spa.fl_str_mv | info:eu-repo/semantics/acceptedVersion |
dc.type.content.spa.fl_str_mv | Text |
dc.type.redcol.spa.fl_str_mv | http://purl.org/redcol/resource_type/TM |
status_str | acceptedVersion |
dc.identifier.uri.none.fl_str_mv | https://repositorio.unal.edu.co/handle/unal/80784 |
dc.identifier.instname.spa.fl_str_mv | Universidad Nacional de Colombia |
dc.identifier.reponame.spa.fl_str_mv | Repositorio Institucional Universidad Nacional de Colombia |
dc.identifier.repourl.spa.fl_str_mv | https://repositorio.unal.edu.co/ |
dc.language.iso.spa.fl_str_mv | spa |

References (dc.relation.references.spa.fl_str_mv):

- Aguilar, L. J. (2016). Big Data, Análisis de grandes volúmenes de datos en organizaciones. Alfaomega Grupo Editor.
- Aldenderfer, M. S., & Blashfield, R. K. (1984). A review of clustering methods. Cluster analysis, 33-61.
- Anderberg, M. R. (1973). Cluster Analysis for applications. Academic Press. New York and London.
- Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM SIGMOD Record, 28(2), 49-60.
- Ashenden, A., Ward-Dutton, N., & Wentworth, C. (2016). La nueva tendencia de automatización: Machine Learning y más. MWD Advisors. Available at: https://www.ibm.com/downloads/cas/M1PG1J23
- Äyrämö, S., & Kärkkäinen, T. (2006). Introduction to partitioning-based clustering methods with a robust example. Reports of the Department of Mathematical Information Technology. Series C, Software engineering and computational intelligence, (1/2006).
- Birant, D., & Kut, A. (2007). ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data & Knowledge Engineering, 60(1), 208-221.
- Chojnacki, A., Dai, C., Farahi, A., Shi, G., Webb, J., Zhang, D. T., Abernethy, J., & Schwartz, E. (2017). A Data Science Approach to Understanding Residential Water Contamination in Flint. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17) (pp. 1407-1416). New York, NY, USA: ACM. https://doi.org/10.1145/3097983.3098078
- Aliguliyev, R. M. (2009). Performance evaluation of density-based clustering methods. Information Sciences, 179(20), 3583-3602.
- Aluja, T. (2001). La minería de datos, entre la estadística y la inteligencia artificial. Qüestió: quaderns d'estadística i investigació operativa, 25(3), 479-498.
- Chou, Y. L., & Armer, V. A. (1977). Análisis estadístico (No. 04; RMD, HA29 C4 1977.). Interamericana.
- Cleveland, W. S. (2001). Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics. Int. Stat. Rev., 69, 21-26. https://doi.org/10.1111/j.1751-5823.2001.tb00477.x
- Cluster (Mahout Map-Reduce 0.13.0 API). (2017, April 14). Apache.org. https://mahout.apache.org/docs/0.13.0/api/docs/mahoutmr/org/apache/mahout/clustering/Cluster.html
- Clustering | KNIME. (2021). KNIME. https://www.knime.com/nodeguide/analytics/clustering
- Correa, M. (2021, October 12). TDGMarisolCorreaHenao/docs at main · marcorhe/TDGMarisolCorreaHenao. GitHub. https://github.com/marcorhe/TDGMarisolCorreaHenao/tree/main/docs
- Dhillon, I. S., Guan, Y., & Kulis, B. (2004, August). Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 551-556). ACM.
- Dane, A. D., & Kateman, G. (1993). On k-medoid clustering of large data sets with the aid of a genetic algorithm: Background, feasibility and comparison. Analytica Chimica Acta, 282, 647-669.
- Díaz, M., León, Á., Alvin, H., & Díaz Mora, M. E. (2016). Introducción al análisis estadístico multivariado aplicado. Experiencia y casos en el Caribe colombiano. Universidad del Norte.
- Dubes, R., & Jain, A. K. (1979). Validity studies in clustering methodologies. Pattern recognition, 11(4), 235-254.
- Eluri, V. R., Ramesh, M., Al-Jabri, A. S. M., & Jane, M. (2016, March). A comparative study of various clustering techniques on big data sets using Apache Mahout. In 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC) (pp. 1-4). IEEE.
- Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, No. 34, pp. 226-231).
- Fernández, S. F., Sánchez, J. M. C., Córdoba, A., & Largo, A. C. (2002). Estadística descriptiva. Esic Editorial.
- Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and robust automated machine learning. In Advances in neural information processing systems (pp. 2962-2970).
- Gelbard, R., Goldman, O., & Spiegler, I. (2007). Investigating diversity of clustering methods: An empirical comparison. Data & Knowledge Engineering, 63(1), 155-166.
- Gómez-Skarmeta, A. F., Delgado, M., & Vila, M. A. (1999). About the use of fuzzy clustering techniques for fuzzy model identification. Fuzzy sets and systems, 106(2), 179-188.
- Hazen, B. T., Boone, C. A., Ezell, J. D., & Jones-Farmer, L. A. (2014). Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. Int. J. Prod. Econ., 154, 72-80. https://doi.org/10.1016/j.ijpe.2014.04.018
- Hierarchical Clustering - Orange Visual Programming 3 documentation. (2021). Readthedocs.io. https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/widgets/unsupervised/hierarchicalclustering.html
- Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery, 2(3), 283-304.
- ipywidgets - Jupyter Widgets 7.6.5 documentation. (2021). Readthedocs.io. https://ipywidgets.readthedocs.io/en/stable/
- Ji, J., Bai, T., Zhou, C., Ma, C., & Wang, Z. (2013). An improved k-prototypes clustering algorithm for mixed numeric and categorical data. Neurocomputing, 120, 590-596.
- SAS Institute. (2012). SAS/OR 9.3 User's Guide: Mathematical Programming Examples. SAS Institute.
- Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
- López, C. P. (2007). Minería de datos: técnicas y herramientas. Editorial Paraninfo.
- Lückeheide, S., Velásquez, J. D., & Cerda, L. (2007). Segmentación de los contribuyentes que declaran iva aplicando herramientas de clustering. Revista de Ingeniería de Sistemas, 21, 87-110.
- MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281-297). Berkeley: University of California Press.
- Maheswaran, G., Jayarajan, P., Jose, J., & Joseph, J. (2013). K Means Clustering Algorithms: A Comparitive Study.
- Matplotlib: Python plotting - Matplotlib 3.4.3 documentation. (2012). Matplotlib.org. https://matplotlib.org/
- Meilă, M., & Heckerman, D. (2001). An experimental comparison of model-based clustering methods. Machine learning, 42(1), 9-29.
- Morán, L. L., & Alonso, J. H. (2009). Estadística descriptiva. Ediciones Académicas.
- Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9), 1-18.
- Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. In Advances in neural information processing systems (pp. 849-856).
- NumPy. (2021). Numpy.org. https://numpy.org/
- Pandas - Python Data Analysis Library. (2021). Pydata.org. https://pandas.pydata.org/
- Park, H. S., & Jun, C. H. (2009). A simple and fast algorithm for K-medoids clustering. Expert systems with applications, 36(2), 3336-3341.
- Peña, D. (2002). Análisis de datos multivariantes (Vol. 24). Madrid: McGraw-Hill.
- Provost, F., & Fawcett, T. (2013). Data Science for Business: What you need to know about data mining and data-analytic thinking. O’Reilly Media, Inc.
- RapidMiner GmbH. (2021). k-Means - RapidMiner Documentation. Rapidminer.com. https://docs.rapidminer.com/latest/studio/operators/modeling/segmentation/k_means.html
- Reynolds, A. P., Richards, G., & Rayward-Smith, V. J. (2004, August). The application of k-medoids and pam to the clustering of rules. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 173-178). Springer, Berlin, Heidelberg.
- Rodrigo, J. A. (2020). Clustering con Python. Available at: https://www.cienciadedatos.net/documentos/py20-clustering-con-python.html [accessed 10-06-2021].
- Ross, S. M. (2007). Introducción a la estadística. Reverté.
- Sangüesa Solé, Ramón (coord.) (2000). Data mining: una introducción. Barcelona: Universitat Oberta de Catalunya.
- Santana, Ó. F. (1991). El análisis de clúster: aplicación, interpretación y validación. Papers: revista de sociologia, (37), 65-76.
- Scikit-learn. (n.d.). Scikit-learn: Clustering. Available at: https://scikit-learn.org/stable/modules/clustering.html
- SAS/STAT Cluster Analysis Procedures. (2018, November 20). Sas.com. https://support.sas.com/rnd/app/stat/procedures/ClusterAnalysis.html
- Shen, J., Hao, X., Liang, Z., Liu, Y., Wang, W., & Shao, L. (2016). Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE Transactions on Image Processing, 25(12), 5933-5942.
- sys - Parámetros y funciones específicos del sistema - documentación de Python 3.10.0. (2021). Python.org. https://docs.python.org/es/3.10/library/sys.html
- The Jupyter Notebook - Jupyter Notebook 6.4.5 documentation. (2021). Readthedocs.io. https://jupyter-notebook.readthedocs.io/en/stable/
- time - Time access and conversions - Python 3.10.0 documentation. (2021). Python.org. https://docs.python.org/3/library/time.html
- Tutorial de Python - documentación de Python 3.10.0. (2021). Python.org. https://docs.python.org/es/3/tutorial/
- Uriel, E., & Aldás, J. (2005). Análisis multivariante aplicado (1st ed.). Madrid: Thomson.
- Van der Aalst, W. M. (2016). Process mining: data science in action. Springer.
- Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17(4), 395-416.
- Wang, K., Zhang, J., Li, D., Zhang, X., & Guo, T. (2008). Adaptive affinity propagation clustering. arXiv preprint arXiv:0805.1096.
- warnings - Warning control - Python 3.10.0 documentation. (2021). Python.org. https://docs.python.org/3/library/warnings.html
- Yellowbrick: Machine Learning Visualization - Yellowbrick v1.3.post1 documentation. (2021). Scikit-Yb.org. https://www.scikit-yb.org/en/latest/
- Zelnik-Manor, L., & Perona, P. (2005). Self-tuning spectral clustering. In Advances in neural information processing systems (pp. 1601-1608).
- Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: an efficient data clustering method for very large databases. ACM SIGMOD Record, 25(2), 103-114.


dc.rights.coar.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
---|---|
dc.rights.license.spa.fl_str_mv | Attribution-ShareAlike 4.0 International |
dc.rights.uri.spa.fl_str_mv | http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv | info:eu-repo/semantics/openAccess |
eu_rights_str_mv | openAccess |
dc.format.extent.spa.fl_str_mv | xi, 63 pages |
dc.format.mimetype.spa.fl_str_mv | application/pdf |
dc.publisher.spa.fl_str_mv | Universidad Nacional de Colombia |
dc.publisher.program.spa.fl_str_mv | Medellín - Minas - Maestría en Ingeniería - Analítica (Master's in Engineering - Analytics) |
dc.publisher.department.spa.fl_str_mv | Departamento de la Computación y la Decisión (Department of Computing and Decision Sciences) |
dc.publisher.faculty.spa.fl_str_mv | Facultad de Minas (Faculty of Mines) |
dc.publisher.place.spa.fl_str_mv | Medellín, Colombia |
dc.publisher.branch.spa.fl_str_mv | Universidad Nacional de Colombia - Sede Medellín (Medellín campus) |
bitstream.url.fl_str_mv | https://repositorio.unal.edu.co/bitstream/unal/80784/1/license.txt https://repositorio.unal.edu.co/bitstream/unal/80784/2/1017230592.2021.pdf https://repositorio.unal.edu.co/bitstream/unal/80784/3/1017230592.2021.pdf.jpg |
bitstream.checksum.fl_str_mv | 8153f7789df02f0a4c9e079953658ab2 d2937169d9b914c1e0214d68ae913437 1033344f1cd99e59a19da0df9ff4975b |
bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 MD5 |
repository.name.fl_str_mv | Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv | repositorio_nal@unal.edu.co |
_version_ | 1814089317201477632 |

- Degree:
- Magíster en Ingeniería - Analítica (Master of Engineering - Analytics)
- Curricular area:
- Área Curricular de Ingeniería de Sistemas e Informática (Systems and Computer Engineering)
- Research line:
- Cluster analysis
- Note:
- Document describing in detail how the software works

Abstract (text taken from the source):

This document describes the development of software for automatic cluster analysis. Several libraries already support cluster analysis, but the goal here is to automate the process and bring different options together in a single package, which makes the models easier to analyze and parameterize. The software was built on existing Python libraries, drawing on the functionality found in various statistical and data-analysis tools, so that it can serve both users with basic knowledge and experts who want to parameterize their own analyses. The results show that current algorithms can make clustering, and the analysis of its results, easier by guiding the user through the whole process in a simple, graphical, and intuitive way. The work concludes that cluster-analysis results are subject to the user's subjectivity and knowledge, but that this subjectivity can be reduced through strategies, techniques, analysis, and proper use of existing tools.
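The abstract describes automating cluster analysis on top of existing Python libraries, but this record does not include the software itself. The following is only a minimal sketch, assuming scikit-learn is installed, of one common way to automate model selection. It fits several candidate algorithms over a range of cluster counts and keeps the configuration with the highest silhouette score. The function `auto_cluster`, the two candidate algorithms, the `k_range` default, and the silhouette criterion are illustrative assumptions, not the thesis's implementation.

```python
# Hypothetical sketch of automated cluster-model selection (not the thesis's code).
# Assumes scikit-learn; the candidates and the silhouette criterion are illustrative.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def auto_cluster(X, k_range=range(2, 9), random_state=0):
    """Fit candidate models for each k and return the best one by silhouette."""
    X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale
    best = None
    for k in k_range:
        candidates = {
            "kmeans": KMeans(n_clusters=k, n_init=10, random_state=random_state),
            "agglomerative-ward": AgglomerativeClustering(n_clusters=k, linkage="ward"),
        }
        for name, model in candidates.items():
            labels = model.fit_predict(X_scaled)
            score = silhouette_score(X_scaled, labels)  # in [-1, 1]; higher is better
            if best is None or score > best["score"]:
                best = {"score": score, "algorithm": name, "k": k, "labels": labels}
    return best


if __name__ == "__main__":
    # Synthetic data with four groups, used only to exercise the function.
    X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)
    result = auto_cluster(X)
    print(result["algorithm"], result["k"], round(result["score"], 3))
```

The silhouette coefficient tends to favor compact, roughly convex groups, so it is only one selection criterion among several. Density-based methods, for example, usually call for a different criterion.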

- Audience:
- Researchers, general public
- Files:
- license.txt (text/plain; charset=utf-8, 4074 bytes)
- 1017230592.2021.pdf: Tesis de Maestría en Ingeniería - Analítica (Master's thesis in Engineering - Analytics), application/pdf, 2093397 bytes
- 1017230592.2021.pdf.jpg: generated thumbnail, image/jpeg, 3969 bytes
- Record last modified:
- 2024-08-02 23:10:25.315
EgVEVTSVMgQSBQVUJMSUNBUiBERUJFIFNFUiBMQSBWRVJTScOTTiBGSU5BTCBBUFJPQkFEQS4gCgpBbCBoYWNlciBjbGljIGVuIGVsIHNpZ3VpZW50ZSBib3TDs24sIHVzdGVkIGluZGljYSBxdWUgZXN0w6EgZGUgYWN1ZXJkbyBjb24gZXN0b3MgdMOpcm1pbm9zLiBTaSB0aWVuZSBhbGd1bmEgZHVkYSBzb2JyZSBsYSBsaWNlbmNpYSwgcG9yIGZhdm9yLCBjb250YWN0ZSBjb24gZWwgYWRtaW5pc3RyYWRvciBkZWwgc2lzdGVtYS4KClVOSVZFUlNJREFEIE5BQ0lPTkFMIERFIENPTE9NQklBIC0gw5psdGltYSBtb2RpZmljYWNpw7NuIDE5LzEwLzIwMjEK |