Data driven initialization for machine learning classification models

The main objective of this undergraduate thesis is to develop a strategy for initializing the parameters θ of logistic regression (a linear classifier), multinomial logistic regression, and classical fully connected feed-forward neural networks. The initialization is based on properties of the statistical distribution of the data on which the models are trained, with the aim of starting each model in a better-suited region of the cost function, thereby improving its convergence rate and producing better results in shorter training times. The thesis presents an intuitive and mathematical explanation of the proposed initialization schemes and contrasts the theoretical development with a benchmark on several datasets, including toy examples. It also analyzes these results, discusses the limitations of the proposals, and outlines the future work that can be derived from this study.
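The record itself carries no code, but the idea can be sketched concretely for the binary case. The snippet below is a minimal NumPy illustration, not the thesis's implementation: it takes each class's mean as its characteristic vector, points the weight vector from the class-0 mean to the class-1 mean, and (an assumption of this sketch) sets the bias so the initial boundary passes through the midpoint of the two means.

```python
import numpy as np

def mean_based_init(X, y):
    """Data-driven start for binary logistic regression (illustrative).

    w points from the class-0 mean to the class-1 mean; b places the
    initial decision boundary at the midpoint between the two means.
    """
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    w = mu1 - mu0
    b = -0.5 * w @ (mu0 + mu1)
    return w, b

# Toy example: two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (100, 2)),
               rng.normal(2.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

w, b = mean_based_init(X, y)
acc = ((X @ w + b > 0).astype(int) == y).mean()
print(f"accuracy at initialization, before any gradient step: {acc:.2f}")
```

On well-separated toy data like this, the model classifies most points correctly before a single gradient step, which is precisely the point of starting in a favorable region of the cost surface.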

Authors:
López Jaimes, David Santiago
Resource type:
Trabajo de grado (bachelor's thesis)
Publication date:
2022
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
eng
OAI Identifier:
oai:repository.urosario.edu.co:10336/34737
Online access:
https://doi.org/10.48713/10336_34737
https://repository.urosario.edu.co/handle/10336/34737
Keywords:
Neural Networks
Logistic Regression
Gradient Descent
Classification Model Parameters
Characteristic Vectors
Class Distributions
Mathematics
Rights
License
Atribución-NoComercial-CompartirIgual 2.5 Colombia
dc.title.es.fl_str_mv Data driven initialization for machine learning classification models
dc.contributor.advisor.none.fl_str_mv Caicedo Dorado, Alexander
dc.subject.es.fl_str_mv Redes Neuronales
Regresión Logística
Gradiente Descendiente
Parámetros de un Modelo de Clasificación
Vectores Característicos
Distribución de las Clases
dc.subject.ddc.es.fl_str_mv Matemáticas
dc.subject.keyword.es.fl_str_mv Neural Networks
Logistic Regression
Characteristic Vectors
Classes Distributions
Classification Models Parameters
description Thanks to great advances in technology, the growth of computational resources, and the strong impact of the "big data" era on society, artificial intelligence has become a heavily studied and widely used field. Machine learning is the branch of artificial intelligence whose main objective is to build models capable of learning from a set of data without being explicitly programmed. These models use tools from different branches of mathematics, such as statistics and linear algebra, to identify patterns and relationships in the data. In particular, machine learning allows us to build models that classify data based on its intrinsic characteristics and its relationship to a target variable. Such models are widely used in real-life problems: classifying a bank transaction as malicious or normal, determining with a certain probability whether a tumor is malignant or benign, estimating a person's credit risk, among others. Most of these classification models learn through gradient descent or one of its variants, an iterative algorithm that finds the model parameters θ that minimize a cost function and allow an adequate classification. These parameters are usually initialized randomly. Training these models has several limitations, however. Real data comes in considerably high dimensions, and it is difficult to know the shape of the cost surface it generates, so the models require great care and large amounts of time and computational resources to train. Moreover, because in most cases the cost function is not convex, as normally happens in neural networks, randomly initialized weights may leave the algorithm stalled in a flat region of the cost function, or started in a very rough region from which it does not converge to an appropriate minimum. This is why the present study proposes an initialization strategy for classification problems that starts the models in an appropriate region of the cost function, in order to improve their convergence rate and produce better results in shorter training times.
We propose a new deterministic initialization strategy for logistic regression (a linear classifier), multinomial logistic regression, and classical fully connected feed-forward neural networks in classification problems, based on the properties of the statistical distribution of the data on which the models are trained. For logistic regression and multinomial logistic regression, we initialize the models with a characteristic vector of each class's data distribution, such as its mean or median. For fully connected feed-forward neural networks, we use prototype data from each class; these prototypes are not the most representative points of the whole class distribution, but points that map and linearize the separation boundary with the other classes. A benchmark of the proposed initializations was run on several real classification datasets from the UCI and Kaggle repositories, as well as on different toy examples. For logistic regression, we compared the behavior of the model under random initialization and under the proposed initialization. For fully connected feed-forward neural networks, we compared the proposed initialization against the state-of-the-art initializations for these models, Xavier's and He's. In both cases we were able to successfully initialize the models, reducing the required training time and starting the learning algorithm in a better region of the cost function. The logistic regression initialization is based on statistical estimators of the data distribution and on distance metrics, particularly the mean and the Euclidean distance between different scalar products. The neural network strategy is based on linearizing the decision boundary using prototype data from each class. Our approach worked well on all tested datasets, considerably reducing the computational resources required to train these models and increasing their performance.
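Again as a hedged sketch rather than the thesis's actual code, the multiclass reading of the same idea can be made concrete: give each class its mean as its weight row, with a bias of -||mu_k||^2/2 (the bias term is this sketch's assumption) so that the initial softmax model behaves as a nearest-class-mean classifier. Comparing the initial cross-entropy against small random weights shows why such a start can shorten training:

```python
import numpy as np

def class_mean_init(X, y, n_classes):
    """Illustrative multinomial init: one characteristic vector per class.

    With W[k] = mu_k and b[k] = -||mu_k||^2 / 2, the score of class k is
    -||x - mu_k||^2 / 2 up to a term shared by all classes, so the
    initial model is a nearest-class-mean classifier.
    """
    W = np.zeros((n_classes, X.shape[1]))
    b = np.zeros(n_classes)
    for k in range(n_classes):
        mu = X[y == k].mean(axis=0)
        W[k] = mu
        b[k] = -0.5 * mu @ mu
    return W, b

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Three Gaussian classes: compare the initial cross-entropy of the
# data-driven start against small random weights.
rng = np.random.default_rng(1)
means = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
X = np.vstack([rng.normal(m, 1.0, (150, 2)) for m in means])
y = np.repeat([0, 1, 2], 150)

inits = {
    "data-driven": class_mean_init(X, y, 3),
    "small random": (0.01 * rng.standard_normal((3, 2)), np.zeros(3)),
}
for name, (W, b) in inits.items():
    P = softmax(X @ W.T + b)
    nll = -np.log(P[np.arange(len(y)), y]).mean()
    print(f"{name:>12} init: cross-entropy before training = {nll:.3f}")
```

The thesis's neural-network strategy (prototype points that linearize the decision boundary, benchmarked against Xavier's and He's initializations) goes beyond this sketch and is developed in the full text.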
dc.date.accessioned.none.fl_str_mv 2022-08-22T19:50:57Z
dc.date.available.none.fl_str_mv 2022-08-22T19:50:57Z
dc.date.created.none.fl_str_mv 2022-05-08
dc.type.es.fl_str_mv bachelorThesis
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_7a1f
dc.type.document.es.fl_str_mv Análisis de caso
dc.type.spa.es.fl_str_mv Trabajo de grado
dc.identifier.doi.none.fl_str_mv https://doi.org/10.48713/10336_34737
dc.identifier.uri.none.fl_str_mv https://repository.urosario.edu.co/handle/10336/34737
dc.language.iso.es.fl_str_mv eng
dc.rights.*.fl_str_mv Atribución-NoComercial-CompartirIgual 2.5 Colombia
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.acceso.es.fl_str_mv Abierto (Texto Completo)
dc.rights.uri.*.fl_str_mv http://creativecommons.org/licenses/by-nc-sa/2.5/co/
dc.format.extent.es.fl_str_mv 94 pp
dc.format.mimetype.es.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad del Rosario
dc.publisher.department.none.fl_str_mv Escuela de Ingeniería, Ciencia y Tecnología
dc.publisher.program.none.fl_str_mv Programa de Matemáticas Aplicadas y Ciencias de la Computación - MACC
dc.source.instname.none.fl_str_mv instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv reponame:Repositorio Institucional EdocUR
bitstream.url.fl_str_mv https://repository.urosario.edu.co/bitstreams/bb7de9d7-1410-49f2-a27f-70aaf4824dbe/download
https://repository.urosario.edu.co/bitstreams/4f706125-970c-46d4-87b9-ea87ebb07323/download
https://repository.urosario.edu.co/bitstreams/cb40d82e-e483-4687-a7e9-b622a72ec256/download
https://repository.urosario.edu.co/bitstreams/269546cc-bb7b-42dc-8204-5d36b373d49a/download
https://repository.urosario.edu.co/bitstreams/897dce26-7c20-4d3a-99f5-4d1683c993f5/download
bitstream.checksum.fl_str_mv be0a64d258b047553d207e045419bf77
fab9d9ed61d64f6ac005dee3306ae77e
1487462a1490a8fc01f5999ce7b3b9cc
e5f324a23906908f71945b4c96f7145c
40271e365d6e7b173b9fa52c570b4d19
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional EdocUR
repository.mail.fl_str_mv edocur@urosario.edu.co
spelling Caicedo Dorado, Alexander (advisor); López Jaimes, David Santiago (author); Profesional en Matemáticas Aplicadas y Ciencias de la Computación, Pregrado, Full time; ORIGINAL bitstream LopezJaimes-DavidSantiago-2022.pdf (Tesis, application/pdf, 9582264 bytes)