Improvements for determining the number of clusters in k-means for innovation databases in SMEs
- Authors:
- Viloria, Amelec
- Pineda Lezama, Omar Bonerge
- Resource type:
- Journal article
- Publication date:
- 2019
- Institution:
- Corporación Universidad de la Costa
- Repository:
- REDICUC - Repositorio CUC
- Language:
- English
- OAI identifier:
- oai:repositorio.cuc.edu.co:11323/4834
- Online access:
- https://hdl.handle.net/11323/4834
- https://repositorio.cuc.edu.co/
- Keywords:
- k-means
- automatic clustering
- differential evolution
- k activation threshold
- U-Control Chart
- SMEs
- Rights
- openAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
- Abstract:
- The Automatic Clustering using Differential Evolution (ACDE) algorithm is one of the clustering methods capable of determining the number of clusters automatically. However, ACDE still relies on a manual strategy to set the activation threshold of k, which affects its performance. In this study, this limitation of ACDE is addressed using the U Control Chart (UCC). The performance of the proposed method was tested on five datasets from the National Administrative Department of Statistics (DANE - Departamento Administrativo Nacional de Estadística) and the Ministry of Commerce, Industry, and Tourism of Colombia on the innovative capacity of Small and Medium-sized Enterprises (SMEs), and was assessed with the Davies-Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that, on most datasets, the proposed method performs better than previous studies, reaching the optimal number of clusters with the lowest DBI and CS values. It can be concluded that the UCC method is able to determine the k activation threshold in ACDE, which leads to an effective determination of the number of clusters for k-means clustering.
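The abstract does not spell out the ACDE encoding, so the sketch below follows the original formulation by Das, Abraham and Konar (2008): each candidate solution carries k_max activation thresholds in [0, 1] next to k_max candidate centroids, and centroid j is used only when its threshold exceeds a fixed cutoff of 0.5. That fixed cutoff is the manual setting this study replaces with a U control chart. Because the abstract does not say which chart statistic becomes the cutoff, the code only exposes `cutoff` as a parameter and, separately, computes classical u-chart limits. The names `u_chart_limits` and `active_centroids` are hypothetical, and switching on the highest-threshold centroids when fewer than two are active is a deterministic simplification of the random re-initialisation used in the original ACDE.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


# Hypothetical helper (not from the paper): classical u control chart.
def u_chart_limits(defects: Sequence[float], units: Sequence[float]) -> Tuple[float, float, float]:
    """Return (centre line, LCL, UCL) of a u chart.

    Centre line u_bar = sum(c_i) / sum(n_i); limits are u_bar +/- 3 * sqrt(u_bar / n).
    The average sample size is used as n to get constant limits, a common
    simplification; exact limits use each sample's own n_i.
    """
    if not units or len(defects) != len(units):
        raise ValueError("defects and units must be non-empty and of equal length")
    u_bar = sum(defects) / sum(units)
    n_bar = sum(units) / len(units)
    half_width = 3.0 * math.sqrt(u_bar / n_bar)
    lcl = max(0.0, u_bar - half_width)  # a defect rate cannot be negative
    return u_bar, lcl, u_bar + half_width


# Hypothetical helper (not from the paper): ACDE-style activation decoding.
def active_centroids(
    thresholds: Sequence[float],
    centroids: Sequence[T],
    cutoff: float = 0.5,  # the fixed, manually chosen cutoff of the original ACDE
    min_active: int = 2,
) -> List[T]:
    """Keep centroid j only when thresholds[j] > cutoff."""
    if len(thresholds) != len(centroids):
        raise ValueError("one activation threshold per candidate centroid is required")
    active = [j for j, t in enumerate(thresholds) if t > cutoff]
    if len(active) < min_active:
        # Simplification: switch on the highest-threshold centroids instead of
        # randomly re-initialising thresholds as the original ACDE does.
        ranked = sorted(range(len(thresholds)), key=lambda j: thresholds[j], reverse=True)
        active = sorted(ranked[:min_active])
    return [centroids[j] for j in active]
```

For example, `active_centroids([0.7, 0.2, 0.9, 0.4], ["a", "b", "c", "d"])` keeps `["a", "c"]`. The `cutoff` argument is where a chart-derived value would plug in; how the authors obtain that value from the UCC is not stated in the abstract and is left open here.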
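The two evaluation measures named in the abstract can be sketched in plain Python. The Davies-Bouldin Index below uses its standard definition: for each cluster i, S_i is the mean Euclidean distance of its members to the centroid c_i, and the index averages, over all clusters, the largest ratio (S_i + S_j) / d(c_i, c_j); lower values indicate compact, well-separated clusters. The abstract does not define how the Cosine Similarity (CS) measure is aggregated, so `mean_centroid_cosine`, which averages the cosine similarity over all pairs of cluster centroids (lower meaning less similar clusters), is an assumption made for illustration. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Point = Sequence[float]


def _euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _centroids(X: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Mean vector of each cluster, keyed by label."""
    groups: Dict[int, List[Point]] = {}
    for x, label in zip(X, labels):
        groups.setdefault(label, []).append(x)
    return {lab: [sum(col) / len(pts) for col in zip(*pts)] for lab, pts in groups.items()}


# Hypothetical helper: standard Davies-Bouldin Index (lower is better).
def davies_bouldin(X: Sequence[Point], labels: Sequence[int]) -> float:
    cents = _centroids(X, labels)
    if len(cents) < 2:
        raise ValueError("DBI needs at least two clusters")
    # S_i: mean distance from the members of cluster i to its centroid
    total = {lab: 0.0 for lab in cents}
    count = {lab: 0 for lab in cents}
    for x, label in zip(X, labels):
        total[label] += _euclidean(x, cents[label])
        count[label] += 1
    scatter = {lab: total[lab] / count[lab] for lab in cents}
    # For each cluster keep its worst (largest) ratio to any other cluster;
    # identical centroids would make this ratio undefined (division by zero).
    worst = [
        max((scatter[i] + scatter[j]) / _euclidean(cents[i], cents[j]) for j in cents if j != i)
        for i in cents
    ]
    return sum(worst) / len(worst)


def cosine_similarity(u: Point, v: Point) -> float:
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / norm


# Hypothetical helper: ASSUMED CS aggregation (mean over all centroid pairs).
def mean_centroid_cosine(X: Sequence[Point], labels: Sequence[int]) -> float:
    cents = list(_centroids(X, labels).values())
    if len(cents) < 2:
        raise ValueError("need at least two clusters")
    pairs = list(combinations(cents, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
```

For two tight clusters `X = [[0, 0], [0, 2], [10, 0], [10, 2]]` with labels `[0, 0, 1, 1]`, each S_i is 1 and the centroids are 10 apart, so `davies_bouldin(X, labels)` is 0.2. Note that cosine similarity ignores vector length, so it is only meaningful for centroids that are not near the origin; how the study normalises its data is not stated in the abstract.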
- Date added to repository:
- 2019-06-10T13:50:20Z
- Version:
- Accepted version
- ISSN:
- 0000-2010
- License URL:
- http://creativecommons.org/licenses/by-nc-nd/4.0/
- Published in:
- Procedia Computer Science
- Full text (PDF):
- https://repositorio.cuc.edu.co/bitstreams/ba95a06b-d750-4d9e-8aa5-346e4e6a2b68/download