Evaluating the impact of a two-stage multivariate data cleansing approach to improve to the performance of machine learning classifiers: a case study in human activity recognition

Human activity recognition (HAR) is a popular field of study. Outcomes of projects in this area have the potential to impact the quality of life of people with conditions such as dementia. HAR focuses primarily on applying machine learning classifiers to data from low-level sensors suc...


Authors:
Neira Rodado, Dionicio
Nugent, Christopher
Cleland, Ian
Velasquez, Javier
Viloria, Amelec
Resource type:
Journal article
Publication date:
2020
Institution:
Corporación Universidad de la Costa
Repository:
REDICUC - Repositorio CUC
Language:
eng
OAI Identifier:
oai:repositorio.cuc.edu.co:11323/7755
Online access:
https://hdl.handle.net/11323/7755
https://repositorio.cuc.edu.co/
Keywords:
HAR
dataset quality
machine learning
multivariate analysis
Rights
openAccess
License
Attribution-NonCommercial-NoDerivatives 4.0 International
id RCUC2_c96fed50c983075f7031af4d02f0a454
oai_identifier_str oai:repositorio.cuc.edu.co:11323/7755
network_acronym_str RCUC2
network_name_str REDICUC - Repositorio CUC
repository_id_str
dc.title.spa.fl_str_mv Evaluating the impact of a two-stage multivariate data cleansing approach to improve to the performance of machine learning classifiers: a case study in human activity recognition
dc.creator.fl_str_mv Neira Rodado, Dionicio
Nugent, Christopher
Cleland, Ian
Velasquez, Javier
Viloria, Amelec
dc.subject.spa.fl_str_mv HAR
dataset quality
machine learning
multivariate analysis
description Human activity recognition (HAR) is a popular field of study. Outcomes of projects in this area have the potential to impact the quality of life of people with conditions such as dementia. HAR focuses primarily on applying machine learning classifiers to data from low-level sensors such as accelerometers. The performance of these classifiers can be improved through an adequate training process. To that end, multivariate outlier detection was used to improve the quality of the data in the training set and, consequently, the performance of the classifier. The impact of the technique was evaluated with k-nearest neighbors (KNN) and random forest (RF) classifiers. In the case of KNN, the performance of the classifier improved from 55.9% to 63.59%.
publishDate 2020
dc.date.issued.none.fl_str_mv 2020
dc.date.accessioned.none.fl_str_mv 2021-01-22T23:41:36Z
dc.date.available.none.fl_str_mv 2021-01-22T23:41:36Z
dc.type.spa.fl_str_mv Journal article
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.coar.spa.fl_str_mv http://purl.org/coar/resource_type/c_6501
dc.type.content.spa.fl_str_mv Text
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/article
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/ART
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
format http://purl.org/coar/resource_type/c_6501
status_str acceptedVersion
dc.identifier.uri.spa.fl_str_mv https://hdl.handle.net/11323/7755
dc.identifier.doi.spa.fl_str_mv 10.3390/s20071858
dc.identifier.instname.spa.fl_str_mv Corporación Universidad de la Costa
dc.identifier.reponame.spa.fl_str_mv REDICUC - Repositorio CUC
dc.identifier.repourl.spa.fl_str_mv https://repositorio.cuc.edu.co/
dc.language.iso.none.fl_str_mv eng
language eng
dc.rights.spa.fl_str_mv Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.mimetype.spa.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Corporación Universidad de la Costa
dc.source.spa.fl_str_mv Sensors (Basel)
institution Corporación Universidad de la Costa
dc.source.url.spa.fl_str_mv https://pubmed.ncbi.nlm.nih.gov/32230844/
bitstream.url.fl_str_mv https://repositorio.cuc.edu.co/bitstream/11323/7755/1/Evaluating%20the%20impact%20of%20a%20two-stage%20multivariate%20data%20cleansing%20approach%20to%20improve%20to%20the%20performance%20of%20machine%20learning%20classifiers%20a%20case%20study%20in%20human%20activity%20recognition.pdf
https://repositorio.cuc.edu.co/bitstream/11323/7755/2/license_rdf
https://repositorio.cuc.edu.co/bitstream/11323/7755/3/license.txt
https://repositorio.cuc.edu.co/bitstream/11323/7755/4/Evaluating%20the%20impact%20of%20a%20two-stage%20multivariate%20data%20cleansing%20approach%20to%20improve%20to%20the%20performance%20of%20machine%20learning%20classifiers%20a%20case%20study%20in%20human%20activity%20recognition.pdf.jpg
https://repositorio.cuc.edu.co/bitstream/11323/7755/5/Evaluating%20the%20impact%20of%20a%20two-stage%20multivariate%20data%20cleansing%20approach%20to%20improve%20to%20the%20performance%20of%20machine%20learning%20classifiers%20a%20case%20study%20in%20human%20activity%20recognition.pdf.txt
bitstream.checksum.fl_str_mv c171f1063ad85556688b2b2bb1359d31
4460e5956bc1d1639be9ae6146a50347
e30e9215131d99561d40d6b0abbe9bad
8611f657ae73c900e653e62d734660c3
bc5b389e7e0f6d8c751000d94bbaa732
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Universidad de La Costa
repository.mail.fl_str_mv bdigital@metabiblioteca.com
_version_ 1808400264720285696