Evaluating the impact of a two-stage multivariate data cleansing approach to improve to the performance of machine learning classifiers: a case study in human activity recognition
Human activity recognition (HAR) is a popular field of study. Outcomes of projects in this area have the potential to affect the quality of life of people with conditions such as dementia. HAR focuses primarily on applying machine learning classifiers to data from low-level sensors suc...
- Authors:
-
Neira Rodado, Dionicio
Nugent, Christopher
Cleland, Ian
Velasquez, Javier
Viloria, Amelec
- Resource type:
- Journal article
- Publication date:
- 2020
- Institution:
- Corporación Universidad de la Costa
- Repository:
- REDICUC - Repositorio CUC
- Language:
- eng
- OAI Identifier:
- oai:repositorio.cuc.edu.co:11323/7755
- Online access:
- https://hdl.handle.net/11323/7755
https://repositorio.cuc.edu.co/
- Keywords:
- HAR
dataset quality
machine learning
multivariate analysis
- Rights
- openAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
id |
RCUC2_c96fed50c983075f7031af4d02f0a454 |
---|---|
oai_identifier_str |
oai:repositorio.cuc.edu.co:11323/7755 |
network_acronym_str |
RCUC2 |
network_name_str |
REDICUC - Repositorio CUC |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Evaluating the impact of a two-stage multivariate data cleansing approach to improve to the performance of machine learning classifiers: a case study in human activity recognition |
dc.creator.fl_str_mv |
Neira Rodado, Dionicio; Nugent, Christopher; Cleland, Ian; Velasquez, Javier; Viloria, Amelec |
dc.contributor.author.spa.fl_str_mv |
Neira Rodado, Dionicio; Nugent, Christopher; Cleland, Ian; Velasquez, Javier; Viloria, Amelec |
dc.subject.spa.fl_str_mv |
HAR; dataset quality; machine learning; multivariate analysis |
topic |
HAR; dataset quality; machine learning; multivariate analysis |
description |
Human activity recognition (HAR) is a popular field of study. Outcomes of projects in this area have the potential to affect the quality of life of people with conditions such as dementia. HAR focuses primarily on applying machine learning classifiers to data from low-level sensors such as accelerometers. The performance of these classifiers can be improved through an adequate training process. To that end, multivariate outlier detection was used to improve the quality of the data in the training set and, subsequently, the performance of the classifier. The impact of the technique was evaluated with KNN and random forest (RF) classifiers. In the case of KNN, the performance of the classifier improved from 55.9% to 63.59%. |
publishDate |
2020 |
dc.date.issued.none.fl_str_mv |
2020 |
dc.date.accessioned.none.fl_str_mv |
2021-01-22T23:41:36Z |
dc.date.available.none.fl_str_mv |
2021-01-22T23:41:36Z |
dc.type.spa.fl_str_mv |
Journal article |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.coar.spa.fl_str_mv |
http://purl.org/coar/resource_type/c_6501 |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/ART |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
format |
http://purl.org/coar/resource_type/c_6501 |
status_str |
acceptedVersion |
dc.identifier.uri.spa.fl_str_mv |
https://hdl.handle.net/11323/7755 |
dc.identifier.doi.spa.fl_str_mv |
10.3390/s20071858 |
dc.identifier.instname.spa.fl_str_mv |
Corporación Universidad de la Costa |
dc.identifier.reponame.spa.fl_str_mv |
REDICUC - Repositorio CUC |
dc.identifier.repourl.spa.fl_str_mv |
https://repositorio.cuc.edu.co/ |
url |
https://hdl.handle.net/11323/7755 https://repositorio.cuc.edu.co/ |
identifier_str_mv |
10.3390/s20071858; Corporación Universidad de la Costa; REDICUC - Repositorio CUC |
dc.language.iso.none.fl_str_mv |
eng |
language |
eng |
dc.rights.spa.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.rights.coar.spa.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
rights_invalid_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/ http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
dc.publisher.spa.fl_str_mv |
Corporación Universidad de la Costa |
dc.source.spa.fl_str_mv |
Sensors (Basel) |
institution |
Corporación Universidad de la Costa |
dc.source.url.spa.fl_str_mv |
https://pubmed.ncbi.nlm.nih.gov/32230844/ |
bitstream.url.fl_str_mv |
https://repositorio.cuc.edu.co/bitstreams/2cc02fba-e17a-4008-b001-5e4f355fe0ed/download https://repositorio.cuc.edu.co/bitstreams/3f707d8f-d7ec-4fa4-b7b9-7c7b3727f08e/download https://repositorio.cuc.edu.co/bitstreams/cc1880d9-df8e-458a-9bb1-884812d43df4/download https://repositorio.cuc.edu.co/bitstreams/c25ff041-5404-4c40-9b71-d8ec356c9ab2/download https://repositorio.cuc.edu.co/bitstreams/4eb8c20b-00cc-4eb6-8cae-5ed0db20d0ab/download |
bitstream.checksum.fl_str_mv |
c171f1063ad85556688b2b2bb1359d31 4460e5956bc1d1639be9ae6146a50347 e30e9215131d99561d40d6b0abbe9bad 8611f657ae73c900e653e62d734660c3 bc5b389e7e0f6d8c751000d94bbaa732 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositorio de la Universidad de la Costa CUC |
repository.mail.fl_str_mv |
repdigital@cuc.edu.co |
_version_ |
1828166911455657984 |
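
The description field above says that multivariate outlier detection was used to clean the training set before fitting KNN and random forest classifiers. The record does not give the procedure itself, so the following is only a minimal, single-stage sketch of that general idea, not the authors' two-stage method. It assumes a per-class squared Mahalanobis distance as the outlier score, a hypothetical fixed `threshold` (11.34 is roughly the 99% chi-square quantile for three features), synthetic data in place of accelerometer features, and a small NumPy KNN written for the example. All function names are illustrative.

```python
import numpy as np


def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean`."""
    diff = X - mean
    # pinv tolerates a singular covariance matrix (e.g. collinear features).
    inv_cov = np.linalg.pinv(cov)
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def clean_training_set(X, y, threshold):
    """Drop rows whose squared distance to their own class centre exceeds `threshold`.

    `threshold` is a hypothetical tuning parameter chosen for this sketch.
    """
    keep = np.zeros(len(X), dtype=bool)
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        Xc = X[idx]
        d2 = mahalanobis_sq(Xc, Xc.mean(axis=0), np.cov(Xc, rowvar=False))
        keep[idx] = d2 <= threshold
    return X[keep], y[keep]


def knn_predict(X_train, y_train, X_test, k=5):
    """Majority-vote k-nearest-neighbours with Euclidean distance.

    Labels are assumed to be non-negative integers.
    """
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


# Synthetic stand-in for sensor features: two activity classes, three features.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 3)),
    rng.normal(4.0, 1.0, size=(200, 3)),
])
y = np.repeat([0, 1], 200)
X[:5] = 25.0  # a few corrupted readings that still carry the class-0 label

X_clean, y_clean = clean_training_set(X, y, threshold=11.34)
predictions = knn_predict(X_clean, y_clean, np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]))
```

Scoring each class against its own mean and covariance keeps a legitimate but rare activity from being flagged merely because it differs from the others. A fixed threshold is the simplest choice here; a real pipeline would more likely derive it from the feature count and a chosen significance level.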