Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides

Authors:
Camacho, Francy Liliana
Torres-Sáez, Rodrigo
Ramos-Pollán, Raúl
Resource type:
Article
Publication date:
2016
Institution:
Universidad Pedagógica y Tecnológica de Colombia
Repository:
RiUPTC: Repositorio Institucional UPTC
Language:
English
OAI Identifier:
oai:repositorio.uptc.edu.co:001/14173
Online access:
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834
https://repositorio.uptc.edu.co/handle/001/14173
Keywords:
antimicrobial peptides
learning curves
machine learning
statistical stability
support vector regression
Rights
License
http://purl.org/coar/access_right/c_abf191
Description:
This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides, where the cost and complexity of the chemical processes involved can leave datasets particularly small (fewer than a few hundred instances). As in other fields with similar constraints, this leads to large variability in the performance of predictive models, which hinders any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods as a preliminary step toward reproducible results across experimental setups and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and feature selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach guided our search for suitable machine learning methods and yielded results comparable to those in the literature with strong statistical stability.
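The record names permutation tests and bootstrapping as the tools used to judge whether a model's performance is statistically stable, but gives no implementation details. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it takes held-out observed activities and model predictions, scores them with Pearson's r (an assumption; the record does not say which regression metric was used, and the code assumes higher scores are better), and uses only the Python standard library. The helper names `pearson_r`, `bootstrap_ci` and `permutation_p_value`, and the synthetic data in the demo, are hypothetical.

```python
# Hypothetical helpers for checking the stability and significance of a
# regression score. Using Pearson's r as the score is an assumption, not
# something taken from the study.
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError when either sequence is constant (r undefined).
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_pred).

    Resamples (observed, predicted) pairs with replacement, recomputes the
    metric on each resample and returns the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples where the metric is undefined
    scores.sort()
    lo = scores[int(alpha / 2 * len(scores))]
    hi = scores[int((1 - alpha / 2) * len(scores)) - 1]
    return lo, hi


def permutation_p_value(y_true, y_pred, metric, n_perm=2000, seed=0):
    """One-sided permutation p-value for metric(y_true, y_pred).

    Shuffles the observed values against fixed predictions and counts how
    often a random pairing scores at least as high as the real one
    (assumes higher metric values are better).
    """
    rng = random.Random(seed)
    observed = metric(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if metric(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction: the observed pairing counts as one permutation.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic stand-in for held-out activities and predictions;
    # NOT data from the study.
    rng = random.Random(42)
    y_true = [rng.uniform(0.0, 10.0) for _ in range(60)]
    y_pred = [v + rng.gauss(0.0, 1.0) for v in y_true]
    r = pearson_r(y_true, y_pred)
    lo, hi = bootstrap_ci(y_true, y_pred, pearson_r)
    p = permutation_p_value(y_true, y_pred, pearson_r)
    print(f"r = {r:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}], permutation p = {p:.4f}")
```

A narrow bootstrap interval signals a score that is stable under resampling of a small dataset, while a small permutation p-value signals that the score is unlikely to arise from a random pairing. This permutation test keeps the predictions fixed and shuffles the observed values; a model-level alternative, such as scikit-learn's `permutation_test_score`, instead refits the model on each permuted target, which costs more computation but also reflects how well the model can fit pure noise.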
Date:
2016-12-31
Date accessioned:
2024-07-05T19:11:31Z
Date available:
2024-07-05T19:11:31Z
Type:
investigation
info:eu-repo/semantics/article
http://purl.org/coar/resource_type/c_2df8fbb1
Version:
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/version/c_970fb48d4fbd8a85
http://purl.org/coar/version/c_970fb48d4fbd8a274
DOI:
10.19053/01211129.v26.n44.2017.5834
Relation:
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/4728
https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/6402
Access rights:
http://purl.org/coar/access_right/c_abf2
Format:
application/pdf
application/xml
Publisher:
Universidad Pedagógica y Tecnológica de Colombia
Source:
Revista Facultad de Ingeniería; Vol. 26 No. 44 (2017); 167-180
ISSN:
2357-5328
0121-1129
Repository contact:
repositorio.uptc@uptc.edu.co