Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides, due to the cost and complexity of the chemical processes involved in cases where datasets are particularly small (less than a few hu...
- Authors:
- Camacho, Francy Liliana
Torres-Sáez, Rodrigo
Ramos-Pollán, Raúl
- Resource type:
- Publication date:
- 2016
- Institution:
- Universidad Pedagógica y Tecnológica de Colombia
- Repository:
- RiUPTC: Repositorio Institucional UPTC
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uptc.edu.co:001/14173
- Online access:
- https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834
https://repositorio.uptc.edu.co/handle/001/14173
- Keywords:
- antimicrobial peptides
learning curves
machine learning
statistical stability
support vector regression
- Rights
- License
- http://purl.org/coar/access_right/c_abf191
id | REPOUPTC2_8ce798a141096354b13c74dab3739c6d |
---|---|
oai_identifier_str | oai:repositorio.uptc.edu.co:001/14173 |
network_acronym_str | REPOUPTC2 |
network_name_str | RiUPTC: Repositorio Institucional UPTC |
repository_id_str | |
spelling | 2016-12-31 · 2024-07-05T19:11:31Z · 2024-07-05T19:11:31Z · https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834 · 10.19053/01211129.v26.n44.2017.5834 · https://repositorio.uptc.edu.co/handle/001/14173 · This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides, due to the cost and complexity of the chemical processes involved in cases where datasets are particularly small (less than a few hundred instances). As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods, as a preliminary step to obtain reproducible results across experimental setups, and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach allowed us to guide our quest for finding suitable machine learning methods, and to obtain results comparable to those in the literature with strong statistical stability. · application/pdf · application/xml · eng · eng · Universidad Pedagógica y Tecnológica de Colombia · https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/4728 · https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/6402 · Revista Facultad de Ingeniería; Vol. 26 No. 44 (2017); 167-180 · Revista Facultad de Ingeniería; Vol. 26 Núm. 44 (2017); 167-180 · 2357-5328 · 0121-1129 · antimicrobial peptides · learning curves · machine learning · statistical stability · support vector regression · Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides · investigation · info:eu-repo/semantics/article · http://purl.org/coar/resource_type/c_2df8fbb1 · info:eu-repo/semantics/publishedVersion · http://purl.org/coar/version/c_970fb48d4fbd8a274 · http://purl.org/coar/version/c_970fb48d4fbd8a85 · http://purl.org/coar/access_right/c_abf191 · http://purl.org/coar/access_right/c_abf2 · Camacho, Francy Liliana · Torres-Sáez, Rodrigo · Ramos-Pollán, Raúl · 001/14173 · oai:repositorio.uptc.edu.co:001/14173 · 2025-07-18 11:53:37.658 · metadata.only · https://repositorio.uptc.edu.co · Repositorio Institucional UPTC · repositorio.uptc@uptc.edu.co |
dc.title.en-US.fl_str_mv | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
title | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
spellingShingle | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides antimicrobial peptides learning curves machine learning statistical stability support vector regression |
title_short | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
title_full | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
title_fullStr | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
title_full_unstemmed | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
title_sort | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides |
dc.subject.en-US.fl_str_mv | antimicrobial peptides learning curves machine learning statistical stability support vector regression |
topic | antimicrobial peptides learning curves machine learning statistical stability support vector regression |
description | This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides, due to the cost and complexity of the chemical processes involved in cases where datasets are particularly small (less than a few hundred instances). As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods, as a preliminary step to obtain reproducible results across experimental setups, and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach allowed us to guide our quest for finding suitable machine learning methods, and to obtain results comparable to those in the literature with strong statistical stability. |
publishDate | 2016 |
dc.date.accessioned.none.fl_str_mv | 2024-07-05T19:11:31Z |
dc.date.available.none.fl_str_mv | 2024-07-05T19:11:31Z |
dc.date.none.fl_str_mv | 2016-12-31 |
dc.type.en-US.fl_str_mv | investigation |
dc.type.none.fl_str_mv | info:eu-repo/semantics/article |
dc.type.coar.fl_str_mv | http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.coarversion.fl_str_mv | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.version.spa.fl_str_mv | info:eu-repo/semantics/publishedVersion |
dc.type.coarversion.spa.fl_str_mv | http://purl.org/coar/version/c_970fb48d4fbd8a274 |
status_str | publishedVersion |
dc.identifier.none.fl_str_mv | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834 10.19053/01211129.v26.n44.2017.5834 |
dc.identifier.uri.none.fl_str_mv | https://repositorio.uptc.edu.co/handle/001/14173 |
url | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834 https://repositorio.uptc.edu.co/handle/001/14173 |
identifier_str_mv | 10.19053/01211129.v26.n44.2017.5834 |
dc.language.none.fl_str_mv | eng |
dc.language.iso.spa.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/4728 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/6402 |
dc.rights.coar.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
dc.rights.coar.spa.fl_str_mv | http://purl.org/coar/access_right/c_abf191 |
rights_invalid_str_mv | http://purl.org/coar/access_right/c_abf191 http://purl.org/coar/access_right/c_abf2 |
dc.format.none.fl_str_mv | application/pdf application/xml |
dc.publisher.en-US.fl_str_mv | Universidad Pedagógica y Tecnológica de Colombia |
dc.source.en-US.fl_str_mv | Revista Facultad de Ingeniería; Vol. 26 No. 44 (2017); 167-180 |
dc.source.es-ES.fl_str_mv | Revista Facultad de Ingeniería; Vol. 26 Núm. 44 (2017); 167-180 |
dc.source.none.fl_str_mv | 2357-5328 0121-1129 |
institution | Universidad Pedagógica y Tecnológica de Colombia |
repository.name.fl_str_mv | Repositorio Institucional UPTC |
repository.mail.fl_str_mv | repositorio.uptc@uptc.edu.co |
_version_ | 1839633832964259840 |
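
The abstract names permutation tests and bootstrapping as the performance metrics used to judge statistical stability on small datasets. The following is a minimal sketch of both ideas. It is not the authors' implementation, which the record does not include and which also involves support vector regression, autoencoders, and genetic algorithms. The data are synthetic, RMSE is an assumed score, and the function names (`rmse`, `bootstrap_ci`, `permutation_p_value`) are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def bootstrap_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for RMSE: resample (target, prediction)
    pairs with replacement and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    low = scores[int((alpha / 2) * n_resamples)]
    high = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def permutation_p_value(y_true, y_pred, n_permutations=2000, seed=0):
    """Permutation test: shuffle the targets to break their link with the
    predictions and count how often the shuffled RMSE is at least as good."""
    rng = random.Random(seed)
    observed = rmse(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if rmse(shuffled, y_pred) <= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Synthetic stand-in for a small dataset (60 instances); the "predictions"
# are the targets plus Gaussian noise, not the output of a trained model.
data_rng = random.Random(42)
y_true = [data_rng.uniform(0.0, 10.0) for _ in range(60)]
y_pred = [t + data_rng.gauss(0.0, 1.0) for t in y_true]

observed_rmse = rmse(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred)
p_value = permutation_p_value(y_true, y_pred)

print(f"RMSE = {observed_rmse:.3f}, 95% bootstrap CI = [{ci_low:.3f}, {ci_high:.3f}]")
print(f"permutation p-value = {p_value:.4f}")
```

A wide bootstrap interval signals that a single reported score depends heavily on which instances happened to be sampled. A large permutation p-value signals that the model does little better than chance pairing of predictions and targets.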