Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides

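This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides in cases where datasets are particularly small (fewer than a few hundred instances) due to the cost and complexity of the chemical processes involved. As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods as a preliminary step toward reproducible results across experimental setups and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach allowed us to guide our search for suitable machine learning methods and to obtain results comparable to those in the literature with strong statistical stability.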

Authors: Camacho, Francy Liliana; Torres-Sáez, Rodrigo; Ramos-Pollán, Raúl

Resource type: Research article

Publication date: 2016

Institution: Universidad Pedagógica y Tecnológica de Colombia

Repository: RiUPTC: Repositorio Institucional UPTC

Language: eng

id	REPOUPTC2_8ce798a141096354b13c74dab3739c6d
oai_identifier_str	oai:repositorio.uptc.edu.co:001/14173
network_acronym_str	REPOUPTC2
network_name_str	RiUPTC: Repositorio Institucional UPTC
repository_id_str
spelling	2016-12-31 | 2024-07-05T19:11:31Z | 2024-07-05T19:11:31Z | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834 | 10.19053/01211129.v26.n44.2017.5834 | https://repositorio.uptc.edu.co/handle/001/14173 | This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides in cases where datasets are particularly small (fewer than a few hundred instances) due to the cost and complexity of the chemical processes involved. As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods as a preliminary step toward reproducible results across experimental setups and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach allowed us to guide our search for suitable machine learning methods and to obtain results comparable to those in the literature with strong statistical stability. | application/pdf | application/xml | eng | eng | Universidad Pedagógica y Tecnológica de Colombia | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/4728 | https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/6402 | Revista Facultad de Ingeniería; Vol. 26 No. 44 (2017); 167-180 | Revista Facultad de Ingeniería; Vol. 26 Núm. 44 (2017); 167-180 | 2357-5328 | 0121-1129 | antimicrobial peptides | learning curves | machine learning | statistical stability | support vector regression | Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides | Artículo de revista | http://purl.org/coar/resource_type/c_2df8fbb1 | info:eu-repo/semantics/article | Text | https://purl.org/redcol/resource_type/ART | http://purl.org/coar/version/c_970fb48d4fbd8a274 | http://purl.org/coar/version/c_970fb48d4fbd8a85 | http://purl.org/coar/access_right/c_abf191 | http://purl.org/coar/access_right/c_abf2 | Camacho, Francy Liliana | Torres-Sáez, Rodrigo | Ramos-Pollán, Raúl | Publication | 001/14173 | oai:repositorio.uptc.edu.co:001/14173 | 2025-10-30 19:33:09.526 | metadata.only | https://repositorio.uptc.edu.co | Repositorio Institucional UPTC | repositorio.uptc@uptc.edu.co
dc.title.en-US.fl_str_mv	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
title	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
spellingShingle	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides antimicrobial peptides learning curves machine learning statistical stability support vector regression
title_short	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
title_full	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
title_fullStr	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
title_full_unstemmed	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
title_sort	Assessing the behavior of machine learning methods to predict the activity of antimicrobial peptides
dc.subject.en-US.fl_str_mv	antimicrobial peptides learning curves machine learning statistical stability support vector regression
topic	antimicrobial peptides learning curves machine learning statistical stability support vector regression
description	This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides in cases where datasets are particularly small (fewer than a few hundred instances) due to the cost and complexity of the chemical processes involved. As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting good peak performance obtained from very particular experimental setups, as reported in related literature, we focused on characterizing the behavior of the machine learning methods as a preliminary step toward reproducible results across experimental setups and, ultimately, good performance. We propose a methodology that integrates feature learning (autoencoders) and selection methods (genetic algorithms) through the exhaustive use of performance metrics (permutation tests and bootstrapping), which provide stronger statistical evidence to support investment decisions with the lab resources at hand. We show evidence for the usefulness of 1) the extensive use of computational resources, and 2) adopting a wider range of metrics than those reported in the literature to assess method performance. This approach allowed us to guide our search for suitable machine learning methods and to obtain results comparable to those in the literature with strong statistical stability.
publishDate	2016
dc.date.accessioned.none.fl_str_mv	2024-07-05T19:11:31Z
dc.date.available.none.fl_str_mv	2024-07-05T19:11:31Z
dc.date.none.fl_str_mv	2016-12-31
dc.type.none.fl_str_mv	Journal article (Artículo de revista)
dc.type.coarversion.fl_str_mv	http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.none.fl_str_mv	http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.driver.none.fl_str_mv	info:eu-repo/semantics/article
dc.type.content.none.fl_str_mv	Text
dc.type.redcol.none.fl_str_mv	https://purl.org/redcol/resource_type/ART
dc.type.coarversion.spa.fl_str_mv	http://purl.org/coar/version/c_970fb48d4fbd8a274
format	http://purl.org/coar/resource_type/c_2df8fbb1
dc.identifier.none.fl_str_mv	https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834 10.19053/01211129.v26.n44.2017.5834
dc.identifier.uri.none.fl_str_mv	https://repositorio.uptc.edu.co/handle/001/14173
url	https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834 https://repositorio.uptc.edu.co/handle/001/14173
identifier_str_mv	10.19053/01211129.v26.n44.2017.5834
dc.language.none.fl_str_mv	eng
dc.language.iso.spa.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/4728 https://revistas.uptc.edu.co/index.php/ingenieria/article/view/5834/6402
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.rights.coar.spa.fl_str_mv	http://purl.org/coar/access_right/c_abf191
rights_invalid_str_mv	http://purl.org/coar/access_right/c_abf191 http://purl.org/coar/access_right/c_abf2
dc.format.none.fl_str_mv	application/pdf application/xml
dc.publisher.en-US.fl_str_mv	Universidad Pedagógica y Tecnológica de Colombia
dc.source.en-US.fl_str_mv	Revista Facultad de Ingeniería; Vol. 26 No. 44 (2017); 167-180
dc.source.es-ES.fl_str_mv	Revista Facultad de Ingeniería; Vol. 26 Núm. 44 (2017); 167-180
dc.source.none.fl_str_mv	2357-5328 0121-1129
institution	Universidad Pedagógica y Tecnológica de Colombia
repository.name.fl_str_mv	Repositorio Institucional UPTC
repository.mail.fl_str_mv	repositorio.uptc@uptc.edu.co
_version_	1849966064608215040
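
The description field names permutation tests and bootstrapping as the means of gathering statistical evidence on small datasets. The following is a minimal sketch of both ideas, not the authors' pipeline: it assumes synthetic data in place of peptide activities, uses Pearson correlation as the performance metric, and shuffles the true values against fixed predictions rather than retraining a model on permuted labels. All function names (`pearson_r`, `bootstrap_ci`, `permutation_p_value`) are hypothetical and chosen for illustration.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample: no variance to correlate
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        # Resample (true, predicted) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx],
                             [y_pred[i] for i in idx]))
    scores.sort()
    low = scores[int((alpha / 2) * n_boot)]
    high = scores[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


def permutation_p_value(y_true, y_pred, metric, n_perm=2000, seed=0):
    """Fraction of label shufflings scoring at least as well as observed."""
    rng = random.Random(seed)
    observed = metric(y_true, y_pred)
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between truth and prediction
        if metric(shuffled, y_pred) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic stand-in for a small dataset (60 instances); assumption only.
data_rng = random.Random(42)
y_true = [data_rng.gauss(0, 1) for _ in range(60)]
y_pred = [y + data_rng.gauss(0, 0.5) for y in y_true]

r = pearson_r(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, pearson_r)
p = permutation_p_value(y_true, y_pred, pearson_r)
print(f"r = {r:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}], p = {p:.4f}")
```

Reporting the interval and the p-value alongside the point score is one way to tell a stable result from a lucky experimental setup when only a few hundred instances or fewer are available.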
