Hierarchical multi-label classification methods for gene function prediction

This dissertation studies the problem of predicting gene functions from a computational perspective. The goal is to predict associations between genes and functions, where a gene can be associated with multiple biological functions and the functions are organized hierarchically. Four machi...

Authors:
Romero González, Miguel Ángel
Resource type:
Doctoral thesis
Publication date:
2022
Institution:
Pontificia Universidad Javeriana Cali
Repository:
Vitela
Language:
eng
OAI Identifier:
oai:vitela.javerianacali.edu.co:11522/2088
Online access:
https://vitela.javerianacali.edu.co/handle/11522/2088
Keywords:
Rights
License
https://creativecommons.org/licenses/by-nc-nd/4.0/
id Vitela2_583553f6cde6b4880434dc4709f20f5e
oai_identifier_str oai:vitela.javerianacali.edu.co:11522/2088
network_acronym_str Vitela2
network_name_str Vitela
repository_id_str
dc.title.eng.fl_str_mv Hierarchical multi-label classification methods for gene function prediction
dc.creator.fl_str_mv Romero González, Miguel Ángel
dc.contributor.advisor.none.fl_str_mv Rocha, Camilo
Finke, Jorge
dc.contributor.author.none.fl_str_mv Romero González, Miguel Ángel
description This dissertation studies the problem of predicting gene functions from a computational perspective. The goal is to predict associations between genes and functions, where a gene can be associated with multiple biological functions and the functions are organized hierarchically. The problem is modeled as a classification task, and four machine learning methods are developed, each focusing on a different aspect of it: (a) considering hierarchical relations between functions to produce consistent predictions; (b) creating new data representations to build predictive models; (c) exploiting paths of functions in the hierarchy to detect missing annotations of genes; and (d) integrating information available for multiple organisms into the classification task. The main contributions of this work include novel methods that (i) overcome the limitations of the combinatorial gene function prediction problem; (ii) can be used to effectively identify associations between genes and functions of different organisms, including those that do not have enough data available to train predictive models; and (iii) help to narrow down the search space for in vivo experiments. These methods have been tested in efforts to predict gene functions in rice and maize, but they are formulated more generally and are applicable to any multi-label classification problem where the classes are organized into a hierarchy.
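Item (a) of the description mentions predictions that are consistent with the function hierarchy. As a rough illustration only, and not the dissertation's own method, the sketch below caps each class score by the scores of its ancestors, so that a function is never scored higher than any function it specializes. The function name `enforce_hierarchy`, the toy labels, and the scores are all hypothetical; the sketch assumes an acyclic hierarchy in which every parent also has a score.

```python
from typing import Dict, List


def enforce_hierarchy(
    scores: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Return scores in which no class exceeds any of its ancestors.

    Assumes the hierarchy is acyclic and every parent appears in `scores`.
    """
    consistent: Dict[str, float] = {}

    def resolve(label: str) -> float:
        # Memoised top-down pass: a class is capped by its (capped) parents.
        if label not in consistent:
            capped = scores[label]
            for parent in parents.get(label, []):
                capped = min(capped, resolve(parent))
            consistent[label] = capped
        return consistent[label]

    for label in scores:
        resolve(label)
    return consistent


# Hypothetical toy hierarchy: root -> A -> B and root -> C.
parents = {"A": ["root"], "B": ["A"], "C": ["root"]}
# Hypothetical raw classifier scores; B and C violate the hierarchy.
scores = {"root": 0.9, "A": 0.4, "B": 0.7, "C": 0.95}
consistent = enforce_hierarchy(scores, parents)
print(consistent)
```

In this toy case the score of B is lowered to that of its parent A, and the score of C is lowered to that of the root. Taking the minimum over ancestors is only one possible correction; other schemes, such as raising parent scores instead, would also yield consistent output.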
publishDate 2022
dc.date.issued.none.fl_str_mv 2022
dc.date.accessioned.none.fl_str_mv 2024-06-09T15:32:31Z
dc.date.available.none.fl_str_mv 2024-06-09T15:32:31Z
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_db06
dc.type.local.none.fl_str_mv Thesis/Degree work - Monograph - Doctorate
dc.type.redcol.none.fl_str_mv https://purl.org/redcol/resource_type/TD
format http://purl.org/coar/resource_type/c_db06
dc.identifier.uri.none.fl_str_mv https://vitela.javerianacali.edu.co/handle/11522/2088
url https://vitela.javerianacali.edu.co/handle/11522/2088
dc.language.iso.none.fl_str_mv eng
language eng
dc.rights.uri.none.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.creativecommons.none.fl_str_mv https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.accessrights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.extent.none.fl_str_mv 132 p.
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Pontificia Universidad Javeriana Cali
publisher.none.fl_str_mv Pontificia Universidad Javeriana Cali
institution Pontificia Universidad Javeriana Cali
bitstream.url.fl_str_mv https://vitela.javerianacali.edu.co/bitstreams/27ec399e-9537-4a23-9ed2-2b5baac433ae/download
https://vitela.javerianacali.edu.co/bitstreams/9267409d-d6cd-46b0-b1dd-2ed121920932/download
https://vitela.javerianacali.edu.co/bitstreams/3d255232-9a85-4a83-b6bb-82cc31d32b2d/download
https://vitela.javerianacali.edu.co/bitstreams/3fb6f769-dcc6-4867-90a0-466804348292/download
https://vitela.javerianacali.edu.co/bitstreams/77db32e7-a47b-4e07-b5aa-f41811897d7f/download
https://vitela.javerianacali.edu.co/bitstreams/05cd92a5-f3cc-4e90-a435-5141407061e9/download
https://vitela.javerianacali.edu.co/bitstreams/fcd75aee-6ff1-427b-83a0-947ab8d40c9a/download
bitstream.checksum.fl_str_mv 8a4605be74aa9ea9d79846c1fba20a33
707bb2e571e005aa5748acf38c7f7a1c
9bee94053383c448f8d6491140dc70e3
c69bfeb6aa70ab9b10b52cba0e88d46e
d9e38ee46fb9c2ca6b0166a154f8a10b
5069dbf962fbfc09d7e4b1aeee07d6bc
3bef58f954a702760faa8b7b493c587e
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio Vitela
repository.mail.fl_str_mv vitela.mail@javerianacali.edu.co
_version_ 1812095057343283200
spelling Facultad de Ingeniería y Ciencias. Doctorado en Ingeniería y Ciencias Aplicadas
Pontificia Universidad Javeriana Cali
Doctorate
open.access
record timestamp 2024-06-25 05:13:51.587
bitstreams (bundle, file name, description, MIME type, size)
LICENSE license.txt text/plain; charset=utf-8 1748
ORIGINAL MiguelRomero_Tesis.pdf application/pdf 1802057
ORIGINAL Licencia_autorizacion_biblioteca.docx(1).pdf application/pdf 119268
TEXT MiguelRomero_Tesis.pdf.txt Extracted text text/plain 100692
TEXT Licencia_autorizacion_biblioteca.docx(1).pdf.txt Extracted text text/plain 4748
THUMBNAIL MiguelRomero_Tesis.pdf.jpg Generated Thumbnail image/jpeg 4041
THUMBNAIL Licencia_autorizacion_biblioteca.docx(1).pdf.jpg Generated Thumbnail image/jpeg 5193