Multi-omic data integration using joint non-negative matrix and machine learning methods for clinical endpoints prediction and causal parameter estimation in cancer


Authors:
Salazar Barreto, Diego Armando
Resource type:
Doctoral thesis
Publication date:
2022
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
English
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/59247
Online access:
http://hdl.handle.net/1992/59247
Keywords:
Multi-omic integration
Kernel trick
Causal inference
Targeted Learning
Machine Learning
Glioma
Breast cancer
Lung adenocarcinoma
Drug repurposing
Precision medicine
co-clustering
Joint Non-negative Matrix Factorization
Superlearner
data fusion
Engineering
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
id UNIANDES2_8367dfe26a6a4d74d2b4e1e47a2172e9
dc.title.none.fl_str_mv Multi-omic data integration using joint non-negative matrix and machine learning methods for clinical endpoints prediction and causal parameter estimation in cancer
dc.creator.fl_str_mv Salazar Barreto, Diego Armando
dc.contributor.advisor.none.fl_str_mv Valencia Arboleda, Carlos Felipe
Díaz Muñoz, Iván Leonardo
dc.contributor.author.none.fl_str_mv Salazar Barreto, Diego Armando
dc.contributor.jury.none.fl_str_mv Duitama Castellanos, Jorge Alexander
Przulj, Natasa
Vallejo Ardila, Dora Lucía
Flórez Vargas, Oscar
dc.contributor.researchgroup.es_CO.fl_str_mv Centro para la Optimización y la Probabilidad Aplicada
dc.subject.keyword.none.fl_str_mv Multi-omic integration
Kernel trick
Causal inference
Targeted Learning
Machine Learning
Glioma
Breast cancer
Lung adenocarcinoma
Drug repurposing
Precision medicine
co-clustering
Joint Non-negative Matrix Factorization
Superlearner
data fusion
dc.subject.themes.es_CO.fl_str_mv Engineering
description Today, multiple data sources drive our understanding of biological and clinical processes. Although these sources are meant to support optimal decision-making, using them together requires strategies for integrating them. In the biological sciences, for example, multi-omic data integration has improved the characterization of several types of cancer, which can support better diagnosis and treatment. More generally, integrating data can help identify new drug targets and biomarkers, predict phenotypes, and improve the design of observational clinical studies. This project aimed to contribute to the state of the art in multi-omic data integration by coupling various biological data sources (omic data and prior knowledge) through different machine learning algorithms. Our first contribution was a strategy to integrate data sources from two cancer projects, which we called Multi-project and Multi-profile joint Non-negative Matrix Factorization (M&M-jNMF); it supports both clustering and prediction. Second, we applied a non-linear, kernel-based extension to the jNMF algorithm, which yielded a more biologically meaningful representation. Third, we proposed a kernel-based M&M-jNMF to improve the properties of this method. Finally, our last goal was to incorporate different multi-omic integration strategies into the Targeted Learning methodology to improve causal parameter estimation and advance the analysis of observational studies.
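The description names joint Non-negative Matrix Factorization (jNMF) and its clustering properties but does not spell out the algorithm. As a rough illustration only, the sketch below implements a common generic jNMF formulation, not the thesis's M&M-jNMF or its kernel variants: each non-negative omic block X_i (with the same samples as rows) is approximated as W H_i with a shared sample factor W, fitted by Lee and Seung style multiplicative updates that minimize the sum over blocks of ||X_i - W H_i||_F^2. The function names (`joint_nmf`, `reconstruction_error`, `cluster_labels`) and the argmax clustering rule are hypothetical choices made for this example.

```python
import numpy as np


def joint_nmf(blocks, k, n_iter=500, eps=1e-10, seed=0):
    """Generic jNMF sketch (assumed formulation, not the thesis's M&M-jNMF).

    Each block X_i (samples x features_i, non-negative) is factored as
    W @ H_i, where W (samples x k) is shared by all blocks.
    """
    rng = np.random.default_rng(seed)
    n = blocks[0].shape[0]
    if any(X.shape[0] != n for X in blocks):
        raise ValueError("all omic blocks must share the same samples (rows)")
    if any((X < 0).any() for X in blocks):
        raise ValueError("NMF requires non-negative input")

    # Random non-negative initialization of the shared and block-specific factors.
    W = rng.random((n, k))
    Hs = [rng.random((k, X.shape[1])) for X in blocks]

    for _ in range(n_iter):
        # Update each block-specific H_i with W fixed (multiplicative rule).
        WtW = W.T @ W
        Hs = [H * (W.T @ X) / (WtW @ H + eps) for X, H in zip(blocks, Hs)]
        # Update the shared W using every block at once; eps avoids division by zero.
        num = sum(X @ H.T for X, H in zip(blocks, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
    return W, Hs


def reconstruction_error(blocks, W, Hs):
    """Sum of squared Frobenius errors over all omic blocks."""
    return sum(np.linalg.norm(X - W @ H, "fro") ** 2 for X, H in zip(blocks, Hs))


def cluster_labels(W):
    """Hypothetical rule: assign each sample to its largest shared factor."""
    return W.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    expr = rng.random((20, 50))  # hypothetical "expression" block
    meth = rng.random((20, 30))  # hypothetical "methylation" block
    W, Hs = joint_nmf([expr, meth], k=3)
    print(cluster_labels(W))
    print(reconstruction_error([expr, meth], W, Hs))
```

Because W is shared, each of its rows summarizes one sample across all omic profiles, which is why grouping samples by their dominant factor gives a simple clustering. In exact arithmetic the multiplicative updates keep every factor non-negative and never increase the objective, but they only reach a local optimum, so results can depend on `seed`.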
publishDate 2022
dc.date.accessioned.none.fl_str_mv 2022-07-27T21:55:58Z
dc.date.available.none.fl_str_mv 2022-07-27T21:55:58Z
dc.date.issued.none.fl_str_mv 2022-06-30
dc.type.es_CO.fl_str_mv Degree thesis - Doctorate
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.version.none.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_db06
dc.type.content.es_CO.fl_str_mv Text
dc.type.redcol.none.fl_str_mv https://purl.org/redcol/resource_type/TD
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/1992/59247
dc.identifier.doi.none.fl_str_mv 10.57784/1992/59247
dc.identifier.instname.es_CO.fl_str_mv instname:Universidad de los Andes
dc.identifier.reponame.es_CO.fl_str_mv reponame:Repositorio Institucional Séneca
dc.identifier.repourl.es_CO.fl_str_mv repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.es_CO.fl_str_mv eng
dc.rights.license.spa.fl_str_mv Attribution-NonCommercial 4.0 International
dc.rights.uri.*.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.extent.es_CO.fl_str_mv 129 pages
dc.format.mimetype.es_CO.fl_str_mv application/pdf
dc.publisher.es_CO.fl_str_mv Universidad de los Andes
dc.publisher.program.es_CO.fl_str_mv Doctorate in Engineering
dc.publisher.faculty.es_CO.fl_str_mv Faculty of Engineering
dc.publisher.department.es_CO.fl_str_mv Department of Industrial Engineering
bitstream.url.fl_str_mv https://repositorio.uniandes.edu.co/bitstreams/e55145ad-5bad-45fd-81a5-1303ce91e44c/download
https://repositorio.uniandes.edu.co/bitstreams/fb3746a7-afdc-4fc5-8949-c24dfee3ced8/download
https://repositorio.uniandes.edu.co/bitstreams/3c4867bd-aeec-422d-9b21-6c47168a781d/download
https://repositorio.uniandes.edu.co/bitstreams/da809cfb-7c92-4a59-8779-80bec36a35c3/download
https://repositorio.uniandes.edu.co/bitstreams/048755f2-2169-4d0a-af16-ac5717ae02b3/download
https://repositorio.uniandes.edu.co/bitstreams/a9e01172-1a20-469c-82cc-a9994c89c66a/download
https://repositorio.uniandes.edu.co/bitstreams/821d9553-5626-4db3-9fa7-ee03efc1cfde/download
https://repositorio.uniandes.edu.co/bitstreams/2f500c7f-c19a-4d1d-b89a-f1ac8fac283b/download
bitstream.checksum.fl_str_mv 5aa5c691a1ffe97abd12c2966efcb8d6
a2f838efe1c9aae7a9dbe3378f48b26f
f3eec2967285815caf7a5ce6e84c60ad
24013099e9e6abb1575dc6ce0855efd5
d80e45a330b9027fe495c5c6a4a6c9e0
4491fe1afb58beaaef41a73cf7ff2e27
b2c420a011f4b64ed2095e767b7ede98
9dc2d3e9d1529269d9060c4a21e34da5
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional Séneca
repository.mail.fl_str_mv adminrepositorio@uniandes.edu.co
spelling COLCIENCIAS call No. 785
Doctor of Engineering
Doctorate
Health systems
LICENSE: license.txt (text/plain; charset=utf-8; 1810 bytes)
ORIGINAL: PhD_Thesis.pdf (doctoral thesis; application/pdf; 3203161 bytes)
ORIGINAL: Repositorio_Tesis.pdf (hidden; application/pdf; 185856 bytes)
CC-LICENSE: license_rdf (application/rdf+xml; charset=utf-8; 914 bytes)
TEXT: PhD_Thesis.pdf.txt (extracted text; text/plain; 233469 bytes)
TEXT: Repositorio_Tesis.pdf.txt (extracted text; text/plain; 1163 bytes)
THUMBNAIL: PhD_Thesis.pdf.jpg (thumbnail; image/jpeg; 6809 bytes)
THUMBNAIL: Repositorio_Tesis.pdf.jpg (thumbnail; image/jpeg; 17305 bytes)