Multi-omic data integration using joint non-negative matrix and machine learning methods for clinical endpoints prediction and causal parameter estimation in cancer

Authors:
Salazar Barreto, Diego Armando
Resource type:
Doctoral thesis
Publication date:
2022
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/59247
Online access:
http://hdl.handle.net/1992/59247
Keywords:
Multi-omic integration
Kernel trick
Causal inference
Targeted Learning
Machine Learning
Glioma
Breast cancer
Lung adenocarcinoma
Drug repurposing
Precision medicine
co-clustering
Joint Non-negative Matrix Factorization
Superlearner
data fusion
Engineering
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
Description
Summary: Multiple data sources now drive the understanding of biological and clinical processes. Although they are meant to support optimal decision-making, they require strategies that facilitate their integration. In the biological sciences, for example, multi-omic data integration has improved the characterization of multiple types of cancer, which supports better diagnosis and treatment. Integrating data can therefore help identify new drug targets and biomarkers, predict phenotypes, and improve the design of observational clinical studies. This project aimed to advance the state of the art in multi-omic data integration by coupling various biological data sources (omic data and prior knowledge) using different machine learning algorithms. Our first contribution was a strategy to integrate data sources from two cancer projects, which we called Multi-project and Multi-profile joint Non-negative Matrix Factorization (M&M-jNMF); it has clustering and predictive properties. Second, we applied a non-linear, kernel-based solution to the jNMF algorithm, which resulted in a more appropriate biological representation. Third, we proposed a kernel-based M&M-jNMF to improve the properties of this method. Finally, our last goal was to incorporate different multi-omic integration strategies into the Targeted Learning methodology to improve causal estimation and generate new advances in observational studies.
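
For readers unfamiliar with the joint factorization named in the summary, the following is a minimal sketch of a generic joint NMF, not the thesis's M&M-jNMF or its kernel variants. It assumes several non-negative omic matrices that share the same samples (rows) and factors them with one shared sample matrix `W` and one feature matrix `H_i` per profile, using standard multiplicative updates. The function name `joint_nmf`, its parameters, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def joint_nmf(matrices, rank, n_iter=200, eps=1e-9, seed=0):
    """Generic joint NMF sketch (assumed setup, not the thesis's algorithm).

    Each X_i (n_samples x p_i) is approximated by W @ H_i, where W is shared
    across all profiles and H_i is specific to profile i.
    """
    rng = np.random.default_rng(seed)
    n_samples = matrices[0].shape[0]
    W = rng.random((n_samples, rank)) + eps
    Hs = [rng.random((rank, X.shape[1])) + eps for X in matrices]
    losses = []
    for _ in range(n_iter):
        # Multiplicative update of each profile-specific H_i with W fixed.
        for i, X in enumerate(matrices):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # Multiplicative update of the shared W, pooling all profiles.
        numer = sum(X @ H.T for X, H in zip(matrices, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
        # Total squared Frobenius reconstruction error over all profiles.
        losses.append(
            sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(matrices, Hs))
        )
    return W, Hs, losses


# Synthetic example: two hypothetical "omic" profiles over the same 30 samples.
rng = np.random.default_rng(42)
W_true = rng.random((30, 3))
X1 = W_true @ rng.random((3, 20))
X2 = W_true @ rng.random((3, 15))

W, Hs, losses = joint_nmf([X1, X2], rank=3)
# Assign each sample to its dominant shared factor (a simple clustering readout).
clusters = np.argmax(W, axis=1)
```

Because `W` is shared, the dominant factor per row gives one cluster label per sample across all profiles, which is the sense in which a joint factorization supports clustering; the corresponding columns of each `H_i` indicate which features load on each factor.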