Multi-omic data integration using kernel-based non-negative matrix factorization approach to identify and analyze co-modules in lung adenocarcinoma
- Authors:
- Aceros Cardozo, Sara Lucía
Salazar Beltrán, María Elena
- Resource type:
- Publication date:
- 2021
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/51008
- Online access:
- http://hdl.handle.net/1992/51008
- Keywords:
- Machine learning (Artificial intelligence)
Lung diseases
Adenocarcinoma
Lungs
Respiratory distress syndrome
Kernel functions
Engineering
- Rights:
- openAccess
- License:
- https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
Summary: Multi-omic data integration is a topic of great interest because it enables the analysis of vast amounts of biological data and contributes to the understanding of the biological processes underlying organisms. Multiple machine learning techniques have been proposed to integrate biological data. Some of the most widely used and promising techniques are extensions of the Non-negative Matrix Factorization (NMF) method. However, none of the NMF extensions has simultaneously addressed the integration of multiple inputs coming from different sources and the nonlinear relationships inherent in biological processes. In this paper, we propose a kernel-based NMF approach that aims to integrate multiple inputs coming from two different sources while incorporating prior knowledge and nonlinear relationships. The proposed kernelized technique and its non-kernelized counterpart were implemented and tested with lung adenocarcinoma (LUAD) data from three different omic profiles coming from an experimental and an observational data source. The performance of the methods was evaluated and compared using the cophenetic coefficient, AUC, and a biological score. The results show that the kernelized technique greatly outperforms the standard one on all metrics. The proposed method makes it possible to identify molecule co-modules that were enriched in pathways tightly related to lung cancer emergence and progression. In addition, analysis of the enriched co-modules and their relevant pathways makes it possible to identify genes and gene regulators with a key role in lung tumorigenesis and to propose them as potential biomarkers.