Multi-omic data integration using kernel-based non-negative matrix factorization approach to identify and analyze co-modules in lung adenocarcinoma
Multi-omic data integration is a topic of great interest because it enables the analysis of vast amounts of biological data and contributes to the understanding of the biological processes that take place in organisms. Multiple machine learning techniques have been proposed to integrate biological data. Some of the mo...
- Authors:
- Aceros Cardozo, Sara Lucía
Salazar Beltrán, María Elena
- Resource type:
- Master's thesis
- Publication date:
- 2021
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/51008
- Online access:
- http://hdl.handle.net/1992/51008
- Keywords:
- Machine learning (Artificial intelligence)
Lung diseases
Adenocarcinoma
Lungs
Respiratory distress syndrome
Kernel functions
Engineering
- Rights
- openAccess
- License
- https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
id | UNIANDES2_f7cb17afac3ef838188123d7b65a5f98 |
---|---|
oai_identifier_str | oai:repositorio.uniandes.edu.co:1992/51008 |
network_acronym_str | UNIANDES2 |
network_name_str | Séneca: repositorio Uniandes |
repository_id_str | |
dc.title.spa.fl_str_mv | Multi-omic data integration using kernel-based non-negative matrix factorization approach to identify and analyze co-modules in lung adenocarcinoma |
dc.creator.fl_str_mv | Aceros Cardozo, Sara Lucía; Salazar Beltrán, María Elena |
dc.contributor.advisor.none.fl_str_mv | Valencia Arboleda, Carlos Felipe |
dc.contributor.author.none.fl_str_mv | Aceros Cardozo, Sara Lucía; Salazar Beltrán, María Elena |
dc.contributor.jury.none.fl_str_mv | Castillo Hernández, Mario; Cristancho Ardila, Marco Aurelio |
dc.subject.armarc.none.fl_str_mv | Machine learning (Artificial intelligence); Lung diseases; Adenocarcinoma; Lungs; Respiratory distress syndrome; Kernel functions |
dc.subject.themes.none.fl_str_mv | Engineering |
description | Multi-omic data integration is a topic of great interest because it enables the analysis of vast amounts of biological data and contributes to the understanding of the biological processes that take place in organisms. Multiple machine learning techniques have been proposed to integrate biological data. Some of the most widely used and promising techniques are extensions of the Non-negative Matrix Factorization (NMF) method. However, none of the NMF extensions has simultaneously addressed the integration of multiple inputs from different sources and the nonlinear relationships inherent in biological processes. In this paper, we propose a kernel-based NMF approach that aims to integrate multiple inputs from two different sources while incorporating prior knowledge and nonlinear relationships. The proposed kernelized technique and the non-kernelized one were implemented and tested on lung adenocarcinoma (LUAD) data comprising three omic profiles drawn from an experimental and an observational data source. The performance of the methods was evaluated and compared using the cophenetic coefficient, the AUC, and a biological score. The results show that the kernelized technique greatly outperforms the standard one on all metrics. The proposed method identifies molecular co-modules that are enriched in pathways closely related to lung cancer onset and progression. In addition, analysis of the enriched co-modules and their relevant pathways makes it possible to identify genes and gene regulators that play a key role in lung tumorigenesis and to propose them as potential biomarkers. |
publishDate | 2021 |
dc.date.accessioned.none.fl_str_mv | 2021-08-10T18:06:05Z |
dc.date.available.none.fl_str_mv | 2021-08-10T18:06:05Z |
dc.date.issued.none.fl_str_mv | 2021 |
dc.type.spa.fl_str_mv | Master's degree project |
dc.type.coarversion.fl_str_mv | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.driver.spa.fl_str_mv | info:eu-repo/semantics/masterThesis |
dc.type.content.spa.fl_str_mv | Text |
dc.type.redcol.spa.fl_str_mv | http://purl.org/redcol/resource_type/TM |
dc.identifier.uri.none.fl_str_mv | http://hdl.handle.net/1992/51008 |
dc.identifier.pdf.none.fl_str_mv | 22787.pdf |
dc.identifier.instname.spa.fl_str_mv | instname:Universidad de los Andes |
dc.identifier.reponame.spa.fl_str_mv | reponame:Repositorio Institucional Séneca |
dc.identifier.repourl.spa.fl_str_mv | repourl:https://repositorio.uniandes.edu.co/ |
dc.language.iso.none.fl_str_mv | eng |
dc.rights.uri.*.fl_str_mv | https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf |
dc.rights.accessrights.spa.fl_str_mv | info:eu-repo/semantics/openAccess |
dc.rights.coar.spa.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
dc.format.extent.none.fl_str_mv | 27 pages |
dc.format.mimetype.none.fl_str_mv | application/pdf |
dc.publisher.none.fl_str_mv | Universidad de los Andes |
dc.publisher.program.none.fl_str_mv | Master's in Industrial Engineering |
dc.publisher.faculty.none.fl_str_mv | Faculty of Engineering |
dc.publisher.department.none.fl_str_mv | Department of Industrial Engineering |
bitstream.url.fl_str_mv | https://repositorio.uniandes.edu.co/bitstreams/8cf4d458-4d7f-4203-b70c-d080de0804c4/download https://repositorio.uniandes.edu.co/bitstreams/5a4dc803-acf1-43c3-afad-516111ed2c11/download https://repositorio.uniandes.edu.co/bitstreams/65068560-0cc8-4e7d-b7db-303e1b5fa135/download |
bitstream.checksum.fl_str_mv | a6a3d21b3785c9b737806c2b688e83f3 c3dc09fc5ee509e96a8fdb48563d2740 e2756b9a8cfaaf5815709d0bd7df8015 |
bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 MD5 |
repository.name.fl_str_mv | Repositorio institucional Séneca |
repository.mail.fl_str_mv | adminrepositorio@uniandes.edu.co |
_version_ | 1812133842329272320 |
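
The abstract in this record contrasts a kernelized NMF with a standard, non-kernelized NMF for finding co-modules across several omic profiles. The record does not give the thesis's formulation, so what follows is only a minimal sketch of a common non-kernelized baseline, assuming the joint NMF setup in which every omic matrix X_i (samples by features) is approximated as W @ H_i with one basis matrix W shared across all profiles. The function names `joint_nmf`, `reconstruction_error`, and `co_modules`, the z-score membership rule, and the threshold of 2.0 are illustrative assumptions, not the authors' method; the kernelized variant is not reproduced here.

```python
import numpy as np


def joint_nmf(mats, k, n_iter=500, eps=1e-10, seed=0):
    """Joint NMF sketch: X_i ~= W @ H_i for every block i, with W shared.

    mats: list of non-negative arrays, each (n_samples, n_features_i).
    Minimizes sum_i ||X_i - W H_i||_F^2 with multiplicative updates,
    which keep every factor non-negative.
    """
    rng = np.random.default_rng(seed)
    n_samples = mats[0].shape[0]
    W = rng.random((n_samples, k))
    Hs = [rng.random((k, X.shape[1])) for X in mats]
    for _ in range(n_iter):
        # Update each block-specific H_i with the shared W held fixed.
        for i, X in enumerate(mats):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # Update the shared W using the data from all blocks at once.
        numer = sum(X @ H.T for X, H in zip(mats, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
    return W, Hs


def reconstruction_error(mats, W, Hs):
    """Summed squared Frobenius error over all blocks."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(mats, Hs))


def co_modules(Hs, z_thresh=2.0):
    """Hypothetical membership rule: feature j of block i belongs to
    module m when its z-scored weight in row m of H_i exceeds z_thresh."""
    modules = []
    for m in range(Hs[0].shape[0]):
        members = []
        for H in Hs:
            row = H[m]
            z = (row - row.mean()) / (row.std() + 1e-12)
            members.append(np.flatnonzero(z > z_thresh))
        modules.append(members)
    return modules


if __name__ == "__main__":
    # Synthetic stand-in for three omic profiles measured on 30 samples.
    rng = np.random.default_rng(1)
    W_true = rng.random((30, 3))
    mats = [W_true @ rng.random((3, p)) for p in (50, 40, 20)]
    W, Hs = joint_nmf(mats, k=3)
    total = sum(np.linalg.norm(X) ** 2 for X in mats)
    print("relative error:", reconstruction_error(mats, W, Hs) / total)
    for m, members in enumerate(co_modules(Hs)):
        print(m, [len(idx) for idx in members])
```

Because W is shared, this baseline is equivalent to ordinary NMF on the horizontal concatenation [X_1, X_2, X_3]; reading row m of every H_i together groups molecules from different profiles into one candidate co-module. A kernelized variant would instead work with kernel (similarity) matrices to capture nonlinear relationships, and its exact form in the thesis is not given in this record.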