Multi-omic data integration using kernel-based non-negative matrix factorization approach to identify and analyze co-modules in lung adenocarcinoma

Multi-omic data integration is a topic of great interest because it makes it possible to analyze vast amounts of biological data and contributes to the understanding of the biological processes underlying organisms. Multiple machine learning techniques have been proposed to integrate biological data. Some of the mo...

Authors:
Aceros Cardozo, Sara Lucía
Salazar Beltrán, María Elena
Resource type:
Master's thesis
Publication date:
2021
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/51008
Online access:
http://hdl.handle.net/1992/51008
Keywords:
Machine learning (Artificial intelligence)
Lung diseases
Adenocarcinoma
Lungs
Respiratory distress syndrome
Kernel functions
Engineering
Rights
openAccess
License
https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
id UNIANDES2_f7cb17afac3ef838188123d7b65a5f98
oai_identifier_str oai:repositorio.uniandes.edu.co:1992/51008
network_acronym_str UNIANDES2
network_name_str Séneca: repositorio Uniandes
repository_id_str
dc.title.spa.fl_str_mv Multi-omic data integration using kernel-based non-negative matrix factorization approach to identify and analyze co-modules in lung adenocarcinoma
dc.creator.fl_str_mv Aceros Cardozo, Sara Lucía
Salazar Beltrán, María Elena
dc.contributor.advisor.none.fl_str_mv Valencia Arboleda, Carlos Felipe
dc.contributor.author.none.fl_str_mv Aceros Cardozo, Sara Lucía
Salazar Beltrán, María Elena
dc.contributor.jury.none.fl_str_mv Castillo Hernández, Mario
Cristancho Ardila, Marco Aurelio
dc.subject.armarc.none.fl_str_mv Aprendizaje automático (Inteligencia artificial)
Enfermedades de los pulmones
Adenocarcinoma
Pulmones
Síndrome de dificultad respiratoria
Funciones de Kernel
dc.subject.themes.none.fl_str_mv Ingeniería
description Multi-omic data integration is a topic of great interest because it makes it possible to analyze vast amounts of biological data and contributes to the understanding of the biological processes underlying organisms. Multiple machine learning techniques have been proposed to integrate biological data. Some of the most widely used and promising techniques are extensions of the Non-negative Matrix Factorization (NMF) method. However, none of the NMF extensions has simultaneously addressed the integration of multiple inputs coming from different sources and the nonlinear relationships inherent in biological processes. In this paper, we propose a kernel-based NMF approach that aims to integrate multiple inputs coming from two different sources, including previous knowledge and nonlinear relationships. The proposed kernelized technique and its non-kernelized counterpart were implemented and tested with lung adenocarcinoma (LUAD) data from three different omic profiles, drawn from an experimental and an observational data source. The performance of the methods was evaluated and compared using the cophenetic coefficient, AUC, and a biological score. The results show that the kernelized technique greatly outperforms the standard one on all metrics. The proposed method identifies molecule co-modules that are enriched in pathways tightly related to lung cancer emergence and progression. In addition, analysis of the enriched co-modules and their relevant pathways makes it possible to identify genes and gene regulators with a key role in lung tumorigenesis and to propose them as potential biomarkers.
publishDate 2021
dc.date.accessioned.none.fl_str_mv 2021-08-10T18:06:05Z
dc.date.available.none.fl_str_mv 2021-08-10T18:06:05Z
dc.date.issued.none.fl_str_mv 2021
dc.type.spa.fl_str_mv Trabajo de grado - Maestría
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/masterThesis
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TM
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/1992/51008
dc.identifier.pdf.none.fl_str_mv 22787.pdf
dc.identifier.instname.spa.fl_str_mv instname:Universidad de los Andes
dc.identifier.reponame.spa.fl_str_mv reponame:Repositorio Institucional Séneca
dc.identifier.repourl.spa.fl_str_mv repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.none.fl_str_mv eng
language eng
dc.rights.uri.*.fl_str_mv https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.extent.none.fl_str_mv 27 hojas
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad de los Andes
dc.publisher.program.none.fl_str_mv Maestría en Ingeniería Industrial
dc.publisher.faculty.none.fl_str_mv Facultad de Ingeniería
dc.publisher.department.none.fl_str_mv Departamento de Ingeniería Industrial
publisher.none.fl_str_mv Universidad de los Andes
institution Universidad de los Andes
bitstream.url.fl_str_mv https://repositorio.uniandes.edu.co/bitstreams/8cf4d458-4d7f-4203-b70c-d080de0804c4/download
https://repositorio.uniandes.edu.co/bitstreams/5a4dc803-acf1-43c3-afad-516111ed2c11/download
https://repositorio.uniandes.edu.co/bitstreams/65068560-0cc8-4e7d-b7db-303e1b5fa135/download
bitstream.checksum.fl_str_mv a6a3d21b3785c9b737806c2b688e83f3
c3dc09fc5ee509e96a8fdb48563d2740
e2756b9a8cfaaf5815709d0bd7df8015
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional Séneca
repository.mail.fl_str_mv adminrepositorio@uniandes.edu.co
_version_ 1812133842329272320
spelling By consulting and using this resource, you accept the conditions of use established by the authors (https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf).
Degree: Magíster en Ingeniería Industrial (Maestría).
Profile links listed with Valencia Arboleda, Carlos Felipe (virtual::3033-1): https://scholar.google.es/citations?user=vPH5LywAAAAJ and https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0000271403
Files: ORIGINAL 22787.pdf (application/pdf, 2196370); TEXT 22787.pdf.txt (Extracted text, text/plain, 98109); THUMBNAIL 22787.pdf.jpg (IM Thumbnail, image/jpeg, 26187).
Timestamp: 2024-03-13 12:20:25.546