Machine learning-based cancer classification using gene expression data

In this Master's thesis we explore several machine learning and deep learning algorithms to classify different types of cancer based on the gene expression profile of each sample. We use expression profiles of both cancer tissue and normal tissue to train the predictive models. The abnormal tissue samples were o...

Authors:
Martínez Logreira, Julián Alexander
Resource type:
Publication date:
2020
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/50946
Online access:
http://hdl.handle.net/1992/50946
Keywords:
Cancer
Genomics
Computer simulation
Machine learning (Artificial intelligence)
Engineering
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
id UNIANDES2_23a06836a6253a07a65b014290bc8443
oai_identifier_str oai:repositorio.uniandes.edu.co:1992/50946
network_acronym_str UNIANDES2
network_name_str Séneca: repositorio Uniandes
repository_id_str
dc.title.spa.fl_str_mv Machine learning-based cancer classification using gene expression data
dc.creator.fl_str_mv Martínez Logreira, Julián Alexander
dc.contributor.advisor.none.fl_str_mv Bloch Morel, Natasha Ivonne
Arbeláez Escalante, Pablo Andrés
dc.contributor.author.none.fl_str_mv Martínez Logreira, Julián Alexander
dc.contributor.jury.none.fl_str_mv Valderrama Manrique, Mario Andrés
Reyes, Alejandro
dc.subject.armarc.spa.fl_str_mv Cancer
Genomics
Computer simulation
Machine learning (Artificial intelligence)
topic Cancer
Genomics
Computer simulation
Machine learning (Artificial intelligence)
Engineering
dc.subject.themes.none.fl_str_mv Engineering
description In this Master's thesis we explore several machine learning and deep learning algorithms to classify different types of cancer based on the gene expression profile of each sample. We use expression profiles of both cancer tissue and normal tissue to train the predictive models. The abnormal tissue samples were obtained from The Cancer Genome Atlas (TCGA) and paired with control (normal) tissue samples from the Genotype-Tissue Expression project (GTEx); both are public databases. We implemented ensembles of classic machine learning algorithms, which showed an accuracy of up to approximately 16%. We also implemented a graph convolutional network (GCN), which reached a top accuracy of approximately 52%. These results suggest the potential of graph-based algorithms to find underlying patterns in weakly structured data.
publishDate 2020
dc.date.issued.none.fl_str_mv 2020
dc.date.accessioned.none.fl_str_mv 2021-08-10T18:04:38Z
dc.date.available.none.fl_str_mv 2021-08-10T18:04:38Z
dc.type.spa.fl_str_mv Degree project - Master's
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/masterThesis
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TM
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/1992/50946
dc.identifier.pdf.none.fl_str_mv 23632.pdf
dc.identifier.instname.spa.fl_str_mv instname:Universidad de los Andes
dc.identifier.reponame.spa.fl_str_mv reponame:Repositorio Institucional Séneca
dc.identifier.repourl.spa.fl_str_mv repourl:https://repositorio.uniandes.edu.co/
url http://hdl.handle.net/1992/50946
identifier_str_mv 23632.pdf
instname:Universidad de los Andes
reponame:Repositorio Institucional Séneca
repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.none.fl_str_mv eng
language eng
dc.rights.uri.*.fl_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.spa.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv http://creativecommons.org/licenses/by-nc-nd/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.extent.none.fl_str_mv 12 leaves
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad de los Andes
dc.publisher.program.none.fl_str_mv Maestría en Ingeniería Biomédica
dc.publisher.faculty.none.fl_str_mv Facultad de Ingeniería
dc.publisher.department.none.fl_str_mv Departamento de Ingeniería Biomédica
publisher.none.fl_str_mv Universidad de los Andes
institution Universidad de los Andes
bitstream.url.fl_str_mv https://repositorio.uniandes.edu.co/bitstreams/962e2ffc-5f21-4c59-8991-e9a86f81d8eb/download
https://repositorio.uniandes.edu.co/bitstreams/b05c9416-5169-4200-9780-c463d3d5bb5b/download
https://repositorio.uniandes.edu.co/bitstreams/f0b21f23-09bb-44ee-98a2-456b1aa67ba1/download
bitstream.checksum.fl_str_mv 7176652c1a6049cd55dc358fadfc9f46
e888818ff5c8f0ed9a60fc4eb5811801
22c460aecb9a41de906c3ed4c2751d64
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional Séneca
repository.mail.fl_str_mv adminrepositorio@uniandes.edu.co
_version_ 1812134017749745664