Multi-GPU distribution of single-batch, time-dependent linear products
Modern approaches to distributed deep learning focus on using more GPU nodes to process more data in parallel, updating the model weights with a distributed gradient update rule across all nodes. The main limitation of this paradigm is that it assumes at least one sample of data fits on a single node. However, that does not hold when dealing with large inputs, or when the GPU infrastructure does not have enough memory. In this work, we propose a new operator-level distribution approach tailored to these cases, in which we distribute a single input across multiple GPU nodes, taking into account the operators involved in a given model. By distributing the original input, we reduce the space complexity of each node, enabling multiple GPUs to process inputs that could not fit on a single node. We validate our approach by distributing dot-product attention, a fundamental operation in modern sequence-to-sequence architectures.
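The thesis details the operator-level partitioning scheme itself; as a rough sketch of the core idea (not the author's implementation), the following PyTorch snippet shards dot-product attention across GPUs. Because each query row attends independently over all keys, Q can be split along the time dimension across devices while K and V are replicated, so no single device ever materializes the full T × T score matrix. The function name, device list, and sequential loop are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def sharded_dot_product_attention(q, k, v, devices):
    """softmax(Q K^T / sqrt(d)) V with the query matrix sharded across GPUs.

    Each query row attends independently over all keys, so splitting Q
    along the time dimension is exact: device p holds T/P query rows plus
    a full copy of K and V, shrinking its score matrix from O(T^2) memory
    to O(T^2 / P).
    """
    d = q.size(-1)
    q_shards = q.chunk(len(devices), dim=0)  # partition the time dimension
    outputs = []
    for q_shard, dev in zip(q_shards, devices):
        q_p = q_shard.to(dev)
        k_p, v_p = k.to(dev), v.to(dev)  # K and V replicated on every device
        scores = q_p @ k_p.transpose(-2, -1) / d ** 0.5  # (T/P) x T scores
        outputs.append(F.softmax(scores, dim=-1) @ v_p)  # (T/P) x d output
    # gather the output shards and restore the original time dimension
    return torch.cat([o.to(devices[0]) for o in outputs], dim=0)

# Example: a 4096-step sequence split over all visible GPUs.
# devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
# q = k = v = torch.randn(4096, 64)
# out = sharded_dot_product_attention(q, k, v, devices)  # shape (4096, 64)
```

Replicating K and V trades duplicated memory and transfer bandwidth for the dominant saving, the O(T²) score matrix; the sketch also visits devices sequentially, whereas a practical implementation would launch the shards in parallel.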
- Authors:
- Margffoy Tuay, Edgar Andrés
- Resource type:
- Trabajo de grado - Maestría (master's thesis)
- Publication date:
- 2020
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/48619
- Online access:
- http://hdl.handle.net/1992/48619
- Keywords:
- Graphics processing units
- Machine learning (Artificial intelligence)
- Engineering
- Rights
- openAccess
- License
- http://creativecommons.org/licenses/by-nc-nd/4.0/
- Advisors:
- Cardozo Álvarez, Nicolás; Arbeláez Escalante, Pablo Andrés
- Jury:
- Castro Barrera, Harold Enrique
- Extent:
- 37 leaves (application/pdf)
- Program:
- Maestría en Ingeniería de Sistemas y Computación
- Faculty:
- Facultad de Ingeniería
- Department:
- Departamento de Ingeniería de Sistemas y Computación