Multi-GPU distribution of single-batch, time-dependent linear products
Modern approaches to distributed deep learning focus on using more GPU nodes to process more data in parallel, updating the model weights with a distributed gradient update rule across all nodes. The main limitation of this paradigm is that it assumes at least one sample of data fits on a single node. However, that does not hold when dealing with large inputs, or when the GPU infrastructure does not have enough memory. In this work, we propose a new operator-level distribution approach tailored to these cases, in which we distribute a single input across multiple GPU nodes, taking into account the operators involved in a given model. By distributing the original input, we reduce the space complexity of each node, enabling multiple GPUs to process inputs that could not fit on a single node. We validate our approach by distributing dot-product attention, a fundamental operation in modern sequence-to-sequence architectures.
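The thesis details the operator-level partitioning scheme itself; as a rough sketch of the core idea (not the author's implementation), the following PyTorch snippet shards dot-product attention across GPUs. Because each query row attends independently over all keys, Q can be split along the time dimension across devices while K and V are replicated, so no single device ever materializes the full T × T score matrix. The function name, device list, and sequential loop are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def sharded_dot_product_attention(q, k, v, devices):
    """softmax(Q K^T / sqrt(d)) V with the query matrix sharded across GPUs.

    Each query row attends independently over all keys, so splitting Q
    along the time dimension is exact: device p holds T/P query rows plus
    a full copy of K and V, shrinking its score matrix from O(T^2) memory
    to O(T^2 / P).
    """
    d = q.size(-1)
    q_shards = q.chunk(len(devices), dim=0)  # partition the time dimension
    outputs = []
    for q_shard, dev in zip(q_shards, devices):
        q_p = q_shard.to(dev)
        k_p, v_p = k.to(dev), v.to(dev)  # K and V replicated on every device
        scores = q_p @ k_p.transpose(-2, -1) / d ** 0.5  # (T/P) x T scores
        outputs.append(F.softmax(scores, dim=-1) @ v_p)  # (T/P) x d output
    # gather the output shards and restore the original time dimension
    return torch.cat([o.to(devices[0]) for o in outputs], dim=0)

# Example: a 4096-step sequence split over all visible GPUs.
# devices = [torch.device(f"cuda:{i}") for i in range(torch.cuda.device_count())]
# q = k = v = torch.randn(4096, 64)
# out = sharded_dot_product_attention(q, k, v, devices)  # shape (4096, 64)
```

Replicating K and V trades duplicated memory and transfer bandwidth for the dominant saving, the O(T²) score matrix; the sketch also visits devices sequentially, whereas a practical implementation would launch the shards in parallel.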
- Authors:
- Margffoy Tuay, Edgar Andrés
- Resource type:
- Trabajo de grado - Maestría (master's thesis)
- Publication date:
- 2020
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/48619
- Online access:
- http://hdl.handle.net/1992/48619
- Keywords:
- Graphics processing units
- Machine learning (Artificial intelligence)
- Engineering
- Rights
- openAccess
- License
- http://creativecommons.org/licenses/by-nc-nd/4.0/
- Advisors:
- Cardozo Álvarez, Nicolás; Arbeláez Escalante, Pablo Andrés
- Jury:
- Castro Barrera, Harold Enrique
- Extent:
- 37 leaves (application/pdf)
- Program:
- Maestría en Ingeniería de Sistemas y Computación
- Faculty:
- Facultad de Ingeniería
- Department:
- Departamento de Ingeniería de Sistemas y Computación