Inverse reinforcement learning via stochastic mirror descent

Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL) are foundational problems in decision-making under uncertainty, where the goal is to infer cost functions and policies from observed behavior. In this thesis, we establish the equivalence between the inverse optimization framework for Markov decision processes (MDPs) and the apprenticeship learning formalism, showing that both approaches can be unified under a shared structure. We formulate IRL and AL as regularized min-max problems and develop an algorithm based on stochastic mirror descent (SMD) that offers theoretical convergence guarantees.
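The abstract above is the record's only technical passage. As a point of reference for the method it names, the sketch below shows the standard stochastic mirror descent update with a negative-entropy mirror map (exponentiated gradient) on the probability simplex. Every concrete choice here (the grad_oracle interface, the step size, the toy linear objective) is an illustrative assumption; this is the generic SMD update rule, not the thesis's regularized min-max algorithm for IRL and AL.

```python
import numpy as np

def stochastic_mirror_descent(grad_oracle, dim, n_steps=1000, step_size=0.1, seed=0):
    """Entropic stochastic mirror descent on the probability simplex.

    grad_oracle(x, rng) must return an unbiased stochastic gradient of the
    objective at x. With the negative-entropy mirror map, the SMD step is the
    multiplicative-weights update x <- x * exp(-eta * g), renormalized, which
    is exactly the Bregman projection back onto the simplex.
    """
    rng = np.random.default_rng(seed)
    x = np.full(dim, 1.0 / dim)           # start at the uniform distribution
    avg = np.zeros(dim)                   # averaged iterate (standard SMD output)
    for t in range(n_steps):
        g = grad_oracle(x, rng)           # unbiased stochastic gradient
        x = x * np.exp(-step_size * g)    # mirror step under negative entropy
        x /= x.sum()                      # renormalize onto the simplex
        avg += (x - avg) / (t + 1)        # running average of iterates
    return avg

# Toy usage: minimize E[<c + noise, x>] over the simplex; the minimizer puts
# all mass on the coordinate with the smallest expected cost (index 1 here).
if __name__ == "__main__":
    c = np.array([0.9, 0.3, 0.7, 0.5])
    oracle = lambda x, rng: c + 0.1 * rng.standard_normal(c.shape)
    print(stochastic_mirror_descent(oracle, dim=4))  # most mass on index 1
```

Returning the averaged iterate, as done here, is the standard device through which SMD analyses obtain convergence guarantees for convex problems.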

Full description

Authors:
Leiva Montoya, Esteban
Advisor:
Junca Peláez, Mauricio José
Jury:
Pagnoncelli, Bernardo
Resource type:
Undergraduate thesis
Publication date:
2025
Institution:
Universidad de los Andes
Program:
Mathematics, Facultad de Ciencias, Departamento de Matemáticas
Extent:
43 pages
Repository:
Séneca: Uniandes institutional repository
Language:
English
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/75575
Online access:
https://hdl.handle.net/1992/75575
Keywords:
Inverse optimization
Inverse reinforcement learning
Stochastic mirror descent
Markov decision processes
Mathematics
Rights:
Open access
License:
Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/