Inverse reinforcement learning via stochastic mirror descent
Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL) are foundational problems in decision-making under uncertainty, where the goal is to infer cost functions and policies from observed behavior. In this thesis, we establish the equivalence between the inverse optimization framework for Markov decision processes (MDPs) and the apprenticeship learning formalism, showing that both approaches can be unified under a shared structure. We formulate IRL and AL as regularized min-max problems and develop an algorithm based on stochastic mirror descent (SMD) that offers theoretical convergence guarantees.
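The abstract's central tool, stochastic mirror descent, can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes a generic stochastic optimization over the probability simplex with the negative-entropy mirror map, which yields multiplicative (exponentiated-gradient) updates, and the quadratic objective and data below are hypothetical, chosen only to make the example runnable.

```python
import numpy as np

def stochastic_mirror_descent(grad_sample, x0, steps=1000, lr=0.1, seed=0):
    """Stochastic mirror descent on the probability simplex.

    With the negative-entropy mirror map, each update is a multiplicative
    (exponentiated-gradient) step followed by renormalization, which keeps
    every iterate on the simplex. The averaged iterate is returned, since
    SMD convergence guarantees are typically stated for the average.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    avg = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_sample(x, rng)                # noisy gradient estimate
        x = x * np.exp(-lr / np.sqrt(t) * g)   # entropy mirror step
        x /= x.sum()                           # Bregman projection onto simplex
        avg += (x - avg) / t                   # running average of iterates
    return avg

# Hypothetical example: minimize E[(x - c)^T A (x - c)] over the simplex,
# observing gradients corrupted by Gaussian noise.
A = np.diag([1.0, 2.0, 3.0])
c = np.array([0.2, 0.5, 0.3])

def grad_sample(x, rng):
    noise = 0.01 * rng.standard_normal(3)
    return 2 * A @ (x - c) + noise

x0 = np.ones(3) / 3
x_hat = stochastic_mirror_descent(grad_sample, x0)
```

Because the entropy step is multiplicative, no explicit projection routine is needed; a single renormalization realizes the Bregman projection, which is the usual reason for pairing mirror descent with the simplex geometry.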
- Authors:
- Leiva Montoya, Esteban
- Resource type:
- Undergraduate thesis
- Publication date:
- 2025
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/75575
- Online access:
- https://hdl.handle.net/1992/75575
- Keywords:
- Inverse optimization
Inverse reinforcement learning
Stochastic mirror descent
Markov decision processes
Mathematics
- Rights:
- openAccess
- License:
- Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
- Advisor:
- Junca Peláez, Mauricio José
- Jury:
- Pagnoncelli, Bernardo
- Extent:
- 43 pages
- Format:
- application/pdf
- Program:
- Matemáticas
- Faculty:
- Facultad de Ciencias
- Department:
- Departamento de Matemáticas