Bridging gaps in code generation with large language models

Large Language Models (LLMs) are transforming natural language processing and extending their impact to code generation. This thesis evaluates both academic and industrial LLMs, focusing on their ability to generate pragmatic, functional code for non-standalone functions—a critical aspect of real-wo...

Authors: Osorio Cálad, Juan José

Resource type: Undergraduate thesis

Publication date: 2025

Institution: Universidad de los Andes

Repository: Séneca: repositorio Uniandes

Language: eng

id	UNIANDES2_0f20d45b8b9ad97d0163a65a4ea1185f
oai_identifier_str	oai:repositorio.uniandes.edu.co:1992/75375
network_acronym_str	UNIANDES2
network_name_str	Séneca: repositorio Uniandes
repository_id_str
dc.title.eng.fl_str_mv	Bridging gaps in code generation with large language models
title	Bridging gaps in code generation with large language models
spellingShingle	Bridging gaps in code generation with large language models Bridging the Industry–Academia Gap Large Language Models Model Evaluation Code Generation Ingeniería
title_short	Bridging gaps in code generation with large language models
title_full	Bridging gaps in code generation with large language models
title_fullStr	Bridging gaps in code generation with large language models
title_full_unstemmed	Bridging gaps in code generation with large language models
title_sort	Bridging gaps in code generation with large language models
dc.creator.fl_str_mv	Osorio Cálad, Juan José
dc.contributor.advisor.none.fl_str_mv	Mastropaolo, Antonio; Escobar Velasquez, Camilo Andres
dc.contributor.author.none.fl_str_mv	Osorio Cálad, Juan José
dc.subject.keyword.eng.fl_str_mv	Bridging the Industry–Academia Gap; Large Language Models; Model Evaluation; Code Generation
topic	Bridging the Industry–Academia Gap Large Language Models Model Evaluation Code Generation Ingeniería
dc.subject.themes.none.fl_str_mv	Ingeniería
description	Large Language Models (LLMs) are transforming natural language processing and extending their impact to code generation. This thesis evaluates both academic and industrial LLMs, focusing on their ability to generate pragmatic, functional code for non-standalone functions—a critical aspect of real-world programming. To address gaps in performance, reproducibility, and applicability, this research introduces two key tools: the CoderEval-Prompt-Inference repository for structured evaluation and the huggingface_search repository to overcome API limitations in model discovery. The evaluation framework leverages curated datasets to assess the correctness and utility of LLM outputs, emphasizing reproducible workflows and context-aware code generation. Challenges addressed include cleaning model outputs and ensuring their functionality within real-world constraints. Results highlight significant disparities between academic and industry models, providing insights into their alignment for practical use cases. By integrating GPU-based testing for scalability, this work establishes a robust pipeline for evaluating and deploying LLMs in software engineering. This research contributes to bridging the gap between academic innovation and industry application by enhancing model discovery, standardizing evaluation methods, and fostering collaboration across domains. Future efforts will focus on refining tools and methodologies to further unlock the potential of LLMs in real-world software development.
publishDate	2025
dc.date.accessioned.none.fl_str_mv	2025-01-13T20:09:27Z
dc.date.available.none.fl_str_mv	2025-01-13T20:09:27Z
dc.date.issued.none.fl_str_mv	2025-01-11
dc.type.none.fl_str_mv	Trabajo de grado - Pregrado
dc.type.driver.none.fl_str_mv	info:eu-repo/semantics/bachelorThesis
dc.type.version.none.fl_str_mv	info:eu-repo/semantics/acceptedVersion
dc.type.coar.none.fl_str_mv	http://purl.org/coar/resource_type/c_7a1f
dc.type.content.none.fl_str_mv	Text
dc.type.redcol.none.fl_str_mv	http://purl.org/redcol/resource_type/TP
format	http://purl.org/coar/resource_type/c_7a1f
status_str	acceptedVersion
dc.identifier.uri.none.fl_str_mv	https://hdl.handle.net/1992/75375
dc.identifier.instname.none.fl_str_mv	instname:Universidad de los Andes
dc.identifier.reponame.none.fl_str_mv	reponame:Repositorio Institucional Séneca
dc.identifier.repourl.none.fl_str_mv	repourl:https://repositorio.uniandes.edu.co/
url	https://hdl.handle.net/1992/75375
identifier_str_mv	instname:Universidad de los Andes reponame:Repositorio Institucional Séneca repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.none.fl_str_mv	eng
language	eng
dc.relation.references.none.fl_str_mv	Li, L., Dinh, L., Hu, S., & Hemphill, L. (2024). Academic collaboration on large language model studies increases overall but varies across disciplines. arXiv preprint arXiv:2408.04163. Ahmed, N., Wahed, M., & Thompson, N. C. (2023). The growing influence of industry in AI research. Science, 379(6635), 884-886. Yu, H., Shen, B., Ran, D., Zhang, J., Zhang, Q., Ma, Y., ... & Xie, T. (2024, February). CoderEval: A benchmark of pragmatic code generation with generative pre-trained models. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (pp. 1-12). Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., ... & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. Evtikhiev, M., Bogomolov, E., Sokolov, Y., & Bryksin, T. (2023). Out of the BLEU: How should we assess quality of the code generation models? Journal of Systems and Software, 203, 111741.
dc.rights.en.fl_str_mv	Attribution 4.0 International
dc.rights.uri.none.fl_str_mv	http://creativecommons.org/licenses/by/4.0/
dc.rights.accessrights.none.fl_str_mv	info:eu-repo/semantics/openAccess
dc.rights.coar.none.fl_str_mv	http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv	Attribution 4.0 International http://creativecommons.org/licenses/by/4.0/ http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv	openAccess
dc.format.extent.none.fl_str_mv	19 pages
dc.format.mimetype.none.fl_str_mv	application/pdf
dc.publisher.none.fl_str_mv	Universidad de los Andes
dc.publisher.program.none.fl_str_mv	Ingeniería de Sistemas y Computación
dc.publisher.faculty.none.fl_str_mv	Facultad de Ingeniería
dc.publisher.department.none.fl_str_mv	Departamento de Ingeniería de Sistemas y Computación
publisher.none.fl_str_mv	Universidad de los Andes
institution	Universidad de los Andes
bitstream.url.fl_str_mv	https://repositorio.uniandes.edu.co/bitstreams/de23be9c-5327-4f7e-92e9-eae5b7bc7449/download https://repositorio.uniandes.edu.co/bitstreams/baf3a43d-7907-4bd5-9581-891e0e7a9d79/download https://repositorio.uniandes.edu.co/bitstreams/a77d1182-5a23-44c2-8748-12ab66b19aa5/download https://repositorio.uniandes.edu.co/bitstreams/18d3fb8d-d513-4321-b58c-e69d2a8d6911/download https://repositorio.uniandes.edu.co/bitstreams/55b6b419-ab22-47ba-88fa-b272ff5f41c7/download https://repositorio.uniandes.edu.co/bitstreams/a035df64-5f6b-4ee7-9ab9-1b5beaa1323d/download https://repositorio.uniandes.edu.co/bitstreams/2852dbeb-6a10-42de-a6ec-40ffa775de0f/download https://repositorio.uniandes.edu.co/bitstreams/ab89975a-3bc6-4dba-aeb4-4e2756bd1cc3/download
bitstream.checksum.fl_str_mv	8542378ea4c3dc4704cbe932fa18b327 516161ffe746f7910fbdf6126058b834 0175ea4a2d4caec4bbcc37e300941108 ae9e573a68e7f92501b6913cc846c39f 0e6d104a0a3990c4f2aa73fd6de07fac 7949596e30fc3d67bdc3e989ecd26bd5 cfba75d0ea345e948baf4a7ab31dc7ac f0fe413a349c591bc82cf63872ccc577
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5
repository.name.fl_str_mv	Repositorio institucional Séneca
repository.mail.fl_str_mv	adminrepositorio@uniandes.edu.co
_version_	1837004901648957440
