Bridging gaps in code generation with large language models

Large Language Models (LLMs) are transforming natural language processing and extending their impact to code generation. This thesis evaluates both academic and industrial LLMs, focusing on their ability to generate pragmatic, functional code for non-standalone functions—a critical aspect of real-wo...


Authors:
Osorio Cálad, Juan José
Resource type:
Undergraduate thesis
Publication date:
2025
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/75375
Online access:
https://hdl.handle.net/1992/75375
Keywords:
Bridging the Industry–Academia Gap
Large Language Models
Model Evaluation
Code Generation
Engineering
Rights
openAccess
License
Attribution 4.0 International
id UNIANDES2_0f20d45b8b9ad97d0163a65a4ea1185f
oai_identifier_str oai:repositorio.uniandes.edu.co:1992/75375
network_acronym_str UNIANDES2
network_name_str Séneca: repositorio Uniandes
repository_id_str
dc.title.eng.fl_str_mv Bridging gaps in code generation with large language models
dc.creator.fl_str_mv Osorio Cálad, Juan José
dc.contributor.advisor.none.fl_str_mv Mastropaolo, Antonio
Escobar Velasquez, Camilo Andres
dc.contributor.author.none.fl_str_mv Osorio Cálad, Juan José
dc.subject.keyword.eng.fl_str_mv Bridging the Industry–Academia Gap
Large Language Models
Model Evaluation
Code Generation
dc.subject.themes.none.fl_str_mv Engineering
description Large Language Models (LLMs) are transforming natural language processing and extending their impact to code generation. This thesis evaluates both academic and industrial LLMs, focusing on their ability to generate pragmatic, functional code for non-standalone functions—a critical aspect of real-world programming. To address gaps in performance, reproducibility, and applicability, this research introduces two key tools: the CoderEval-Prompt-Inference repository for structured evaluation and the huggingface_search repository to overcome API limitations in model discovery. The evaluation framework leverages curated datasets to assess the correctness and utility of LLM outputs, emphasizing reproducible workflows and context-aware code generation. Challenges addressed include cleaning model outputs and ensuring their functionality within real-world constraints. Results highlight significant disparities between academic and industry models, providing insights into their alignment for practical use cases. By integrating GPU-based testing for scalability, this work establishes a robust pipeline for evaluating and deploying LLMs in software engineering. This research contributes to bridging the gap between academic innovation and industry application by enhancing model discovery, standardizing evaluation methods, and fostering collaboration across domains. Future efforts will focus on refining tools and methodologies to further unlock the potential of LLMs in real-world software development.
publishDate 2025
dc.date.accessioned.none.fl_str_mv 2025-01-13T20:09:27Z
dc.date.available.none.fl_str_mv 2025-01-13T20:09:27Z
dc.date.issued.none.fl_str_mv 2025-01-11
dc.type.none.fl_str_mv Undergraduate thesis
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/bachelorThesis
dc.type.version.none.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_7a1f
dc.type.content.none.fl_str_mv Text
dc.type.redcol.none.fl_str_mv http://purl.org/redcol/resource_type/TP
format http://purl.org/coar/resource_type/c_7a1f
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/1992/75375
dc.identifier.instname.none.fl_str_mv instname:Universidad de los Andes
dc.identifier.reponame.none.fl_str_mv reponame:Repositorio Institucional Séneca
dc.identifier.repourl.none.fl_str_mv repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.none.fl_str_mv eng
language eng
dc.relation.references.none.fl_str_mv Li, L., Dinh, L., Hu, S., & Hemphill, L. (2024). Academic collaboration on large language model studies increases overall but varies across disciplines. arXiv preprint arXiv:2408.04163.
Ahmed, N., Wahed, M., & Thompson, N. C. (2023). The growing influence of industry in AI research. Science, 379(6635), 884-886.
Yu, H., Shen, B., Ran, D., Zhang, J., Zhang, Q., Ma, Y., ... & Xie, T. (2024, February). CoderEval: A benchmark of pragmatic code generation with generative pre-trained models. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (pp. 1-12).
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., ... & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.
Evtikhiev, M., Bogomolov, E., Sokolov, Y., & Bryksin, T. (2023). Out of the BLEU: How should we assess quality of the code generation models? Journal of Systems and Software, 203, 111741.
dc.rights.en.fl_str_mv Attribution 4.0 International
dc.rights.uri.none.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.rights.accessrights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.format.extent.none.fl_str_mv 19 pages
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad de los Andes
dc.publisher.program.none.fl_str_mv Ingeniería de Sistemas y Computación
dc.publisher.faculty.none.fl_str_mv Facultad de Ingeniería
dc.publisher.department.none.fl_str_mv Departamento de Ingeniería de Sistemas y Computación
publisher.none.fl_str_mv Universidad de los Andes
institution Universidad de los Andes
bitstream.url.fl_str_mv https://repositorio.uniandes.edu.co/bitstreams/de23be9c-5327-4f7e-92e9-eae5b7bc7449/download
https://repositorio.uniandes.edu.co/bitstreams/baf3a43d-7907-4bd5-9581-891e0e7a9d79/download
https://repositorio.uniandes.edu.co/bitstreams/a77d1182-5a23-44c2-8748-12ab66b19aa5/download
https://repositorio.uniandes.edu.co/bitstreams/18d3fb8d-d513-4321-b58c-e69d2a8d6911/download
https://repositorio.uniandes.edu.co/bitstreams/55b6b419-ab22-47ba-88fa-b272ff5f41c7/download
https://repositorio.uniandes.edu.co/bitstreams/a035df64-5f6b-4ee7-9ab9-1b5beaa1323d/download
https://repositorio.uniandes.edu.co/bitstreams/2852dbeb-6a10-42de-a6ec-40ffa775de0f/download
https://repositorio.uniandes.edu.co/bitstreams/ab89975a-3bc6-4dba-aeb4-4e2756bd1cc3/download
bitstream.checksum.fl_str_mv 8542378ea4c3dc4704cbe932fa18b327
516161ffe746f7910fbdf6126058b834
0175ea4a2d4caec4bbcc37e300941108
ae9e573a68e7f92501b6913cc846c39f
0e6d104a0a3990c4f2aa73fd6de07fac
7949596e30fc3d67bdc3e989ecd26bd5
cfba75d0ea345e948baf4a7ab31dc7ac
f0fe413a349c591bc82cf63872ccc577
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional Séneca
repository.mail.fl_str_mv adminrepositorio@uniandes.edu.co
_version_ 1828159153048125440