Bridging gaps in code generation with large language models
Large Language Models (LLMs) are transforming natural language processing and extending their impact to code generation. This thesis evaluates both academic and industrial LLMs, focusing on their ability to generate pragmatic, functional code for non-standalone functions—a critical aspect of real-wo...
- Authors:
- Osorio Cálad, Juan José
- Resource type:
- Undergraduate thesis
- Publication date:
- 2025
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/75375
- Online access:
- https://hdl.handle.net/1992/75375
- Keywords:
- Bridging the Industry–Academia Gap
Large Language Models
Model Evaluation
Code Generation
Engineering
- Rights
- openAccess
- License
- Attribution 4.0 International
| id | UNIANDES2_0f20d45b8b9ad97d0163a65a4ea1185f |
|---|---|
| oai_identifier_str | oai:repositorio.uniandes.edu.co:1992/75375 |
| network_acronym_str | UNIANDES2 |
| network_name_str | Séneca: repositorio Uniandes |
| repository_id_str | |
| dc.title.eng.fl_str_mv | Bridging gaps in code generation with large language models |
| title | Bridging gaps in code generation with large language models |
| spellingShingle | Bridging gaps in code generation with large language models; Bridging the Industry–Academia Gap; Large Language Models; Model Evaluation; Code Generation; Engineering |
| title_short | Bridging gaps in code generation with large language models |
| title_full | Bridging gaps in code generation with large language models |
| title_fullStr | Bridging gaps in code generation with large language models |
| title_full_unstemmed | Bridging gaps in code generation with large language models |
| title_sort | Bridging gaps in code generation with large language models |
| dc.creator.fl_str_mv | Osorio Cálad, Juan José |
| dc.contributor.advisor.none.fl_str_mv | Mastropaolo, Antonio; Escobar Velasquez, Camilo Andres |
| dc.contributor.author.none.fl_str_mv | Osorio Cálad, Juan José |
| dc.subject.keyword.eng.fl_str_mv | Bridging the Industry–Academia Gap; Large Language Models; Model Evaluation; Code Generation |
| topic | Bridging the Industry–Academia Gap; Large Language Models; Model Evaluation; Code Generation; Engineering |
| dc.subject.themes.none.fl_str_mv | Engineering |
| description | Large Language Models (LLMs) are transforming natural language processing and extending their impact to code generation. This thesis evaluates both academic and industrial LLMs, focusing on their ability to generate pragmatic, functional code for non-standalone functions, a critical aspect of real-world programming. To address gaps in performance, reproducibility, and applicability, this research introduces two key tools: the CoderEval-Prompt-Inference repository for structured evaluation and the huggingface_search repository to overcome API limitations in model discovery. The evaluation framework leverages curated datasets to assess the correctness and utility of LLM outputs, emphasizing reproducible workflows and context-aware code generation. Challenges addressed include cleaning model outputs and ensuring their functionality within real-world constraints. Results highlight significant disparities between academic and industry models, providing insights into their alignment for practical use cases. By integrating GPU-based testing for scalability, this work establishes a robust pipeline for evaluating and deploying LLMs in software engineering. This research contributes to bridging the gap between academic innovation and industry application by enhancing model discovery, standardizing evaluation methods, and fostering collaboration across domains. Future efforts will focus on refining tools and methodologies to further unlock the potential of LLMs in real-world software development. |
| publishDate | 2025 |
| dc.date.accessioned.none.fl_str_mv | 2025-01-13T20:09:27Z |
| dc.date.available.none.fl_str_mv | 2025-01-13T20:09:27Z |
| dc.date.issued.none.fl_str_mv | 2025-01-11 |
| dc.type.none.fl_str_mv | Undergraduate thesis |
| dc.type.driver.none.fl_str_mv | info:eu-repo/semantics/bachelorThesis |
| dc.type.version.none.fl_str_mv | info:eu-repo/semantics/acceptedVersion |
| dc.type.coar.none.fl_str_mv | http://purl.org/coar/resource_type/c_7a1f |
| dc.type.content.none.fl_str_mv | Text |
| dc.type.redcol.none.fl_str_mv | http://purl.org/redcol/resource_type/TP |
| format | http://purl.org/coar/resource_type/c_7a1f |
| status_str | acceptedVersion |
| dc.identifier.uri.none.fl_str_mv | https://hdl.handle.net/1992/75375 |
| dc.identifier.instname.none.fl_str_mv | instname:Universidad de los Andes |
| dc.identifier.reponame.none.fl_str_mv | reponame:Repositorio Institucional Séneca |
| dc.identifier.repourl.none.fl_str_mv | repourl:https://repositorio.uniandes.edu.co/ |
| url | https://hdl.handle.net/1992/75375 |
| identifier_str_mv | instname:Universidad de los Andes reponame:Repositorio Institucional Séneca repourl:https://repositorio.uniandes.edu.co/ |
| dc.language.iso.none.fl_str_mv | eng |
| language | eng |
| dc.relation.references.none.fl_str_mv | Li, L., Dinh, L., Hu, S., & Hemphill, L. (2024). Academic collaboration on large language model studies increases overall but varies across disciplines. arXiv preprint arXiv:2408.04163. Ahmed, N., Wahed, M., & Thompson, N. C. (2023). The growing influence of industry in AI research. Science, 379(6635), 884-886. Yu, H., Shen, B., Ran, D., Zhang, J., Zhang, Q., Ma, Y., ... & Xie, T. (2024, February). CoderEval: A benchmark of pragmatic code generation with generative pre-trained models. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (pp. 1-12). Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., ... & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374. Evtikhiev, M., Bogomolov, E., Sokolov, Y., & Bryksin, T. (2023). Out of the BLEU: how should we assess quality of the code generation models? Journal of Systems and Software, 203, 111741. |
| dc.rights.en.fl_str_mv | Attribution 4.0 International |
| dc.rights.uri.none.fl_str_mv | http://creativecommons.org/licenses/by/4.0/ |
| dc.rights.accessrights.none.fl_str_mv | info:eu-repo/semantics/openAccess |
| dc.rights.coar.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| rights_invalid_str_mv | Attribution 4.0 International http://creativecommons.org/licenses/by/4.0/ http://purl.org/coar/access_right/c_abf2 |
| eu_rights_str_mv | openAccess |
| dc.format.extent.none.fl_str_mv | 19 pages |
| dc.format.mimetype.none.fl_str_mv | application/pdf |
| dc.publisher.none.fl_str_mv | Universidad de los Andes |
| dc.publisher.program.none.fl_str_mv | Ingeniería de Sistemas y Computación |
| dc.publisher.faculty.none.fl_str_mv | Facultad de Ingeniería |
| dc.publisher.department.none.fl_str_mv | Departamento de Ingeniería de Sistemas y Computación |
| publisher.none.fl_str_mv | Universidad de los Andes |
| institution | Universidad de los Andes |
| bitstream.url.fl_str_mv | https://repositorio.uniandes.edu.co/bitstreams/de23be9c-5327-4f7e-92e9-eae5b7bc7449/download https://repositorio.uniandes.edu.co/bitstreams/baf3a43d-7907-4bd5-9581-891e0e7a9d79/download https://repositorio.uniandes.edu.co/bitstreams/a77d1182-5a23-44c2-8748-12ab66b19aa5/download https://repositorio.uniandes.edu.co/bitstreams/18d3fb8d-d513-4321-b58c-e69d2a8d6911/download https://repositorio.uniandes.edu.co/bitstreams/55b6b419-ab22-47ba-88fa-b272ff5f41c7/download https://repositorio.uniandes.edu.co/bitstreams/a035df64-5f6b-4ee7-9ab9-1b5beaa1323d/download https://repositorio.uniandes.edu.co/bitstreams/2852dbeb-6a10-42de-a6ec-40ffa775de0f/download https://repositorio.uniandes.edu.co/bitstreams/ab89975a-3bc6-4dba-aeb4-4e2756bd1cc3/download |
| bitstream.checksum.fl_str_mv | 8542378ea4c3dc4704cbe932fa18b327 516161ffe746f7910fbdf6126058b834 0175ea4a2d4caec4bbcc37e300941108 ae9e573a68e7f92501b6913cc846c39f 0e6d104a0a3990c4f2aa73fd6de07fac 7949596e30fc3d67bdc3e989ecd26bd5 cfba75d0ea345e948baf4a7ab31dc7ac f0fe413a349c591bc82cf63872ccc577 |
| bitstream.checksumAlgorithm.fl_str_mv | MD5 MD5 MD5 MD5 MD5 MD5 MD5 MD5 |
| repository.name.fl_str_mv | Repositorio institucional Séneca |
| repository.mail.fl_str_mv | adminrepositorio@uniandes.edu.co |
| _version_ | 1828159153048125440 |