Source code analysis on student assignments using machine learning techniques

Authors:
Castellanos Morales, Hugo Armando
Resource type:
Publication date:
2017
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/60068
Online access:
https://repositorio.unal.edu.co/handle/unal/60068
http://bdigital.unal.edu.co/58004/
Keywords:
0 Computer science, information and general works
37 Education
Motivation
Learning strategies
Machine learning
Source code analysis
Self-regulation
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
Description
Summary: Abstract. To increase success in computer programming courses, it is important to understand the learning process and the common difficulties students face. Although several studies have investigated possible relationships between students' performance and self-regulated learning characteristics, little attention has been given to the source code that students produce. Such source code might contain valuable information about their learning process, especially in a context where practical programming assignments are frequent and students write source code constantly during the course. This raises the following research questions: What is the relationship between the characteristics of students' source code and their performance in a computer programming course? What is the relationship between source code features and self-regulated learning characteristics (i.e., motivation and learning strategies) in a computer programming course? How can source code and self-regulated learning features predict students' performance? To answer these questions, a strategy is proposed to support correlation analysis among students' performance, motivation, use of learning strategies, and source code metrics in computer programming courses. A comprehensive case study is presented to evaluate the strategy. Additionally, an automatic grading tool for programming assignments was used, which made it easier to collect the participants' source code for further automatic analysis. Moreover, self-regulated learning characteristics were collected using the Motivated Strategies for Learning Questionnaire (MSLQ). Results show that the source code features significantly related to students' performance and self-regulated learning features are length-related metrics, with mainly positive correlations, and Halstead complexity measures, which are negatively correlated.
In light of these findings, students' source code can be better understood as an artifact for monitoring several characteristics related to self-regulated learning, course performance, and, in general, the learning process. More research in the area is therefore required to verify whether these relationships could give computing educators new ways to identify and help students who are having difficulties.
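
To make the two metric families named in the summary concrete, the following is a minimal sketch of how length-related and Halstead measures could be extracted from a submission and then correlated with a performance score. It is not the tool or procedure used in the thesis. It assumes, purely for illustration, that submissions are Python source, that keywords and punctuation count as operators and that names and literals count as operands, and that a Pearson coefficient is the correlation of interest. The function names `halstead`, `lines_of_code`, and `pearson` are hypothetical.

```python
import io
import keyword
import math
import tokenize


def lines_of_code(source: str) -> int:
    """Length-related metric: number of non-blank lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def halstead(source: str) -> dict:
    """Basic Halstead measures for a Python source string.

    Assumption: keywords and OP tokens are operators; other names,
    numbers, and strings are operands.
    """
    operators, operands = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        is_keyword = tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
        if tok.type == tokenize.OP or is_keyword:
            operators.append(tok.string)
        elif tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
            operands.append(tok.string)

    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total counts
    vocabulary, length = n1 + n2, N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary > 1 else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    return {
        "n1": n1, "n2": n2, "N1": N1, "N2": N2,
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


def pearson(xs, ys) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, one would compute a metric such as `halstead(src)["volume"]` or `lines_of_code(src)` for each student's submission and pass the resulting list, together with the matching list of grades or MSLQ scale scores, to `pearson`. A rank-based coefficient could be substituted if the data are not approximately linear.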