A derivation of the optimal answer-copying index and some applications

Authors:
Romero, Mauricio
Jara Pinzón, Diego
Riascos Villegas, Álvaro José
Resource type:
Working paper
Publication date:
2014
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
spa
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/8507
Online access:
http://hdl.handle.net/1992/8507
Keywords:
Answer copying
False discovery rate
Index
Neyman-Pearson Lemma
Educational tests and measurements
Knowledge tests
C19, I20
Rights
openAccess
License
http://creativecommons.org/licenses/by-nc-nd/4.0/
Description
Summary: Multiple choice exams are frequently used as an efficient and objective instrument to evaluate knowledge. Nevertheless, they are more vulnerable to answer-copying than tests based on open questions. Several statistical tests (known as indices) have been proposed to detect cheating, but to the best of our knowledge they all lack a mathematical foundation that guarantees optimality in any sense. This work aims to fill this void by deriving the uniformly most powerful (UMP) test, assuming the response distribution is known. In practice, we must estimate a behavioral model that yields a response distribution for each question. We calculate the empirical type-I and type-II error rates for several indices that assume different behavioral models, using simulations based on real data from twelve nationwide multiple choice exams taken by 5th and 9th graders in Colombia. We find that the index with the highest power among those studied, subject to the restriction of preserving the type-I error rate, is the one that uses a nominal response model for item answering, conditions on the answers of the individual suspected of being the source of the copying, and calculates critical values via a normal approximation. This index was first studied by Wollack (1997) and later by W. Van der Linden and Sotaridona (2006), and is superior to the indices studied and developed by Wesolowsky (2000) and Frary, Tideman, and Watts (1977). Furthermore, we compare the performance of the indices in examination rooms with different levels of proctoring and find that increasing the level of proctoring can reduce copying by as much as 50%, and that simple strategies such as having different students answer different portions of the test at different times can also reduce cheating by over 50%. Finally, a Bonferroni-type false discovery rate procedure is used to detect massive cheating. The application is straightforward, and we believe it could be used to make entire examination rooms retake an exam under stricter surveillance conditions.
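
As a rough illustration of the kind of index the summary describes, the sketch below computes a normal-approximation statistic for the number of answers a suspected copier shares with a suspected source. It is a minimal sketch under stated assumptions, not the authors' implementation: the per-item match probabilities are assumed to come from some already-fitted response model (the summary mentions a nominal response model, which is not fitted here), items are assumed independent under the no-copying hypothesis, and the function names `copying_index` and `bonferroni_flags` are hypothetical. The second function applies a plain Bonferroni correction, which is a simplification of the Bonferroni-type false discovery rate procedure mentioned in the summary.

```python
import math

def copying_index(match_probs, observed_matches):
    """Normal-approximation test for excess answer matches.

    match_probs: assumed per-item probabilities (from a fitted response
        model, not estimated here) that the suspected copier gives the
        same answer as the source when no copying occurs.
    observed_matches: number of items where the two answers coincide.
    Returns (z, one-sided p-value).
    """
    # Under no copying, matches are treated as independent Bernoulli draws.
    mean = sum(match_probs)
    var = sum(p * (1.0 - p) for p in match_probs)
    z = (observed_matches - mean) / math.sqrt(var)
    # Upper-tail probability of a standard normal: 1 - Phi(z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value

def bonferroni_flags(p_values, alpha=0.05):
    """Flag pairs whose p-value falls below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Hypothetical example: 40 items, 25% chance of matching on each item,
# and 25 observed matches between the two answer sheets.
z, p = copying_index([0.25] * 40, 25)
flags = bonferroni_flags([0.001, 0.04, 0.5], alpha=0.05)
```

Conditioning on the source's answers is reflected only in how `match_probs` would be built: each entry is the modeled probability that the copier picks the option the source actually chose on that item.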