Methodology for calculating critical values of relevance measures in variable selection methods in data envelopment analysis

Authors:
Villanueva-Cantillo, Jeyms
Munoz-Marquez, Manuel
Resource type:
Publication date:
2021
Institution:
Universidad Simón Bolívar
Repository:
Repositorio Digital USB
Language:
English
OAI Identifier:
oai:bonga.unisimon.edu.co:20.500.12442/8028
Online access:
https://hdl.handle.net/20.500.12442/8028
https://doi.org/10.1016/j.ejor.2020.08.021
https://www.sciencedirect.com/science/article/pii/S0377221720307293
Keywords:
Data envelopment analysis
Variable selection
Critical values
Monte Carlo simulations
Rights
openAccess
License
Attribution-NonCommercial-NoDerivatives 4.0 International
Description
Summary: The selection of input and output variables is a key step in evaluating the relative efficiency of decision-making units (DMUs) in data envelopment analysis (DEA). In this paper, we present a methodology based on Monte Carlo simulations and bootstrapping for calculating the critical values of relevance measures in variable selection methods in DEA. Additionally, we define a set of metrics to study the methods' performance when using such critical values. We conducted an extensive simulation study, applying the proposed methodology to two variable selection methods in 28 single-output model specifications (i.e., different numbers of inputs and DMUs in the DEA model) under multiple scenarios, varying factors related to the functional form of the production function, the probability of an input being relevant in the model, the probability distribution of the inputs, and the theoretical efficiencies of the DMUs. The simulation study shows that (i) our proposed methodology yields consistent results for the two methods studied, in terms of both the generated critical values and the performance metrics, and (ii) for most model specifications, the critical values can be estimated with a linear model with a high adjusted R², using factors related to the input probability distribution and the probability of an input being relevant as independent variables. Furthermore, we describe and compare the performance of the two methods studied, provide guidelines for using our methodology and the results presented in this paper, and propose suggestions for future research.
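The summary describes the procedure only at a high level. The following is a minimal Python sketch of the general idea of a simulated critical value, not the authors' implementation. Every specific here is an assumption of ours: to stay dependency-free it uses the free disposal hull (FDH) input-oriented efficiency instead of an LP-based DEA model; the relevance measure (mean ratio of efficiency scores with and without a candidate input) is hypothetical, loosely in the spirit of efficiency-contribution measures; the Cobb-Douglas data-generating process is hypothetical; and the bootstrap step (resampling the simulated statistics to gauge the uncertainty of the critical value) is only one illustrative use of bootstrapping. All function names are hypothetical.

```python
import math
import random


def fdh_input_efficiency(X, y):
    """Input-oriented FDH efficiency for each DMU (single output y).
    FDH is swapped in here for an LP-based DEA model (assumption)."""
    scores = []
    for o, xo in enumerate(X):
        best = 1.0  # DMU o always "dominates" itself with ratio 1
        for xj, yj in zip(X, y):
            if yj >= y[o]:
                # Radial contraction of o's inputs needed to reach j's input bundle
                ratio = max(a / b for a, b in zip(xj, xo))
                best = min(best, ratio)
        scores.append(best)
    return scores


def relevance_measure(X, y, k):
    """Hypothetical relevance measure for input k: mean ratio of FDH
    efficiencies with and without input k (always >= 1)."""
    full = fdh_input_efficiency(X, y)
    reduced = fdh_input_efficiency([row[:k] + row[k + 1:] for row in X], y)
    return sum(f / r for f, r in zip(full, reduced)) / len(full)


def simulate_null_sample(n_dmus, n_inputs, rng):
    """Hypothetical DGP: Cobb-Douglas frontier in the first n_inputs - 1
    inputs; the last input is irrelevant (drawn independently)."""
    X, y = [], []
    for _ in range(n_dmus):
        relevant = [rng.uniform(10, 20) for _ in range(n_inputs - 1)]
        frontier = math.prod(x ** (1 / len(relevant)) for x in relevant)
        efficiency = math.exp(-abs(rng.gauss(0, 0.3)))  # true efficiency in (0, 1]
        X.append(relevant + [rng.uniform(10, 20)])
        y.append(frontier * efficiency)
    return X, y


def empirical_quantile(values, q):
    """Smallest observed value with at least a fraction q of the sample at or below it."""
    s = sorted(values)
    return s[min(len(s) - 1, max(0, math.ceil(q * len(s)) - 1))]


def critical_value(n_dmus, n_inputs, alpha=0.05, reps=200, boot=500, seed=1):
    """Monte Carlo critical value of the relevance measure for an irrelevant
    input, plus a bootstrap 95% percentile interval for that critical value."""
    rng = random.Random(seed)
    irrelevant = n_inputs - 1  # index of the irrelevant input
    stats = [relevance_measure(*simulate_null_sample(n_dmus, n_inputs, rng), irrelevant)
             for _ in range(reps)]
    cv = empirical_quantile(stats, 1 - alpha)
    # Resample the simulated statistics to see how stable the critical value is
    boot_cvs = [empirical_quantile(rng.choices(stats, k=len(stats)), 1 - alpha)
                for _ in range(boot)]
    return cv, empirical_quantile(boot_cvs, 0.025), empirical_quantile(boot_cvs, 0.975)


if __name__ == "__main__":
    cv, lo, hi = critical_value(n_dmus=30, n_inputs=3)
    print(f"critical value {cv:.4f}, bootstrap interval [{lo:.4f}, {hi:.4f}]")
```

Under these assumptions, a candidate input whose measure on real data exceeds `cv` would be flagged as relevant at level `alpha`. The function takes the number of DMUs and inputs as parameters because, as the summary notes, critical values are computed per model specification.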