Permutation Test Based on the Sinkhorn Divergence For the Two-Sample Problem


Authors:
Osorio Salcedo, Juan Sebastián
Resource type:
Doctoral thesis
Publication date:
2023
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/74911
Online access:
https://hdl.handle.net/1992/74911
Keywords:
Wasserstein distance
Optimal transport
Sinkhorn divergence
Sinkhorn algorithm
Two-sample problem
Permutation test
Mathematics
Rights
openAccess
License
Attribution-NonCommercial-ShareAlike 4.0 International
Description
Summary: In this thesis, we propose several ways to adapt the Wasserstein distance and the Sinkhorn divergence to the multivariate nonparametric two-sample problem when sample sizes are in the thousands. The approach uses permutation tests based on the Sinkhorn divergence between relative frequency vectors supported on finite discrete sets associated with data-dependent partitions. We compare these statistics with the test proposed by Schilling in simulated examples. The performance of the tests is evaluated in terms of statistical power under different distributional settings and in terms of computational efficiency. We prove a central limit theorem for the Sinkhorn divergence statistic under the null hypothesis in our main framework of data-dependent partitions; the limit depends only on the underlying distribution of the samples and on the limiting data-dependent partitions. The speed of convergence in the central limit theorem is evaluated under different conditions on the data and on the parameters that define the permutation statistic.
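The general idea in the summary (partition the pooled data, compare the two samples' relative frequency vectors with a Sinkhorn divergence, and calibrate by permutation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the partition is a Voronoi partition around randomly chosen pooled points, the cost is the squared Euclidean distance between cell centers scaled to [0, 1], the divergence uses the transport cost of the entropic plan in the debiased form OT(a,b) - OT(a,a)/2 - OT(b,b)/2, and the function names and the parameters `n_cells`, `eps`, and `n_perm` are hypothetical.

```python
import numpy as np

def entropic_plan_cost(a, b, C, eps=0.1, n_iter=200):
    """Transport cost <P, C> of the entropic plan from plain Sinkhorn iterations."""
    K = np.exp(-C / eps)              # Gibbs kernel; C is assumed scaled to [0, 1]
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):           # alternate scaling to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # approximate entropic transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(a, b, C, eps=0.1):
    """Debiased divergence (one common convention, assumed here)."""
    return (entropic_plan_cost(a, b, C, eps)
            - 0.5 * entropic_plan_cost(a, a, C, eps)
            - 0.5 * entropic_plan_cost(b, b, C, eps))

def permutation_test(X, Y, n_cells=20, n_perm=99, eps=0.1, seed=0):
    """Return (observed statistic, permutation p-value) for samples X and Y."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n, N = len(X), len(X) + len(Y)

    # Data-dependent partition (assumption): Voronoi cells around pooled points.
    centers = Z[rng.choice(N, size=n_cells, replace=False)]
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cell = d2.argmin(axis=1)          # cell index of every pooled observation

    # Ground cost between cells: squared distance between centers, scaled.
    C = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    C = C / C.max()

    def statistic(order):
        # Relative frequency vectors of the two groups over the cells.
        a = np.bincount(cell[order[:n]], minlength=n_cells) / n
        b = np.bincount(cell[order[n:]], minlength=n_cells) / (N - n)
        return sinkhorn_divergence(a, b, C, eps)

    observed = statistic(np.arange(N))
    # The partition depends only on the pooled sample, so it is reused
    # unchanged while the group labels are permuted.
    perm_stats = [statistic(rng.permutation(N)) for _ in range(n_perm)]
    p_value = (1 + sum(s >= observed for s in perm_stats)) / (1 + n_perm)
    return observed, p_value
```

As a usage example, calling `permutation_test(X, Y)` on two arrays of shape `(n, d)` returns the observed statistic and a p-value; small p-values suggest the samples come from different distributions. Because the cell assignments and the cost matrix are computed once, each permutation only requires recounting frequencies and rerunning Sinkhorn on an `n_cells` by `n_cells` problem, which is what keeps this kind of test tractable for large samples.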