Permutation Test Based on the Sinkhorn Divergence For the Two-Sample Problem
- Authors:
- Osorio Salcedo, Juan Sebastián
- Resource type:
- Doctoral thesis
- Publication date:
- 2023
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/74911
- Online access:
- https://hdl.handle.net/1992/74911
- Keywords:
- Wasserstein distance
- Optimal transport
- Sinkhorn divergence
- Sinkhorn algorithm
- Two-sample problem
- Permutation test
- Mathematics
- Rights
- openAccess
- License
- Attribution-NonCommercial-ShareAlike 4.0 International
Summary: In this thesis, we propose several ways to adapt the Wasserstein distance and the Sinkhorn divergence to the multivariate non-parametric two-sample problem when sample sizes are in the thousands. The approach uses permutation tests based on the Sinkhorn divergence between relative frequency vectors supported on finite discrete sets associated with data-dependent partitions. We compare these statistics, in simulated examples, with the test proposed by Schilling. The performance of the tests is evaluated in terms of statistical power in different distributional settings and in terms of computational efficiency. We prove a central limit theorem for the Sinkhorn divergence statistic in our main framework of data-dependent partitions under the null hypothesis; the limit depends only on the underlying distribution of the samples and on the limiting data-dependent partitions. The speed of convergence in the central limit theorem is evaluated under different conditions on the data and on the parameters that define the permutation statistic.
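The procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and every specific choice in it is an assumption made for illustration: the partition (Voronoi cells around points drawn at random from the pooled sample), the ground cost (normalized squared Euclidean distance between cell centers), the variant of the Sinkhorn divergence (transport cost of the entropic plan, debiased), the regularization `eps`, the fixed iteration count, and the function names `sinkhorn_cost`, `sinkhorn_divergence`, and `permutation_test`.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps, n_iter=200):
    """Transport cost <P, C> of the entropic plan between histograms a and b.

    Uses a fixed number of Sinkhorn iterations (an assumption; no
    convergence check is performed).
    """
    K = np.exp(-C / eps)                  # Gibbs kernel, strictly positive
    u = np.ones(len(a))
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # match the column marginal b
        u = a / (K @ v)                   # match the row marginal a
    P = u[:, None] * K * v[None, :]       # entropic transport plan
    return float((P * C).sum())

def sinkhorn_divergence(a, b, C, eps=0.1):
    """Debiased divergence: OT(a,b) - OT(a,a)/2 - OT(b,b)/2 (one common variant)."""
    return (sinkhorn_cost(a, b, C, eps)
            - 0.5 * sinkhorn_cost(a, a, C, eps)
            - 0.5 * sinkhorn_cost(b, b, C, eps))

def permutation_test(x, y, n_cells=16, n_perm=200, eps=0.1, seed=0):
    """Two-sample permutation test on relative frequencies over a partition."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = len(x)

    # Hypothetical data-dependent partition: Voronoi cells around randomly
    # chosen pooled points. It is built once from the pooled sample, so it
    # does not change when the group labels are permuted.
    centers = pooled[rng.choice(len(pooled), size=n_cells, replace=False)]
    d2 = ((pooled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    cell = d2.argmin(axis=1)              # cell index of every pooled point

    # Ground cost between cells, scaled to [0, 1] so that eps is comparable
    # across data sets (an illustrative choice).
    C = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    C = C / C.max()

    def stat(idx_x, idx_y):
        a = np.bincount(cell[idx_x], minlength=n_cells) / len(idx_x)
        b = np.bincount(cell[idx_y], minlength=n_cells) / len(idx_y)
        return sinkhorn_divergence(a, b, C, eps)

    idx = np.arange(len(pooled))
    observed = stat(idx[:n], idx[n:])

    # Count permuted statistics at least as large as the observed one.
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(idx)
        if stat(perm[:n], perm[n:]) >= observed:
            exceed += 1

    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `permutation_test(x, y)` on two arrays of shape `(n, d)` and `(m, d)` returns the observed statistic and a permutation p-value. Because the statistic depends on the data only through cell counts, each permutation costs a few small Sinkhorn solves on an `n_cells` by `n_cells` kernel rather than on the full samples, which is what keeps this kind of test feasible for sample sizes in the thousands.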