Permutation Test Based on the Sinkhorn Divergence For the Two-Sample Problem

In this thesis, we propose different ways to adapt the Wasserstein distance and the Sinkhorn divergence to the multivariate non-parametric two-sample problem when sample sizes are in the thousands, using permutation tests based on the Sinkhorn divergence between relative frequency vectors supported...


Authors:
Osorio Salcedo, Juan Sebastián
Resource type:
Doctoral thesis
Publication date:
2023
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/74911
Online access:
https://hdl.handle.net/1992/74911
Keywords:
Wasserstein distance
Optimal transport
Sinkhorn divergence
Sinkhorn algorithm
Two-sample problem
Permutation test
Mathematics
Rights
openAccess
License
Attribution-NonCommercial-ShareAlike 4.0 International
id UNIANDES2_ee0c14f658447548a5d38c2914988c03
oai_identifier_str oai:repositorio.uniandes.edu.co:1992/74911
network_acronym_str UNIANDES2
network_name_str Séneca: repositorio Uniandes
repository_id_str
dc.title.eng.fl_str_mv Permutation Test Based on the Sinkhorn Divergence For the Two-Sample Problem
dc.creator.fl_str_mv Osorio Salcedo, Juan Sebastián
dc.contributor.advisor.none.fl_str_mv Quiroz Salazar, Adolfo José
dc.contributor.author.none.fl_str_mv Osorio Salcedo, Juan Sebastián
dc.contributor.jury.none.fl_str_mv González Barrios, José María
Giraldo Henao, Ramón
Hoegele, Michael Anton
dc.subject.keyword.eng.fl_str_mv Wasserstein distance
Optimal transport
Sinkhorn divergence
Sinkhorn algorithm
Two-sample problem
Permutation test
dc.subject.themes.spa.fl_str_mv Matemáticas
description In this thesis, we propose different ways to adapt the Wasserstein distance and the Sinkhorn divergence to the multivariate non-parametric two-sample problem when sample sizes are in the thousands, using permutation tests based on the Sinkhorn divergence between relative frequency vectors supported on finite discrete sets associated with data-dependent partitions. We compare the statistics in simulated examples with the test proposed by Schilling. The performance of the tests considered is evaluated in terms of statistical power in different distributional settings and in terms of computational efficiency. We prove a central limit theorem for the Sinkhorn divergence statistic in our main framework of data-dependent partitions under the null hypothesis, which depends only on the underlying distribution of the samples and the limit data-dependent partitions. The speed of convergence in the central limit theorem is evaluated under different conditions on the data and on the parameters that define the permutation statistic.
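The general idea in the description above can be illustrated with a minimal sketch; it is not the thesis's implementation. Everything specific here is an assumption made for illustration: the partition is a Voronoi partition around `k` randomly chosen pooled points, the cost is the squared Euclidean distance between cell centres (rescaled to [0, 1]), and the names `entropic_ot`, `sinkhorn_divergence`, `permutation_test`, `eps`, `k` and `n_perm` are hypothetical. The divergence follows the usual debiased form S(a, b) = OT(a, b) - OT(a, a)/2 - OT(b, b)/2.

```python
import numpy as np


def entropic_ot(a, b, C, eps=0.1, n_iter=200):
    """Entropy-regularised OT cost between histograms a and b (plain Sinkhorn)."""
    K = np.exp(-C / eps)              # Gibbs kernel; C is assumed scaled to [0, 1]
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):           # alternate the two marginal scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # approximate transport plan
    ref = np.outer(a, b)
    mask = P > 0                      # empty cells carry no mass and are skipped
    kl = np.sum(P[mask] * np.log(P[mask] / ref[mask]))
    return np.sum(P * C) + eps * kl


def sinkhorn_divergence(a, b, C, eps=0.1):
    """Debiased divergence: OT(a, b) - OT(a, a)/2 - OT(b, b)/2."""
    return (entropic_ot(a, b, C, eps)
            - 0.5 * entropic_ot(a, a, C, eps)
            - 0.5 * entropic_ot(b, b, C, eps))


def permutation_test(X, Y, k=16, n_perm=200, eps=0.1, seed=0):
    """Return (observed statistic, permutation p-value) for samples X and Y."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    # Assumed data-dependent partition: Voronoi cells of k random pooled points.
    centres = Z[rng.choice(len(Z), size=k, replace=False)]
    cells = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    C = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    C = C / C.max()

    def stat(in_first):
        # Relative frequency vectors of the two groups over the k cells.
        a = np.bincount(cells[in_first], minlength=k) / in_first.sum()
        b = np.bincount(cells[~in_first], minlength=k) / (~in_first).sum()
        return sinkhorn_divergence(a, b, C, eps)

    labels = np.zeros(len(Z), dtype=bool)
    labels[:len(X)] = True
    observed = stat(labels)
    # The partition depends only on the pooled sample, so it is reused as is
    # while the group labels are shuffled.
    perm = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 2))
    Y = rng.normal(size=(1000, 2)) + 0.3   # shifted alternative
    print(permutation_test(X, Y))
```

Because each permutation only recomputes two frequency vectors of length `k`, the cost per permutation is independent of the sample size once the cells are assigned, which is one plausible reason for working with partitions when samples are in the thousands.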
publishDate 2023
dc.date.issued.none.fl_str_mv 2023-01-13
dc.date.accessioned.none.fl_str_mv 2024-08-02T16:23:24Z
dc.date.available.none.fl_str_mv 2024-08-02T16:23:24Z
dc.type.none.fl_str_mv Trabajo de grado - Doctorado
dc.type.driver.none.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.version.none.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.coar.none.fl_str_mv http://purl.org/coar/resource_type/c_db06
dc.type.content.none.fl_str_mv Text
dc.type.redcol.none.fl_str_mv https://purl.org/redcol/resource_type/TD
format http://purl.org/coar/resource_type/c_db06
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://hdl.handle.net/1992/74911
dc.identifier.doi.none.fl_str_mv 10.57784/1992/74911
dc.identifier.instname.none.fl_str_mv instname:Universidad de los Andes
dc.identifier.reponame.none.fl_str_mv reponame:Repositorio Institucional Séneca
dc.identifier.repourl.none.fl_str_mv repourl:https://repositorio.uniandes.edu.co/
url https://hdl.handle.net/1992/74911
identifier_str_mv 10.57784/1992/74911
instname:Universidad de los Andes
reponame:Repositorio Institucional Séneca
repourl:https://repositorio.uniandes.edu.co/
dc.language.iso.none.fl_str_mv eng
language eng
dc.rights.en.fl_str_mv Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights.uri.none.fl_str_mv http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.rights.accessrights.none.fl_str_mv info:eu-repo/semantics/openAccess
dc.rights.coar.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
rights_invalid_str_mv Attribution-NonCommercial-ShareAlike 4.0 International
http://creativecommons.org/licenses/by-nc-sa/4.0/
http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv openAccess
dc.format.extent.none.fl_str_mv 114 pages
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.none.fl_str_mv Universidad de los Andes
dc.publisher.program.none.fl_str_mv Doctorado en Matemáticas
dc.publisher.faculty.none.fl_str_mv Facultad de Ciencias
dc.publisher.department.none.fl_str_mv Departamento de Matemáticas
publisher.none.fl_str_mv Universidad de los Andes
institution Universidad de los Andes
bitstream.url.fl_str_mv https://repositorio.uniandes.edu.co/bitstreams/77844ba4-979e-44b3-a209-a0cdb55fa9d3/download
https://repositorio.uniandes.edu.co/bitstreams/3fc702ea-9945-4fc8-9eba-438048fd07d5/download
https://repositorio.uniandes.edu.co/bitstreams/f614ff7d-67a8-47e9-ac12-58fd000ca3be/download
https://repositorio.uniandes.edu.co/bitstreams/8067cbde-50eb-4f64-bd6b-19094ed36bf9/download
https://repositorio.uniandes.edu.co/bitstreams/bf55d217-1bba-4e8c-89ed-820b8244dcaa/download
https://repositorio.uniandes.edu.co/bitstreams/39157330-d458-4395-be16-3f8cc0e252a7/download
https://repositorio.uniandes.edu.co/bitstreams/3a5dd4aa-bcdb-4907-af04-6dec9a4fb139/download
https://repositorio.uniandes.edu.co/bitstreams/4726a87f-3121-4202-9855-a944e9a09665/download
bitstream.checksum.fl_str_mv 1037d84ac3037b69c500ccf6379ff5fb
80cf66256315ec8b136da4e309dbd202
ae9e573a68e7f92501b6913cc846c39f
934f4ca17e109e0a05eaeaba504d7ce4
f6d179e69d679584a2593a27634ef4b9
2d56ce7b72a479151181faac857277c6
0119d4e0ad4de0ce8cd590e98058c5c4
d6cfa48c6d97736472a9d6fda6eea82d
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
MD5
MD5
MD5
MD5
MD5
MD5
repository.name.fl_str_mv Repositorio institucional Séneca
repository.mail.fl_str_mv adminrepositorio@uniandes.edu.co
_version_ 1812133900755927040