Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)

Several genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g., healthy and not healthy) and single nucleotide polymorphisms (SNPs) is assessed by applying statistical tests, the two key...

Authors:: Cortés Muñoz, Fabián

Resource type:: Doctoral thesis

Publication date:: 2019

Institution:: Universidad Nacional de Colombia

Repository:: Universidad Nacional de Colombia

Language:: spa

id	UNACIONAL2_7327a784c1c66a4be304617cd89873a1
oai_identifier_str	oai:repositorio.unal.edu.co:unal/77356
network_acronym_str	UNACIONAL2
network_name_str	Universidad Nacional de Colombia
repository_id_str
dc.title.spa.fl_str_mv	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
title	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
spellingShingle	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS); Genome-wide association studies; Multiple testing problem; Categorical data; Type I error; Statistical power
title_short	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
title_full	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
title_fullStr	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
title_full_unstemmed	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
title_sort	Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
dc.creator.fl_str_mv	Cortés Muñoz, Fabián
dc.contributor.author.spa.fl_str_mv	Cortés Muñoz, Fabián
dc.contributor.spa.fl_str_mv	López Kleine, Liliana
dc.subject.proposal.spa.fl_str_mv	Genome-wide association studies; Multiple testing problem; Categorical data; Type I error; Statistical power
topic	Genome-wide association studies; Multiple testing problem; Categorical data; Type I error; Statistical power
description	Many genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g., healthy and not healthy) and single nucleotide polymorphisms (SNPs) is assessed by applying statistical tests, two key challenges must be addressed: which method is best for correcting for multiple testing, and how to increase statistical power after that adjustment. In these association studies, a solid criterion for declaring associations significant with high statistical power, without necessarily increasing the sample size, is crucial. Numerous methods have been developed to address these limitations; they have improved type I and type II error rates. The proposed methods are mainly based on changing the type of test used to establish the association and extending it to continuous traits. Some of these statistical methods are very complex and difficult to use, especially for non-statisticians, who usually produce such data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common form of measuring phenotypic traits in humans and other organisms. Using the maximum values of chi-square random variables as the test statistic, this study proposes a new statistical test called Quotient C that allows testing associations between thousands of SNPs and a categorical trait. In real datasets, Quotient C is observed to be a less stringent criterion that allows the declaration of a larger number of associations between SNPs and dichotomous outcomes than the classical methods used for correcting for multiple testing, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) equal to or less than the specified type I error level. The proposed method has a lower type II error rate and better statistical power than the following methods: Bonferroni, Holm, Hochberg, and Benjamini-Hochberg.
publishDate	2019
dc.date.issued.spa.fl_str_mv	2019
dc.date.accessioned.spa.fl_str_mv	2020-03-30T06:47:49Z
dc.date.available.spa.fl_str_mv	2020-03-30T06:47:49Z
dc.type.spa.fl_str_mv	Doctoral thesis
dc.type.driver.spa.fl_str_mv	info:eu-repo/semantics/doctoralThesis
dc.type.version.spa.fl_str_mv	info:eu-repo/semantics/acceptedVersion
dc.type.coar.spa.fl_str_mv	http://purl.org/coar/resource_type/c_db06
dc.type.content.spa.fl_str_mv	Text
dc.type.redcol.spa.fl_str_mv	http://purl.org/redcol/resource_type/TD
format	http://purl.org/coar/resource_type/c_db06
status_str	acceptedVersion
dc.identifier.uri.none.fl_str_mv	https://repositorio.unal.edu.co/handle/unal/77356
dc.identifier.eprints.spa.fl_str_mv	http://bdigital.unal.edu.co/75041/
url	https://repositorio.unal.edu.co/handle/unal/77356 http://bdigital.unal.edu.co/75041/
dc.language.iso.spa.fl_str_mv	spa
language	spa
dc.relation.ispartof.spa.fl_str_mv	Universidad Nacional de Colombia Sede Bogotá Facultad de Ciencias Departamento de Estadística Departamento de Estadística
dc.relation.haspart.spa.fl_str_mv	31 General statistics collections / Statistics
dc.relation.references.spa.fl_str_mv	Cortés Muñoz, Fabián (2019) Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS). Doctoral thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights.spa.fl_str_mv	All rights reserved - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv	http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv	Attribution-NonCommercial 4.0 International
dc.rights.uri.spa.fl_str_mv	http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv	info:eu-repo/semantics/openAccess
rights_invalid_str_mv	Attribution-NonCommercial 4.0 International; All rights reserved - Universidad Nacional de Colombia; http://creativecommons.org/licenses/by-nc/4.0/; http://purl.org/coar/access_right/c_abf2
eu_rights_str_mv	openAccess
dc.format.mimetype.spa.fl_str_mv	application/pdf
institution	Universidad Nacional de Colombia
bitstream.url.fl_str_mv	https://repositorio.unal.edu.co/bitstream/unal/77356/1/Methodology%20for%20estimating%20associatio%20between.pdf https://repositorio.unal.edu.co/bitstream/unal/77356/2/Methodology%20for%20estimating%20associatio%20between.pdf.jpg
bitstream.checksum.fl_str_mv	3e44d6f10df2d3f39b797c3983dc437a 595bc0f299371c7a4ed679cdcae7c3fd
bitstream.checksumAlgorithm.fl_str_mv	MD5 MD5
repository.name.fl_str_mv	Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv	repositorio_nal@unal.edu.co
_version_	1814089317267537920
spelling	761f3bda-d5e5-4157-8c40-b6156a319483; 300; ORIGINAL Methodology for estimating associatio between.pdf application/pdf 1261551; THUMBNAIL Methodology for estimating associatio between.pdf.jpg image/jpeg 2580; 2024-07-17 23:13:30.737. Abstract (translated from Spanish): In several genomic data analysis contexts, a large number of statistical hypotheses are tested at the same time on the same dataset. In genome-wide association studies (GWAS), the association between categorical phenotypes (for example, healthy versus diseased) and the presence or absence of single nucleotide polymorphisms (SNPs) is evaluated by means of statistical tests. In these types of studies, two main problems must be addressed: which method to use for multiple-testing correction, and how to increase statistical power after adjusting the type I error. Obtaining a solid criterion for declaring significant associations with high statistical power, without increasing the sample size, is crucial for these association studies. In addition, the proposed methods must be easy to use and understand for the scientists who produce the data. Various methods have been developed to address these limitations, with important results in reducing type I and type II errors. The proposed methods are mainly based on changing the type of test used to establish the association and extending it to continuous phenotypes. Some of them are very complex and difficult to use for non-statisticians, who are generally the ones producing the data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common way of measuring phenotypic traits in humans and other organisms. A new statistical test, called Quotient C (Cociente C), has been proposed; it allows evaluating associations between a large number of SNPs and a categorical trait, using a test statistic based on the maximum values of χ² random variables. On real data, this new methodology has been shown to find a larger number of polymorphisms associated with the phenotype and to have greater statistical power than the classical multiple-testing correction proposals.
