Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)


Authors:
Cortés Muñoz, Fabián
Resource type:
Doctoral thesis
Publication date:
2019
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
spa
OAI Identifier:
oai:repositorio.unal.edu.co:unal/77356
Online access:
https://repositorio.unal.edu.co/handle/unal/77356
http://bdigital.unal.edu.co/75041/
Keywords:
Genome-wide association studies
Multiple testing problem
Categorical data
Type I error
Statistical power
Rights
openAccess
License
Attribution-NonCommercial 4.0 International
id UNACIONAL2_7327a784c1c66a4be304617cd89873a1
network_acronym_str UNACIONAL2
network_name_str Universidad Nacional de Colombia
dc.title.spa.fl_str_mv Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
dc.creator.fl_str_mv Cortés Muñoz, Fabián
dc.contributor.spa.fl_str_mv López Kleine, Liliana
dc.subject.proposal.spa.fl_str_mv Genome-wide association studies
Multiple testing problem
Categorical data
Type I error
Statistical power
Estudios de asociación del genoma completo
Problemas de comparaciones múltiples
Datos categóricos
Error tipo I
Potencia estadística
description Several genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g., healthy vs. not healthy) and single nucleotide polymorphisms (SNPs) is assessed with statistical tests, two key challenges must be addressed: which method is best for correcting for multiple testing, and how to increase statistical power after that correction. In such association studies, it is crucial to have a solid criterion for declaring associations significant, with high statistical power and without necessarily increasing the sample size. Numerous methods have been developed to address these limitations, and they have improved type I and type II error rates. These methods are mainly based on changing the type of test used to establish the association and on extending it to continuous traits. Some of them are very complex and therefore difficult to use, especially for the non-statisticians who usually generate such data. Moreover, very few methods have focused on developing new statistical tests for categorical data, which is the most common way of measuring phenotypic traits in humans and other organisms. Using a test statistic based on the maximum values of chi-square random variables, this study proposes a new statistical test, called Quotient C, that allows testing associations between thousands of SNPs and a categorical trait. In real datasets, Quotient C proved to be a less stringent criterion that declares a larger number of associations between SNPs and dichotomous outcomes than the classical multiple-testing corrections, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) at or below the nominal level.
The proposed method has a lower type II error rate and higher statistical power than the Bonferroni, Holm, Hochberg, and Benjamini-Hochberg procedures.
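The four classical procedures that the abstract uses as baselines (Bonferroni, Holm, Hochberg, Benjamini-Hochberg) can be sketched as follows. This is a minimal standard-library Python sketch, not code from the thesis and not the Quotient C test; the function names and the default alpha = 0.05 are illustrative assumptions. Each function returns reject decisions aligned with the input p-values.

```python
from typing import Callable, List


def bonferroni(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (family-wise error rate control)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: scan p-values from smallest to largest and stop at the
    first one exceeding alpha / (m - k), where k is the 0-based rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] > alpha / (m - k):
            break
        reject[i] = True
    return reject


def _step_up(pvals: List[float], threshold: Callable[[int, int], float]) -> List[bool]:
    """Generic step-up rule: find the largest 1-based rank k whose sorted
    p-value satisfies p_(k) <= threshold(k, m), then reject the k smallest."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k in range(m, 0, -1):
        if pvals[order[k - 1]] <= threshold(k, m):
            for i in order[:k]:
                reject[i] = True
            break
    return reject


def hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Hochberg step-up with thresholds alpha / (m - k + 1)."""
    return _step_up(pvals, lambda k, m: alpha / (m - k + 1))


def benjamini_hochberg(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up with thresholds k * alpha / m (FDR control)."""
    return _step_up(pvals, lambda k, m: k * alpha / m)


if __name__ == "__main__":
    p = [0.035, 0.004, 0.060, 0.019, 0.012]
    for rule in (bonferroni, holm, hochberg, benjamini_hochberg):
        print(rule.__name__, sum(rule(p)))
```

For the hypothetical p-values [0.035, 0.004, 0.060, 0.019, 0.012] at alpha = 0.05, Bonferroni rejects one hypothesis, Holm and Hochberg two each, and Benjamini-Hochberg four, which illustrates how much the number of declared associations depends on the correction chosen.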
publishDate 2019
dc.date.issued.spa.fl_str_mv 2019
dc.date.accessioned.spa.fl_str_mv 2020-03-30T06:47:49Z
dc.date.available.spa.fl_str_mv 2020-03-30T06:47:49Z
dc.type.spa.fl_str_mv Degree thesis - Doctorate
dc.type.driver.spa.fl_str_mv info:eu-repo/semantics/doctoralThesis
dc.type.version.spa.fl_str_mv info:eu-repo/semantics/acceptedVersion
dc.type.coar.spa.fl_str_mv http://purl.org/coar/resource_type/c_db06
dc.type.content.spa.fl_str_mv Text
dc.type.redcol.spa.fl_str_mv http://purl.org/redcol/resource_type/TD
format http://purl.org/coar/resource_type/c_db06
status_str acceptedVersion
dc.identifier.uri.none.fl_str_mv https://repositorio.unal.edu.co/handle/unal/77356
dc.identifier.eprints.spa.fl_str_mv http://bdigital.unal.edu.co/75041/
dc.language.iso.spa.fl_str_mv spa
language spa
dc.relation.ispartof.spa.fl_str_mv Universidad Nacional de Colombia, Sede Bogotá, Faculty of Science, Department of Statistics
dc.relation.haspart.spa.fl_str_mv 31 General statistics collections / Statistics
dc.relation.references.spa.fl_str_mv Cortés Muñoz, Fabián (2019) Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS). Doctoral thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights.spa.fl_str_mv All rights reserved - Universidad Nacional de Colombia
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.license.spa.fl_str_mv Attribution-NonCommercial 4.0 International
dc.rights.uri.spa.fl_str_mv http://creativecommons.org/licenses/by-nc/4.0/
dc.rights.accessrights.spa.fl_str_mv info:eu-repo/semantics/openAccess
eu_rights_str_mv openAccess
dc.format.mimetype.spa.fl_str_mv application/pdf
institution Universidad Nacional de Colombia
bitstream.url.fl_str_mv https://repositorio.unal.edu.co/bitstream/unal/77356/1/Methodology%20for%20estimating%20associatio%20between.pdf
https://repositorio.unal.edu.co/bitstream/unal/77356/2/Methodology%20for%20estimating%20associatio%20between.pdf.jpg
bitstream.checksum.fl_str_mv 3e44d6f10df2d3f39b797c3983dc437a
595bc0f299371c7a4ed679cdcae7c3fd
bitstream.checksumAlgorithm.fl_str_mv MD5
MD5
repository.name.fl_str_mv Repositorio Institucional Universidad Nacional de Colombia
repository.mail.fl_str_mv repositorio_nal@unal.edu.co
_version_ 1814089317267537920
ORIGINAL: Methodology for estimating associatio between.pdf (application/pdf, 1261551 bytes, MD5 3e44d6f10df2d3f39b797c3983dc437a)
THUMBNAIL: Methodology for estimating associatio between.pdf.jpg (generated thumbnail, image/jpeg, 2580 bytes, MD5 595bc0f299371c7a4ed679cdcae7c3fd)
Last modified: 2024-07-17 23:13:30.737
Abstract (Spanish version): In several genomic data analysis contexts, a large number of statistical hypotheses are tested at the same time on the same dataset. In genome-wide association studies (GWAS), the association between categorical phenotypes (for example, healthy versus diseased) and the presence or absence of single nucleotide polymorphisms (SNPs) is evaluated with statistical tests. Two main problems must be addressed in these studies: which method to use to correct for multiple testing, and how to increase statistical power after adjusting for type I error. Obtaining a solid criterion for declaring significant associations with high statistical power, without increasing the sample size, is crucial for these association studies. In addition, the proposed methods should be easy to use and understand for the scientists who produce the data. Several methods have been developed to address these limitations, with important results in reducing type I and type II errors. These methods are mainly based on changing the type of test used to establish the association and extending it to continuous phenotypes. Some of them are very complex and difficult to use for non-statisticians, who are generally the ones who produce the data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common way of measuring phenotypic traits in humans and other organisms. A new statistical test, called Quotient C (Cociente C), is proposed; it evaluates associations between a large number of SNPs and a categorical trait using a test statistic based on the maximum values of χ² random variables. On real data, this new methodology found a larger number of polymorphisms associated with the phenotype and showed greater statistical power than the classical multiple-testing correction approaches.
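Two standard ingredients behind the setup described in the abstract can be sketched in Python: a Pearson chi-square test of association between a two-category phenotype and the presence or absence of a SNP (a 2x2 table), and the null distribution of the largest of m such statistics when the tests are assumed independent. This is not the Quotient C statistic, whose exact definition is not given in this record; the table layout, the function names, and the independence assumption are all hypothetical choices for illustration.

```python
import math
from typing import Sequence


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table

                  SNP present  SNP absent
        cases          a            b
        controls       c            d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin makes the statistic undefined; returning 0.0
        # (no evidence of association) is an assumption of this sketch.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with 1 df.

    Since X = Z**2 with Z standard normal, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def max_chi2_pvalue(stats: Sequence[float]) -> float:
    """P-value of the largest of m chi-square(1) statistics, assuming the m
    tests are independent under the null: 1 - (1 - p)^m.

    Independence is an assumption; linkage disequilibrium between nearby
    SNPs violates it in real GWAS data.
    """
    m = len(stats)
    p_single = chi2_sf_df1(max(stats))
    if p_single >= 1.0:
        return 1.0
    # expm1/log1p keep precision when p_single is tiny, as is common in GWAS.
    return -math.expm1(m * math.log1p(-p_single))


if __name__ == "__main__":
    s = chi2_2x2(30, 70, 15, 85)
    print(round(s, 3), chi2_sf_df1(s), max_chi2_pvalue([s] * 1000))
```

For the hypothetical table (a, b, c, d) = (30, 70, 15, 85), the statistic is about 6.45 with a single-test p-value of about 0.011; treated as the maximum over 1,000 independent tests, its p-value is essentially 1. This gap is the loss of power after multiple-testing adjustment that the abstract sets out to address.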