Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)

Authors:
Cortés Muñoz, Fabián
Resource type:
Doctoral thesis
Publication date:
2019
Institution:
Universidad Nacional de Colombia
Repository:
Universidad Nacional de Colombia
Language:
Spanish
OAI Identifier:
oai:repositorio.unal.edu.co:unal/77356
Online access:
https://repositorio.unal.edu.co/handle/unal/77356
http://bdigital.unal.edu.co/75041/
Keywords:
Genome-wide association studies
Multiple testing problem
Categorical data
Type I error
Statistical power
Rights
Open access
License
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
Description
Summary: Several genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g., healthy vs. not healthy) and single nucleotide polymorphisms (SNPs) is assessed with statistical tests, two key challenges must be addressed: which method is best for correcting for multiple testing, and how to increase statistical power after that correction. In such association studies, it is crucial to have a sound criterion for declaring an association significant that achieves high statistical power without necessarily increasing the sample size. Numerous methods have been developed to address these limitations, improving type I and type II error rates. These methods are mainly based on changing the type of test used to establish the association and on extending it to continuous traits. Some of them are very complex and therefore difficult to use, especially for the non-statisticians who usually generate such data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common way of measuring phenotypic traits in humans and other organisms. Using the maximum values of the chi-square distribution as the test statistic, this study proposes a new statistical test, called Quotient C, for testing associations between thousands of SNPs and a categorical trait. In real datasets, Quotient C is observed to be a less stringent criterion than the classical multiple-testing corrections: it allows a larger number of associations between SNPs and dichotomous outcomes to be declared while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) at or below the prespecified type I error rate.
The proposed method has a lower type II error rate and greater statistical power than the Bonferroni, Holm, Hochberg, and Benjamini-Hochberg procedures.
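The summary compares Quotient C against four classical multiple-testing corrections. As a rough illustration of that baseline workflow (not of Quotient C itself, whose definition is not given in this record), the sketch below runs a Pearson chi-square test on one hypothetical 2 × 3 phenotype-by-genotype table per SNP and then applies the Bonferroni, Holm, Hochberg, and Benjamini-Hochberg procedures to the resulting p-values. It assumes genotypes are coded as three classes (AA, Aa, aa), so each test has 2 degrees of freedom, and that no table has a zero row or column total; all function names and counts are illustrative, and only the Python standard library is used.

```python
import math
from typing import Callable, List, Sequence


def chi2_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table.

    Assumes every row and column total is positive (no zero expected counts).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def genotype_p_value(table: Sequence[Sequence[float]]) -> float:
    """p-value for a 2 x 3 (phenotype x genotype) table.

    df = (2 - 1) * (3 - 1) = 2, and the chi-square survival function with
    2 degrees of freedom is exactly exp(-x / 2).
    """
    return math.exp(-chi2_statistic(table) / 2.0)


def _ascending(p: Sequence[float]) -> List[int]:
    """Indices of p ordered from smallest to largest p-value."""
    return sorted(range(len(p)), key=lambda i: p[i])


def bonferroni(p: Sequence[float], alpha: float) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (targets the family-wise error rate)."""
    m = len(p)
    return [pi <= alpha / m for pi in p]


def holm(p: Sequence[float], alpha: float) -> List[bool]:
    """Step-down: test p_(k) against alpha / (m - k + 1), stop at the first failure."""
    m = len(p)
    reject = [False] * m
    for k, i in enumerate(_ascending(p)):  # k = 0, ..., m - 1 (0-based rank)
        if p[i] > alpha / (m - k):
            break
        reject[i] = True
    return reject


def _step_up(p: Sequence[float], threshold: Callable[[int, int], float]) -> List[bool]:
    """Find the largest 1-based rank k with p_(k) <= threshold(k, m)
    and reject the k smallest p-values."""
    m = len(p)
    order = _ascending(p)
    reject = [False] * m
    for k in range(m, 0, -1):
        if p[order[k - 1]] <= threshold(k, m):
            for i in order[:k]:
                reject[i] = True
            break
    return reject


def hochberg(p: Sequence[float], alpha: float) -> List[bool]:
    """Step-up version of Holm's thresholds alpha / (m - k + 1) (targets the FWER)."""
    return _step_up(p, lambda k, m: alpha / (m - k + 1))


def benjamini_hochberg(p: Sequence[float], alpha: float) -> List[bool]:
    """Step-up with thresholds k * alpha / m (targets the false discovery rate)."""
    return _step_up(p, lambda k, m: k * alpha / m)


if __name__ == "__main__":
    # Hypothetical counts: rows = (cases, controls), columns = (AA, Aa, aa).
    tables = [
        [[20, 10, 10], [10, 20, 10]],
        [[10, 20, 10], [10, 20, 10]],
        [[35, 40, 25], [20, 45, 35]],
    ]
    pvals = [genotype_p_value(t) for t in tables]
    for name, method in [("Bonferroni", bonferroni), ("Holm", holm),
                         ("Hochberg", hochberg),
                         ("Benjamini-Hochberg", benjamini_hochberg)]:
        print(name, method(pvals, 0.05))
```

Run on the same p-values, the four procedures are nested: Holm rejects everything Bonferroni rejects, Hochberg everything Holm rejects, and Benjamini-Hochberg everything Hochberg rejects. This ordering is one way to see why less stringent criteria, such as the one the summary reports for Quotient C, can declare more associations.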