Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)

Authors: Cortés Muñoz, Fabián

Resource type: Doctoral thesis

Publication date: 2019

Institution: Universidad Nacional de Colombia

Repository: Universidad Nacional de Colombia

Language: spa

Description
Summary: Many genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g. healthy and not healthy) and Single Nucleotide Polymorphisms (SNPs) is assessed by applying statistical tests, two key challenges arise: which method is best for multiple-testing correction, and how to increase statistical power after adjusting for multiple testing. In these association studies, it is crucial to have a solid criterion for declaring significance with high statistical power without necessarily increasing the sample size. Numerous methods have been developed to address these limitations, and they have improved type I and type II error rates. The proposed methods are mainly based on changing the way the association is established and extending it to continuous traits. Some of these statistical methods are very complex and therefore difficult to use, especially for the non-statisticians who usually obtain such data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common form of measuring phenotypic traits in humans and other organisms. Using the maximum values of the chi-square distribution as the test statistic, this study proposes a new statistical test, called Quotient C, that allows testing associations between thousands of SNPs and a categorical trait. In real datasets, Quotient C is observed to be a less stringent criterion that allows a larger number of associations between SNPs and dichotomous outcomes to be declared than the classical methods used for correcting for multiple testing, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) equal to or less than the nominal type I error level.
The proposed method has a lower type II error rate and better statistical power than the following methods: Bonferroni, Holm, Hochberg, and Benjamini and Hochberg.
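
For readers unfamiliar with the classical corrections named in the summary, the following is a minimal Python sketch of the baseline workflow they are applied to: a Pearson chi-square test per SNP on a genotype-by-phenotype table, followed by Bonferroni, Holm, Hochberg, and Benjamini-Hochberg adjusted p-values. It does not implement Quotient C, whose definition is not given in this record. The function names and the genotype counts are hypothetical and chosen only for illustration, and the p-value formula assumes a 2 x 3 table (two phenotype classes, three genotypes), which gives 2 degrees of freedom.

```python
import math

def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table (list of rows)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

def bonferroni(p):
    m = len(p)
    return [min(1.0, m * x) for x in p]

def holm(p):
    """Step-down adjustment: running maximum over ascending p-values."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * p[i])
        adjusted[i] = min(1.0, running)
    return adjusted

def hochberg(p):
    """Step-up adjustment: running minimum over descending p-values."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i], reverse=True)
    adjusted, running = [0.0] * m, 1.0
    for rank, i in enumerate(order):
        running = min(running, (rank + 1) * p[i])
        adjusted[i] = running
    return adjusted

def benjamini_hochberg(p):
    """False discovery rate adjustment (step-up)."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i], reverse=True)
    adjusted, running = [0.0] * m, 1.0
    for rank, i in enumerate(order):
        k = m - rank  # ascending rank of p[i]
        running = min(running, m / k * p[i])
        adjusted[i] = running
    return adjusted

# Hypothetical counts: rows = (cases, controls), columns = genotypes (AA, Aa, aa).
snp_tables = {
    "snp_1": [[10, 20, 30], [30, 20, 10]],
    "snp_2": [[18, 22, 20], [22, 18, 20]],
    "snp_3": [[15, 25, 20], [25, 20, 15]],
}
p_values = [p_value_df2(chi_square_stat(t)) for t in snp_tables.values()]
for name, adjust in [("Bonferroni", bonferroni), ("Holm", holm),
                     ("Hochberg", hochberg), ("Benjamini-Hochberg", benjamini_hochberg)]:
    print(name, [round(x, 4) for x in adjust(p_values)])
```

An association is declared for a SNP when its adjusted p-value falls at or below the chosen significance level. The four procedures differ in how strongly they inflate the raw p-values, which is the stringency trade-off the summary refers to.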
