Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS)
Several genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g. healthy and not healthy) and Single Nucleotide Polymorphisms (SNPs) is assessed by applying statistical tests, the two key...
- Authors:
-
Cortés Muñoz, Fabián
- Resource type:
- Doctoral thesis
- Publication date:
- 2019
- Institution:
- Universidad Nacional de Colombia
- Repository:
- Universidad Nacional de Colombia
- Language:
- spa
- OAI Identifier:
- oai:repositorio.unal.edu.co:unal/77356
- Online access:
- https://repositorio.unal.edu.co/handle/unal/77356
http://bdigital.unal.edu.co/75041/
- Keywords:
- Genome-wide association studies
Multiple testing problem
Categorical data
Type I error
Statistical power
Estudios de asociación del genoma completo
Problemas de comparaciones múltiples
Datos categóricos
Error tipo I
Potencia estadística
- Rights
- openAccess
- License
- Attribution-NonCommercial 4.0 International
id |
UNACIONAL2_7327a784c1c66a4be304617cd89873a1 |
---|---|
oai_identifier_str |
oai:repositorio.unal.edu.co:unal/77356 |
network_acronym_str |
UNACIONAL2 |
network_name_str |
Universidad Nacional de Colombia |
repository_id_str |
|
dc.title.spa.fl_str_mv |
Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS) |
title |
Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS) |
dc.creator.fl_str_mv |
Cortés Muñoz, Fabián |
dc.contributor.author.spa.fl_str_mv |
Cortés Muñoz, Fabián |
dc.contributor.spa.fl_str_mv |
López Kleine, Liliana |
dc.subject.proposal.spa.fl_str_mv |
Genome-wide association studies; Multiple testing problem; Categorical data; Type I error; Statistical power; Estudios de asociación del genoma completo; Problemas de comparaciones múltiples; Datos categóricos; Error tipo I; Potencia estadística |
topic |
Genome-wide association studies; Multiple testing problem; Categorical data; Type I error; Statistical power; Estudios de asociación del genoma completo; Problemas de comparaciones múltiples; Datos categóricos; Error tipo I; Potencia estadística |
description |
Several genomic data analysis contexts involve a large number of statistical hypotheses that are tested simultaneously. When the association between categorical phenotypes (e.g. healthy and not healthy) and Single Nucleotide Polymorphisms (SNPs) is assessed by applying statistical tests, two key challenges must be addressed: which method is best for correcting for multiple testing, and how to increase statistical power after that adjustment. In these association studies, it is crucial to obtain a solid criterion for declaring associations significant with high statistical power and without necessarily increasing the sample size. Numerous methods have been developed to address these limitations, and they have improved type I and type II error rates. The proposed methods are mainly based on changing the type of test used to establish the association and extending it to continuous traits. Some of these statistical methods are very complex and difficult to use, especially for non-statisticians, who usually produce such data. Moreover, very few methods have focused on developing a new statistical test for categorical data, which is the most common form of measuring phenotypic traits in humans and other organisms. Using a test statistic based on the maximum values of chi-square random variables, this study proposes a new statistical test called Quotient C that allows testing associations between thousands of SNPs and a categorical trait. In real datasets, Quotient C is observed to be a less stringent criterion that allows a larger number of associations between SNPs and dichotomous outcomes to be declared than the classical methods used for correcting multiple testing, while keeping the probability of incorrectly rejecting a true null hypothesis (type I error) equal to or less than the nominal type I error level. The proposed method has a lower type II error rate and better statistical power than the following methods: Bonferroni, Holm, Hochberg, and Benjamini and Hochberg. |
publishDate |
2019 |
dc.date.issued.spa.fl_str_mv |
2019 |
dc.date.accessioned.spa.fl_str_mv |
2020-03-30T06:47:49Z |
dc.date.available.spa.fl_str_mv |
2020-03-30T06:47:49Z |
dc.type.spa.fl_str_mv |
Degree work - Doctorate |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/doctoralThesis |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
dc.type.coar.spa.fl_str_mv |
http://purl.org/coar/resource_type/c_db06 |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/TD |
format |
http://purl.org/coar/resource_type/c_db06 |
status_str |
acceptedVersion |
dc.identifier.uri.none.fl_str_mv |
https://repositorio.unal.edu.co/handle/unal/77356 |
dc.identifier.eprints.spa.fl_str_mv |
http://bdigital.unal.edu.co/75041/ |
url |
https://repositorio.unal.edu.co/handle/unal/77356 http://bdigital.unal.edu.co/75041/ |
dc.language.iso.spa.fl_str_mv |
spa |
language |
spa |
dc.relation.ispartof.spa.fl_str_mv |
Universidad Nacional de Colombia Sede Bogotá Facultad de Ciencias Departamento de Estadística Departamento de Estadística |
dc.relation.haspart.spa.fl_str_mv |
31 Collections of general statistics / Statistics |
dc.relation.references.spa.fl_str_mv |
Cortés Muñoz, Fabián (2019) Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS). Doctoral thesis, Universidad Nacional de Colombia - Sede Bogotá. |
dc.rights.spa.fl_str_mv |
All rights reserved - Universidad Nacional de Colombia |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
dc.rights.license.spa.fl_str_mv |
Attribution-NonCommercial 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
rights_invalid_str_mv |
Attribution-NonCommercial 4.0 International; All rights reserved - Universidad Nacional de Colombia; http://creativecommons.org/licenses/by-nc/4.0/; http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
institution |
Universidad Nacional de Colombia |
bitstream.url.fl_str_mv |
https://repositorio.unal.edu.co/bitstream/unal/77356/1/Methodology%20for%20estimating%20associatio%20between.pdf https://repositorio.unal.edu.co/bitstream/unal/77356/2/Methodology%20for%20estimating%20associatio%20between.pdf.jpg |
bitstream.checksum.fl_str_mv |
3e44d6f10df2d3f39b797c3983dc437a 595bc0f299371c7a4ed679cdcae7c3fd |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 |
repository.name.fl_str_mv |
Repositorio Institucional Universidad Nacional de Colombia |
repository.mail.fl_str_mv |
repositorio_nal@unal.edu.co |
_version_ |
1814089317267537920 |
spelling |
Attribution-NonCommercial 4.0 International; All rights reserved - Universidad Nacional de Colombia; http://creativecommons.org/licenses/by-nc/4.0/; info:eu-repo/semantics/openAccess; http://purl.org/coar/access_right/c_abf2; López Kleine, Liliana; Cortés Muñoz, Fabián; 761f3bda-d5e5-4157-8c40-b6156a319483; 300; 2020-03-30T06:47:49Z; 2020-03-30T06:47:49Z; 2019; https://repositorio.unal.edu.co/handle/unal/77356; http://bdigital.unal.edu.co/75041/
In several genomic data analysis contexts, a large number of statistical hypotheses are tested at the same time on a single dataset. In genome-wide association studies (GWAS), the association between categorical phenotypes (for example, healthy versus diseased) and the presence or absence of single nucleotide polymorphisms (SNPs) is assessed by means of statistical tests. In these types of studies, two main problems must be addressed: which method to use for multiple testing correction and how to increase statistical power after adjusting the type I error. Obtaining a solid criterion for declaring significant associations with high statistical power without increasing the sample size is crucial for these association studies. In addition, the proposed methods should be easy to use and understand for the scientists who produce the data. Various methods have been developed to address these limitations, with important results in reducing type I and type II errors. The proposed methods are mainly based on changing the type of test used to establish the association and extending it to continuous phenotypes. Some of them are very complex and difficult to use for non-statisticians, who are generally the ones producing the data. In addition, very few methods have focused on developing a new statistical test for categorical data, which is the most common way of measuring phenotypic traits in humans and other organisms. A new statistical test, called Quotient C, has been proposed; it allows associations between a large number of SNPs and a categorical trait to be evaluated, using a test statistic based on the maximum values of χ² random variables. In real data, this new methodology has been shown to find a larger number of polymorphisms associated with the phenotype and to have greater statistical power than the classical proposals for multiple testing correction.
Doctorate; application/pdf; spa; Universidad Nacional de Colombia Sede Bogotá Facultad de Ciencias Departamento de Estadística; Departamento de Estadística; 31 Collections of general statistics / Statistics; Cortés Muñoz, Fabián (2019) Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS). Doctoral thesis, Universidad Nacional de Colombia - Sede Bogotá.; Methodology for estimating association between categorical variables with application to Genome-wide association studies (GWAS); Degree work - Doctorate; info:eu-repo/semantics/doctoralThesis; info:eu-repo/semantics/acceptedVersion; http://purl.org/coar/resource_type/c_db06; Text; http://purl.org/redcol/resource_type/TD
Genome-wide association studies; Multiple testing problem; Categorical data; Type I error; Statistical power; Estudios de asociación del genoma completo; Problemas de comparaciones múltiples; Datos categóricos; Error tipo I; Potencia estadística
ORIGINAL; Methodology for estimating associatio between.pdf; application/pdf; 1261551; https://repositorio.unal.edu.co/bitstream/unal/77356/1/Methodology%20for%20estimating%20associatio%20between.pdf; 3e44d6f10df2d3f39b797c3983dc437a; MD5; 1
THUMBNAIL; Methodology for estimating associatio between.pdf.jpg; Methodology for estimating associatio between.pdf.jpg; Generated Thumbnail; image/jpeg; 2580; https://repositorio.unal.edu.co/bitstream/unal/77356/2/Methodology%20for%20estimating%20associatio%20between.pdf.jpg; 595bc0f299371c7a4ed679cdcae7c3fd; MD5; 2
unal/77356; oai:repositorio.unal.edu.co:unal/77356; 2024-07-17 23:13:30.737; Repositorio Institucional Universidad Nacional de Colombia; repositorio_nal@unal.edu.co |
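
The description in this record names four classical multiple-testing corrections (Bonferroni, Holm, Hochberg, and Benjamini and Hochberg) as the baselines against which Quotient C is compared. The following is a minimal sketch of those baselines only, applied to per-SNP Pearson chi-square tests on 2x2 tables. It assumes a dichotomous phenotype and a binary allele indicator, uses no continuity correction, and relies on the identity P(χ²₁ > x) = erfc(√(x/2)) for one degree of freedom. The function names and the example counts are hypothetical and are not taken from the thesis. Quotient C itself is not implemented here, because this record does not define its test statistic.

```python
import math


def chi2_pvalue_2x2(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # an empty margin carries no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni(pvals):
    """Bonferroni adjusted p-values: m * p, capped at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def hochberg(pvals):
    """Hochberg step-up adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, (rank + 1) * pvals[i])
        adjusted[i] = running_min
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (false discovery rate) adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # descending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, i in enumerate(order):
        running_min = min(running_min, m / (m - rank) * pvals[i])
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts, one 2x2 table per SNP:
    # (cases with allele, cases without, controls with, controls without)
    tables = [(20, 10, 10, 20), (18, 12, 14, 16), (25, 5, 9, 21), (15, 15, 15, 15)]
    pvals = [chi2_pvalue_2x2(*t) for t in tables]
    alpha = 0.05
    methods = [("Bonferroni", bonferroni), ("Holm", holm),
               ("Hochberg", hochberg), ("Benjamini-Hochberg", benjamini_hochberg)]
    for name, method in methods:
        rejected = sum(p <= alpha for p in method(pvals))
        print(f"{name}: {rejected} of {len(pvals)} SNPs declared associated")
```

Each function returns adjusted p-values in the input order, so a SNP is declared associated when its adjusted p-value is at most the chosen significance level. Bonferroni, Holm, and Hochberg control the family-wise error rate, whereas Benjamini and Hochberg control the false discovery rate, which is why the latter typically declares more associations on the same data.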