Hierarchical multi-label classification methods for gene function prediction
- Authors:
- Romero González, Miguel Ángel
- Resource type:
- Doctoral thesis
- Publication date:
- 2022
- Institution:
- Pontificia Universidad Javeriana Cali
- Repository:
- Vitela
- Language:
- eng
- OAI Identifier:
- oai:vitela.javerianacali.edu.co:11522/2088
- Online access:
- https://vitela.javerianacali.edu.co/handle/11522/2088
- Keywords:
- Rights
- License
- https://creativecommons.org/licenses/by-nc-nd/4.0/
Summary: This dissertation studies the problem of predicting gene functions computationally. The goal is to predict associations between genes and functions, where a gene can be associated with multiple biological functions and the functions are organized hierarchically. Four machine learning methods are developed, each focusing on a different aspect of the problem, which is modeled as a classification task: (a) considering hierarchical relations between functions to produce consistent predictions; (b) creating new data representations to build predictive models; (c) exploiting paths of functions in the hierarchy to detect missing annotations of genes; and (d) integrating information available for multiple organisms into the classification task. The main contributions of this work include novel methods that (i) overcome the limitations of the combinatorial gene function prediction problem; (ii) can be used to effectively identify associations between genes and functions of different organisms, including those that do not have enough data available to train predictive models; and (iii) help to narrow down the search space for in vivo experiments. These methods have been tested in efforts to predict gene functions in rice and maize, but they are formulated more generally and apply to any multi-label classification problem in which the classes are organized into a hierarchy.
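To illustrate what "consistent predictions" means in a class hierarchy, the following is a minimal sketch and not the dissertation's method. It applies a common post-processing step in which a term's score is capped by the scores of all its ancestors, so that a gene is never predicted to have a specific function more strongly than the general function that contains it. The hierarchy, the term names, the scores, and the helper functions `ancestors` and `make_consistent` are all hypothetical and introduced only for this example.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (assumption): each term maps to its parent terms.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "metabolic_process": ["root"],
    "lipid_metabolism": ["metabolic_process"],
    "response_to_stress": ["root"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable from `term` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def make_consistent(
    scores: Dict[str, float], parents: Dict[str, List[str]]
) -> Dict[str, float]:
    """Cap each term's score by the minimum raw score among its ancestors.

    Assumes `scores` contains an entry for every term in the hierarchy.
    After capping, no term scores higher than any of its ancestors.
    """
    return {
        term: min([score] + [scores[a] for a in ancestors(term, parents)])
        for term, score in scores.items()
    }


# Hypothetical raw classifier scores for one gene (assumption).
raw_scores = {
    "root": 1.0,
    "metabolic_process": 0.4,
    "lipid_metabolism": 0.9,  # inconsistent: higher than its parent
    "response_to_stress": 0.7,
}

consistent_scores = make_consistent(raw_scores, PARENTS)
```

Here the raw score of `lipid_metabolism` (0.9) exceeds that of its parent `metabolic_process` (0.4), so the capped score becomes 0.4, while the other terms keep their raw scores. Capping is only one of several ways to enforce hierarchy constraints, and the record above does not state which approach the dissertation uses.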