Contact map prediction based on cellular automata and protein folding trajectories
- Authors:
- Diaz Mariño, Nestor Milciades
- Resource type:
- Doctoral thesis
- Publication date:
- 2019
- Institution:
- Universidad del Valle
- Repository:
- Repositorio Digital Univalle
- Language:
- eng
- OAI Identifier:
- oai:bibliotecadigital.univalle.edu.co:10893/14928
- Online access:
- https://hdl.handle.net/10893/14928
- Keywords:
- Computer science
Bioinformatics
Simulation
Artificial intelligence
- Rights
- openAccess
- License
- http://purl.org/coar/access_right/c_abf2
Summary: In structural bioinformatics, knowing a protein's tertiary structure is necessary because its specific shape is central to its interaction with binding molecules. Because experimental determination of tertiary structure is highly expensive, computational protein structure prediction has become an alternative aimed at reducing cost and technical limitations. In the last decade, residue-residue protein contact prediction (PCP) has received broad attention, and it is now a common subtask of computational structure prediction. Residue-residue interactions can constrain the space of possible protein conformations, improving protein structure determination. Despite recent improvements in PCP, the high rate of false-positive predicted contacts hinders the applicability of existing PCP tools. To reduce the false-positive rate in PCP, we developed a novel approach based on cellular automata (CAs), which determines the residue-residue contacts that are likely to be actual contacts. Our approach exploits the local interactions found in protein contact maps and the iterative refinement provided by CAs. Our CAs were identified using a parallel genetic algorithm trained on the PSICOV data set (150 proteins). To benchmark our approach, we used the CASP12 data set (Critical Assessment of Techniques for Structure Prediction, 2016). Our best CA outperformed the ten PCP tools compared in the benchmark. However, a more detailed analysis using the non-parametric Friedman statistical test revealed that our tool does not surpass the performance of prominent PCP tools such as MetaPSICOV and RaptorX-Contact. Although our CA-based approach for PCP was successful, the precision for long-range contacts (sequence separation > 24 amino acids) was hard to improve. To enrich local interactions, we proposed a multiclass contact map representation that can improve long-range PCP. Our multiclass contact map was obtained using a large-scale comparison of decision trees. The next step is to reformulate our CA-based approach to incorporate multiclass contacts and repeat the overall process to obtain a new PCP tool.
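The idea of refining a predicted contact map with a cellular automaton, and of scoring long-range contacts separately, can be illustrated with a minimal Python sketch. It is not the thesis's method: the summary states that the actual CA rules were identified by a parallel genetic algorithm, whereas the `birth` and `survive` thresholds below are hand-picked, hypothetical values. The function names `ca_step`, `refine`, and `long_range_precision`, and the use of a Moore (8-cell) neighbourhood, are likewise assumptions made for illustration.

```python
import numpy as np


def ca_step(contact_map, birth=3, survive=2):
    """Apply one synchronous CA update to a binary, symmetric L x L contact map.

    Hypothetical rule (not taken from the thesis): a predicted contact is kept
    if at least `survive` of its 8 neighbours are contacts, and a non-contact
    becomes a contact if at least `birth` of its neighbours are contacts.
    """
    cm = np.asarray(contact_map).astype(int)
    n = cm.shape[0]
    padded = np.pad(cm, 1)  # zero border so edge cells have 8 neighbours
    # Count active cells in the Moore neighbourhood of every cell.
    neighbours = sum(
        padded[1 + di:1 + di + n, 1 + dj:1 + dj + n]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    nxt = np.where(cm == 1, neighbours >= survive, neighbours >= birth)
    return nxt.astype(int)


def refine(contact_map, steps=1, **rule):
    """Iterate the CA rule `steps` times (the iterative refinement idea)."""
    cm = np.asarray(contact_map).astype(int)
    for _ in range(steps):
        cm = ca_step(cm, **rule)
    return cm


def long_range_precision(pred, true, min_sep=24):
    """Precision over predicted contacts with sequence separation > min_sep."""
    pred = np.asarray(pred).astype(bool)
    true = np.asarray(true).astype(bool)
    # Upper triangle with j - i > min_sep, so each residue pair counts once.
    mask = np.triu(np.ones(pred.shape, dtype=bool), k=min_sep + 1)
    predicted = pred & mask
    n_pred = predicted.sum()
    if n_pred == 0:
        return float("nan")  # no long-range predictions to score
    return float((predicted & true).sum() / n_pred)
```

With such a rule, an isolated predicted contact with no supporting neighbours is removed, while a compact cluster of contacts survives, which is one way local interactions in a contact map could be used to suppress false positives. The default `min_sep=24` follows the long-range definition given in the summary.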