Contact map prediction based on cellular automata and protein folding trajectories

Authors:
Diaz Mariño, Nestor Milciades
Resource type:
Doctoral thesis
Publication date:
2019
Institution:
Universidad del Valle
Repository:
Repositorio Digital Univalle
Language:
eng
OAI Identifier:
oai:bibliotecadigital.univalle.edu.co:10893/14928
Online access:
https://hdl.handle.net/10893/14928
Keywords:
Computer science
Bioinformatics
Simulation
Artificial intelligence
Rights
openAccess
License
http://purl.org/coar/access_right/c_abf2
Description
Summary: In Structural Bioinformatics, it is necessary to know a protein's tertiary structure because its specific shape is central to its interaction with binding molecules. Because experimental tertiary structure determination is highly expensive, computational protein structure prediction is an alternative that aims to reduce cost and technical limitations. In the last decade, residue-residue protein contact prediction (PCP) has received broad attention, and it is now a common subtask of computational structure prediction. Residue-residue interactions can constrain the space of possible protein conformations, which improves protein structure determination. Despite recent improvements in PCP, the high rate of false positive predicted contacts hinders the applicability of existing PCP tools. To reduce the false positive rate in PCP, we developed a novel approach based on cellular automata (CAs), which identifies the predicted residue-residue contacts that are likely to be actual contacts. Our approach exploits the local interactions found in protein contact maps and the iterative refinement provided by CAs. Our CAs were identified using a parallel genetic algorithm trained on the PSICOV data set (150 proteins). To benchmark our approach, we used the CASP12 data set (Critical Assessment of Techniques for Structure Prediction, 2016). Our best CA outperformed the ten PCP tools compared in the benchmark. However, a more detailed analysis using the non-parametric Friedman statistical test revealed that our tool does not surpass the performance of prominent PCP tools such as MetaPSICOV and RaptorX-Contact. Although our CA-based approach to PCP was successful, the precision for long-range contacts (sequence separation > 24 amino acids) was hard to improve. To enrich local interactions, we proposed a multiclass contact map representation that can improve long-range PCP. Our multiclass contact map was obtained using a large-scale comparison of decision trees. The next step is to reformulate our CA-based approach to incorporate multiclass contacts and repeat the overall process to obtain a new PCP tool.
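The idea of refining a predicted contact map with a cellular automaton, and of scoring long-range precision, can be illustrated with a minimal Python sketch. This is not the thesis method: the rule below is hand-written for illustration, whereas the thesis identifies its CA rules with a genetic algorithm. The function names (`make_map`, `ca_step`, `refine`, `long_range_precision`), the `min_support` parameter, and the toy 30-residue maps are all assumptions made for this example.

```python
from typing import Iterable, List, Tuple

ContactMap = List[List[int]]  # square, symmetric, 1 = contact, 0 = no contact


def make_map(n: int, contacts: Iterable[Tuple[int, int]]) -> ContactMap:
    """Build a symmetric n x n binary contact map from (i, j) residue pairs."""
    cmap = [[0] * n for _ in range(n)]
    for i, j in contacts:
        cmap[i][j] = 1
        cmap[j][i] = 1
    return cmap


def ca_step(cmap: ContactMap, min_support: int = 1) -> ContactMap:
    """One synchronous CA update (hypothetical rule, not the evolved one).

    A predicted contact survives only if at least `min_support` of its
    eight Moore neighbours are also predicted contacts.
    """
    n = len(cmap)
    nxt = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if not cmap[i][j]:
                continue
            support = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < n:
                        support += cmap[a][b]
            if support >= min_support:
                nxt[i][j] = 1
    return nxt


def refine(cmap: ContactMap, steps: int = 3, min_support: int = 1) -> ContactMap:
    """Apply the CA rule iteratively, stopping early at a fixed point."""
    for _ in range(steps):
        nxt = ca_step(cmap, min_support)
        if nxt == cmap:
            break
        cmap = nxt
    return cmap


def long_range_precision(pred: ContactMap, native: ContactMap, min_sep: int = 25) -> float:
    """Precision over predicted contacts with sequence separation > 24."""
    tp = fp = 0
    n = len(pred)
    for i in range(n):
        for j in range(i + min_sep, n):
            if pred[i][j]:
                if native[i][j]:
                    tp += 1
                else:
                    fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data: an anti-parallel stripe of native contacts plus one isolated
# false positive at (0, 25) in the predicted map.
stripe = [(0, 29), (1, 28), (2, 27), (3, 26), (4, 25)]
native = make_map(30, stripe)
noisy = make_map(30, stripe + [(0, 25)])
refined = refine(noisy)

print(long_range_precision(noisy, native))
print(long_range_precision(refined, native))
```

In this toy case the isolated prediction has no neighbouring contacts and is removed, while the stripe cells support each other and survive. A rule this simple cannot recover missing contacts or handle realistic noise; it only shows how local neighbourhood information and iteration can filter false positives.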