Characterizing and predicting catalytic residues in enzyme active sites based on local properties: a machine learning approach

Developing computational methods for assigning protein function from tertiary structure is a very important problem, predicting a catalytic mechanism based only on structural information being a particularly challenging task. This work focuses on helping to understand the molecular basis of catalysi...

Authors:
Bobadilla, Leonardo
Nino, Fernando
Cepeda, Edilberto
Patarroyo, Manuel A.
Resource type:
Book part
Publication date:
2007
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
eng
OAI Identifier:
oai:repository.urosario.edu.co:10336/28861
Online access:
https://doi.org/10.1109/BIBE.2007.4375671
https://repository.urosario.edu.co/handle/10336/28861
Keywords:
Biochemistry
Machine learning
Sequences
Genomics
Bioinformatics
Predictive models
Protein engineering
Nuclear magnetic resonance
Data mining
Crystallography
Rights
License
Restricted (access for specific groups)
id EDOCUR2_e3d4aeccdbabd52f9ea66049c99094b9
oai_identifier_str oai:repository.urosario.edu.co:10336/28861
network_acronym_str EDOCUR2
network_name_str Repositorio EdocUR - U. Rosario
repository_id_str
author Bobadilla, Leonardo
Nino, Fernando
Cepeda, Edilberto
Patarroyo, Manuel A.
dc.title.spa.fl_str_mv Characterizing and predicting catalytic residues in enzyme active sites based on local properties: a machine learning approach
dc.title.TranslatedTitle.spa.fl_str_mv Caracterización y predicción de residuos catalíticos en sitios activos de enzimas según propiedades locales: un enfoque de aprendizaje automático
dc.subject.keyword.spa.fl_str_mv Biochemistry
Machine learning
Sequences
Genomics
Bioinformatics
Predictive models
Protein engineering
Nuclear magnetic resonance
Data mining
Crystallography
description Developing computational methods for assigning protein function from tertiary structure is an important problem, and predicting a catalytic mechanism from structural information alone is particularly challenging. This work aims to clarify the molecular basis of catalysis by exploring the nature of catalytic residues, their environment and their characteristic properties in a large data set of enzyme structures, and by using this information to predict the active sites of enzyme structures. A machine learning approach that performs feature extraction, clustering and classification on a protein structure data set is proposed. The 6,376 residues directly involved in enzyme catalysis, present in more than 800 protein structures in the PDB, were analyzed. Feature extraction provided a description of the critical features of each catalytic residue, which were consistent with prior knowledge about them. Results from k-fold cross-validation for classification showed more than 80% accuracy. Complete enzymes were scanned using these classifiers to locate catalytic residues.
publishDate 2007
dc.date.created.spa.fl_str_mv 2007-11-05
dc.date.accessioned.none.fl_str_mv 2020-08-28T15:49:56Z
dc.date.available.none.fl_str_mv 2020-08-28T15:49:56Z
dc.type.eng.fl_str_mv bookPart
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_3248
dc.type.spa.spa.fl_str_mv Parte de libro
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1109/BIBE.2007.4375671
dc.identifier.issn.none.fl_str_mv ISBN: 1-4244-1509-8
EISBN: 978-1-4244-1509-0
dc.identifier.uri.none.fl_str_mv https://repository.urosario.edu.co/handle/10336/28861
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.citationEndPage.none.fl_str_mv 945
dc.relation.citationStartPage.none.fl_str_mv 938
dc.relation.citationTitle.none.fl_str_mv 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering
dc.relation.ispartof.spa.fl_str_mv IEEE 7th International Symposium on BioInformatics and BioEngineering, ISBN: 1-4244-1509-8;EISBN: 978-1-4244-1509-0 (2007); pp. 938-945
dc.relation.uri.spa.fl_str_mv https://ieeexplore.ieee.org/document/4375671
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.rights.acceso.spa.fl_str_mv Restringido (Acceso a grupos específicos)
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv IEEE
dc.source.spa.fl_str_mv 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering
institution Universidad del Rosario
dc.source.instname.none.fl_str_mv instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv reponame:Repositorio Institucional EdocUR
repository.name.fl_str_mv Repositorio institucional EdocUR
repository.mail.fl_str_mv edocur@urosario.edu.co
_version_ 1808390563536306176