Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes
- Authors:
- Resource type:
- Publication date:
- 2015
- Institution:
- Universidad Tecnológica de Bolívar
- Repository:
- Repositorio Institucional UTB
- Language:
- eng
- OAI Identifier:
- oai:repositorio.utb.edu.co:20.500.12585/9013
- Online access:
- https://hdl.handle.net/20.500.12585/9013
- Keywords:
- 3D protein descriptor
Bilinear form
Coulombic matrix
LDA
Protein structural classes
Amino acid
Macromolecule
Protein
Discriminant analysis
Matrix
Three-dimensional modeling
Amino acid analysis
Article
Correlation coefficient
Mathematical parameters
Nonbiological model
Priority journal
Protein analysis
Protein function
Protein structure
Statistical parameters
Structure analysis
Validation study
Algorithm
Biological model
Biology
Chemical structure
Chemistry
Computer simulation
Markov chain
Procedures
Protein conformation
Quantitative structure activity relation
Reproducibility
Statistical model
Algorithms
Amino Acids
Computational Biology
Linear Models
Macromolecular Substances
Models, Biological
Models, Molecular
Proteins
Quantitative Structure-Activity Relationship
Reproducibility of Results
Stochastic processes
- Rights
- restrictedAccess
- License
- http://creativecommons.org/licenses/by-nc-nd/4.0/
Summary: In the present study, we introduce novel 3D protein descriptors based on the bilinear algebraic form in ℝⁿ space on the coulombic matrix. To calculate these descriptors, macromolecular vectors in ℝⁿ, whose components represent certain amino acid side-chain properties, were used as weighting schemes. Generalized approaches for calculating the spatial distances between amino acid residues, based on Minkowski metrics, are proposed. Simple-stochastic and double-stochastic schemes were defined to normalize the coulombic matrix. Local-fragment indices for both amino acid types and amino acid groups are presented so that fragments of interest in proteins can be characterized. In addition, geometric and topological cut-offs are defined to account for specific interactions among amino acids in global or local indices. To assess the utility of the global and local indices, a classification model for predicting the four major protein structural classes was built with Linear Discriminant Analysis (LDA). The resulting LDA model correctly classifies 92.6% and 92.7% of the proteins in the training and test sets, respectively. The model showed high values of the generalized squared correlation coefficient (GC²) on both the training and test series. The statistical parameters derived from the internal and external validation procedures demonstrate the robustness, stability, and high predictive power of the proposed model. The performance of the LDA model shows that the proposed indices not only encode relevant biochemical information related to the structural classes of proteins but also remain interpretable. It is anticipated that the current method will benefit the prediction of other protein attributes or functions. © 2015 Elsevier Ltd.
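
The summary names four ingredients: Minkowski inter-residue distances, a coulombic matrix, simple- and double-stochastic normalization, and a bilinear form weighted by amino acid property vectors. The following is a minimal sketch of how these could fit together, not the authors' implementation. Several details are assumptions that the record does not state: the coulombic matrix is taken to have entries 1/d_ij off the diagonal and 0 on it, the double-stochastic scheme is approximated with Sinkhorn-Knopp iteration, and all function names are hypothetical.

```python
import numpy as np

def minkowski_distances(coords, p=2.0):
    """Pairwise Minkowski distances of order p between the rows of coords (n x 3)."""
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

def coulombic_matrix(coords, p=2.0):
    """Assumed form: entry (i, j) is 1/d_ij for i != j, and 0 on the diagonal."""
    d = minkowski_distances(coords, p)
    m = np.zeros_like(d)
    off_diag = ~np.eye(len(coords), dtype=bool)
    m[off_diag] = 1.0 / d[off_diag]
    return m

def simple_stochastic(m):
    """Scale every row so that it sums to 1."""
    return m / m.sum(axis=1, keepdims=True)

def double_stochastic(m, iters=1000, tol=1e-10):
    """Assumed scheme (Sinkhorn-Knopp): alternate row and column scaling
    until row sums and column sums are both close to 1."""
    s = m.astype(float).copy()
    for _ in range(iters):
        s = s / s.sum(axis=1, keepdims=True)   # rows sum to 1
        s = s / s.sum(axis=0, keepdims=True)   # columns sum to 1
        if np.allclose(s.sum(axis=1), 1.0, rtol=0.0, atol=tol):
            break
    return s

def bilinear_index(x, m, y):
    """Bilinear form x^T M y for property vectors x, y and matrix M."""
    return float(x @ m @ y)

# Toy input: four residue positions and two made-up property vectors
# (illustrative numbers only, not real amino acid property scales).
coords = np.array([[0.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0],
                   [0.0, 4.0, 0.0],
                   [0.0, 0.0, 5.0]])
x = np.array([1.0, 0.5, -0.5, 2.0])
y = np.array([0.2, 1.5, 1.0, -1.0])

m = coulombic_matrix(coords, p=2.0)
print(bilinear_index(x, m, y))
print(bilinear_index(x, simple_stochastic(m), y))
print(bilinear_index(x, double_stochastic(m), y))
```

Changing `p` switches the metric (`p=1` gives Manhattan distance, `p=2` Euclidean), which is one way to read the "generalization approaches" mentioned in the summary. A local-fragment index could be sketched by zeroing the components of `x` and `y` that fall outside the fragment of interest.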