A new pipeline for structural characterization and classification of RNA-Seq microbiome data
Background High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables...
- Authors:
-
Racedo, Sebastian
Portnoy, Ivan
I. Vélez, Jorge
San-Juan-Vergara, Homero
Sanjuan, Marco
Zurek, Eduardo
- Resource type:
- Journal article
- Publication date:
- 2021
- Institution:
- Corporación Universidad de la Costa
- Repository:
- REDICUC - Repositorio CUC
- Language:
- eng
- OAI Identifier:
- oai:repositorio.cuc.edu.co:11323/8511
- Online access:
- https://hdl.handle.net/11323/8511
https://doi.org/10.1186/s13040-021-00266-7
https://repositorio.cuc.edu.co/
- Keywords:
- Microbial communities
Compositional nature
Classification method
16S rRNA sequencing
- Rights
- openAccess
- License
- Attribution-NonCommercial-NoDerivatives 4.0 International
id |
RCUC2_7725271ddac6f2b46db1cae0c1530165 |
---|---|
oai_identifier_str |
oai:repositorio.cuc.edu.co:11323/8511 |
network_acronym_str |
RCUC2 |
network_name_str |
REDICUC - Repositorio CUC |
repository_id_str |
|
dc.title.spa.fl_str_mv |
A new pipeline for structural characterization and classification of RNA-Seq microbiome data |
dc.creator.fl_str_mv |
Racedo, Sebastian; Portnoy, Ivan; I. Vélez, Jorge; San-Juan-Vergara, Homero; Sanjuan, Marco; Zurek, Eduardo |
dc.subject.spa.fl_str_mv |
Microbial communities; Compositional nature; Classification method; 16S rRNA sequencing |
description |
Background: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. Identifying dependencies within these systems requires analyzing and assimilating the underlying interaction patterns between all the variables that make up the system. This task is challenging because data from DNA-sequencing experiments are compositional: traditional interaction metrics (e.g., correlation) produce unreliable results when applied to relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves characterizing the interactions between the principal descriptive variables of the datasets. Classifying new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., controls and cases) could help researchers in the development of treatments/drugs.
Results: Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups, and a dimensionality reduction technique to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method’s performance with that of two state-of-the-art methods.
Conclusions: Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. With real datasets from gene sequencing experiments, our method outperforms the other classification methods. |
publishDate |
2021 |
dc.date.accessioned.none.fl_str_mv |
2021-08-10T14:09:34Z |
dc.date.available.none.fl_str_mv |
2021-08-10T14:09:34Z |
dc.date.issued.none.fl_str_mv |
2021-07-09 |
dc.type.spa.fl_str_mv |
Journal article |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.coar.spa.fl_str_mv |
http://purl.org/coar/resource_type/c_6501 |
dc.type.content.spa.fl_str_mv |
Text |
dc.type.driver.spa.fl_str_mv |
info:eu-repo/semantics/article |
dc.type.redcol.spa.fl_str_mv |
http://purl.org/redcol/resource_type/ART |
dc.type.version.spa.fl_str_mv |
info:eu-repo/semantics/acceptedVersion |
format |
http://purl.org/coar/resource_type/c_6501 |
status_str |
acceptedVersion |
dc.identifier.uri.spa.fl_str_mv |
https://hdl.handle.net/11323/8511 |
dc.identifier.doi.spa.fl_str_mv |
https://doi.org/10.1186/s13040-021-00266-7 |
dc.identifier.instname.spa.fl_str_mv |
Corporación Universidad de la Costa |
dc.identifier.reponame.spa.fl_str_mv |
REDICUC - Repositorio CUC |
dc.identifier.repourl.spa.fl_str_mv |
https://repositorio.cuc.edu.co/ |
dc.language.iso.none.fl_str_mv |
eng |
dc.rights.spa.fl_str_mv |
Attribution-NonCommercial-NoDerivatives 4.0 International |
dc.rights.uri.spa.fl_str_mv |
http://creativecommons.org/licenses/by-nc-nd/4.0/ |
dc.rights.accessrights.spa.fl_str_mv |
info:eu-repo/semantics/openAccess |
dc.rights.coar.spa.fl_str_mv |
http://purl.org/coar/access_right/c_abf2 |
eu_rights_str_mv |
openAccess |
dc.format.mimetype.spa.fl_str_mv |
application/pdf |
dc.source.spa.fl_str_mv |
BioData Mining |
institution |
Corporación Universidad de la Costa |
dc.source.url.spa.fl_str_mv |
https://biodatamining.biomedcentral.com/articles/10.1186/s13040-021-00266-7 |
bitstream.url.fl_str_mv |
https://repositorio.cuc.edu.co/bitstreams/dc1cbe9f-90ca-4fd2-a3b9-514e1a01f71c/download https://repositorio.cuc.edu.co/bitstreams/d20aed44-fb50-4393-9620-afa958ab2688/download https://repositorio.cuc.edu.co/bitstreams/f9a42f69-36b7-4a1a-9bd1-45177210e51a/download https://repositorio.cuc.edu.co/bitstreams/f0ef351a-159a-4b2f-9f99-f9965f0fb822/download https://repositorio.cuc.edu.co/bitstreams/786fd4b8-6812-4dd4-8225-ff0e6e057d10/download |
bitstream.checksum.fl_str_mv |
5a322ee7bde570e0e40de066450a698e 4460e5956bc1d1639be9ae6146a50347 e30e9215131d99561d40d6b0abbe9bad b52d676cbcffb23945b96b531587d945 a87442d355d8b8f7f8ed103cd05d4f33 |
bitstream.checksumAlgorithm.fl_str_mv |
MD5 MD5 MD5 MD5 MD5 |
repository.name.fl_str_mv |
Repositorio de la Universidad de la Costa CUC |
repository.mail.fl_str_mv |
repdigital@cuc.edu.co |
_version_ |
1828166885045174272 |