A new pipeline for structural characterization and classification of RNA-Seq microbiome data

Authors:
Racedo, Sebastian
Portnoy, Ivan
Vélez, Jorge I.
San-Juan-Vergara, Homero
Sanjuan, Marco
Zurek, Eduardo
Resource type:
Journal article
Publication date:
2021
Institution:
Corporación Universidad de la Costa
Repository:
REDICUC - Repositorio CUC
Language:
English
OAI Identifier:
oai:repositorio.cuc.edu.co:11323/8511
Online access:
https://hdl.handle.net/11323/8511
https://doi.org/10.1186/s13040-021-00266-7
https://repositorio.cuc.edu.co/
Keywords:
Microbial communities
Compositional nature
Classification method
16S rRNA sequencing
Rights:
Open access
License:
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract:
Background
High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. Identifying dependencies within these systems requires analyzing the underlying interaction patterns among all the variables that make up the system. However, this task is complicated by the compositional nature of DNA-sequencing data: traditional interaction metrics (e.g., correlation) produce unreliable results when applied to relative fractions instead of absolute abundances. These compositionality-associated challenges extend to classification, which usually involves characterizing the interactions between the principal descriptive variables of a dataset. Classifying new samples or patients into binary categories corresponding to different biological settings or phenotypes (e.g., controls and cases) could help researchers develop treatments and drugs.
Results
Here, we develop and illustrate a new approach, applicable to compositional data, for classifying new samples into two groups with different biological settings. We propose a new metric that characterizes and quantifies the overall deviation in correlation structure between these groups, together with a dimensionality-reduction technique that facilitates graphical representation. We conduct simulation experiments with synthetic data to assess the classification accuracy of the proposed method. We also illustrate its performance on Operational Taxonomic Unit (OTU) count tables obtained from 16S rRNA gene sequencing data from two microbiota experiments, and we compare it with two state-of-the-art methods.
Conclusions
Simulation experiments show that our method achieves a classification accuracy of 98% or higher on synthetic data. Moreover, our method outperforms the other classification methods on real datasets from gene sequencing experiments.
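The abstract's point that correlation becomes unreliable on relative fractions can be illustrated with a short simulation. The following is a minimal sketch of the general closure effect (the "spurious correlation" of ratios described by Pearson in 1897, and Aitchison's remedy of working with log-ratios); it is not code from the paper, and the three-taxon community, its lognormal parameters, and the helper names `pearson`, `closure`, and `simulate_absolute` are hypothetical choices made for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def closure(sample):
    """Turn absolute abundances into relative fractions that sum to 1."""
    total = sum(sample)
    return [v / total for v in sample]


def simulate_absolute(n_samples=2000, seed=1):
    """Hypothetical 3-taxon community with independent absolute abundances.

    Not the paper's simulation design. Taxon C is dominant and highly
    variable, so it drives each sample's total read count.
    """
    rng = random.Random(seed)
    return [
        [rng.lognormvariate(3.0, 0.3),   # taxon A
         rng.lognormvariate(3.0, 0.3),   # taxon B
         rng.lognormvariate(5.0, 1.0)]   # taxon C
        for _ in range(n_samples)
    ]


absolute = simulate_absolute()
relative = [closure(s) for s in absolute]

a_abs = [s[0] for s in absolute]
b_abs = [s[1] for s in absolute]
a_rel = [s[0] for s in relative]
b_rel = [s[1] for s in relative]

# A and B are drawn independently, so their absolute abundances are ~uncorrelated.
corr_absolute = pearson(a_abs, b_abs)
# After closure, both fractions shrink whenever C blooms, which induces
# a spurious positive correlation between A and B.
corr_relative = pearson(a_rel, b_rel)

# Log-ratios are unaffected by closure: log(a/T) - log(b/T) == log(a) - log(b).
max_ratio_gap = max(
    abs(math.log(r[0] / r[1]) - math.log(s[0] / s[1]))
    for s, r in zip(absolute, relative)
)

print(f"corr(A, B), absolute abundances: {corr_absolute:+.2f}")
print(f"corr(A, B), relative fractions:  {corr_relative:+.2f}")
print(f"max |log-ratio difference|:      {max_ratio_gap:.1e}")
```

When run, the absolute abundances of taxa A and B should be close to uncorrelated, their relative fractions should show a clearly positive correlation that arises only because the fractions must sum to 1, and the log-ratio log(A/B) should agree between the two representations up to floating-point round-off. This scale invariance of ratios is the reason compositional methods are built on log-ratios rather than on the fractions themselves.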
Date issued:
2021-07-09
Date accessioned and made available:
2021-08-10T14:09:34Z
Version:
Accepted version
Journal:
BioData Mining
Publisher page:
https://biodatamining.biomedcentral.com/articles/10.1186/s13040-021-00266-7