Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings

Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. On...


Authors:
Ariza-Jimenez L.
Quintero O.L.
Pinel N.
Resource type:
Publication date:
2018
Institution:
Universidad EAFIT
Repository:
Repositorio EAFIT
Language:
eng
OAI Identifier:
oai:repository.eafit.edu.co:10784/26744
Online access:
https://eafit.fundanetsuite.com/Publicaciones/ProdCientif/PublicacionFrw.aspx?id=8480
http://hdl.handle.net/10784/26744
Keywords:
algorithm
cluster analysis
DNA sequence
genomics
metagenome
metagenomics
Algorithms
Cluster Analysis
Genomics
Metagenome
Metagenomics
Sequence Analysis, DNA
Rights
License
Institute of Electrical and Electronics Engineers Inc.
id REPOEAFIT2_c4d0960df45eeff8e557cbd714e6d25f
oai_identifier_str oai:repository.eafit.edu.co:10784/26744
network_acronym_str REPOEAFIT2
network_name_str Repositorio EAFIT
repository_id_str
dc.title.eng.fl_str_mv Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
dc.creator.fl_str_mv Ariza-Jimenez L.
Quintero O.L.
Pinel N.
dc.contributor.department.spa.fl_str_mv Universidad EAFIT. Departamento de Ciencias
dc.contributor.author.none.fl_str_mv Ariza-Jimenez L.
Quintero O.L.
Pinel N.
dc.contributor.researchgroup.spa.fl_str_mv Biodiversidad, Evolución y Conservación
dc.subject.eng.fl_str_mv algorithm
cluster analysis
DNA sequence
genomics
metagenome
metagenomics
Algorithms
Cluster Analysis
Genomics
Metagenome
Metagenomics
Sequence Analysis, DNA
description Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. One such approach uses a combination of the Barnes-Hut approximation and the t-Stochastic Neighbor Embedding (BH-SNE) algorithm for dimensionality reduction of pentamer profiles of DNA sequence data, followed by demarcation of groups based on Gaussian mixture models within human-imposed boundaries. We found that genome demarcation from three-dimensional BH-SNE embeddings consistently results in more accurate binnings than 2-D embeddings. We further addressed the lack of a priori population number information by developing an unsupervised binning approach based on the Subtractive and Fuzzy c-means (FCM) clustering algorithms combined with internal clustering validity indices. Lastly, we addressed the subject of shared membership of individual data objects in a mixed community by assigning a degree of membership to individual objects using the FCM algorithm, and discriminated between confidently binned and uncertain sequence data objects from the community for subsequent biological interpretation. The binning of metagenome sequence fragments according to thresholds in the degree of membership opens the door to the identification of horizontally transferred elements and other genomic regions of uncertain assignment in which biologically meaningful information resides. The reported approach improves the unsupervised genome demarcation of populations within complex communities, increases the confidence in the coherence of the binned elements, and enables the identification of evolutionary processes ignored in hard-binning approaches in shotgun metagenomic studies. © 2018 IEEE.
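The pipeline summarized in the description (pentamer profiles, fuzzy c-means memberships, and a membership threshold that separates confident from uncertain fragments) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the fuzzifier m = 2, the 0.8 threshold, and the fixed cluster count c are all assumptions made for illustration. The BH-SNE embedding step is omitted (a Barnes-Hut t-SNE such as scikit-learn's TSNE could supply the 3-D points), and the Subtractive clustering and validity-index steps that the record says are used to choose the number of populations are not reproduced here. Plain Python is used only to keep the example self-contained.

```python
from collections import Counter
from itertools import product
import random

# All 4**5 = 1024 DNA words of length five, in a fixed order.
PENTAMERS = ["".join(p) for p in product("ACGT", repeat=5)]


def pentamer_profile(seq):
    """Normalized pentamer frequencies of one sequence fragment."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 5] for i in range(len(seq) - 4))
    # Words containing symbols other than A, C, G, T are ignored.
    total = sum(counts[p] for p in PENTAMERS)
    if total == 0:
        return [0.0] * len(PENTAMERS)
    return [counts[p] / total for p in PENTAMERS]


def fuzzy_c_means(points, c, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means; c and m are assumed inputs (hypothetical defaults).

    Returns (centers, u), where u[i][j] is the degree of membership
    of point i in cluster j and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / sw
                 for d in range(dim)]
            )
        # Memberships follow from squared distances to the centers.
        for i in range(n):
            d2 = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)), 1e-12)
                for j in range(c)
            ]
            inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return centers, u


def split_by_membership(u, threshold=0.8):
    """Separate confidently binned points from uncertain ones.

    The 0.8 threshold is an illustrative assumption, not a value
    taken from the record.
    """
    confident, uncertain = {}, []
    for i, row in enumerate(u):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            confident.setdefault(best, []).append(i)
        else:
            uncertain.append(i)
    return confident, uncertain
```

In use, each fragment would be converted with pentamer_profile, the profiles embedded into three dimensions, and the embedded points passed to fuzzy_c_means; split_by_membership then returns the per-cluster bins together with the indices of fragments whose highest membership falls below the threshold, which are the candidates for uncertain assignment that the description refers to.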
publishDate 2018
dc.date.issued.none.fl_str_mv 2018-01-01
dc.date.available.none.fl_str_mv 2021-03-23T19:52:09Z
dc.date.accessioned.none.fl_str_mv 2021-03-23T19:52:09Z
dc.type.eng.fl_str_mv publishedVersion
info:eu-repo/semantics/publishedVersion
article
info:eu-repo/semantics/article
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_6501
http://purl.org/coar/resource_type/c_2df8fbb1
dc.type.local.spa.fl_str_mv Article
status_str publishedVersion
dc.identifier.none.fl_str_mv https://eafit.fundanetsuite.com/Publicaciones/ProdCientif/PublicacionFrw.aspx?id=8480
dc.identifier.issn.none.fl_str_mv 05891019
1557170X
dc.identifier.other.none.fl_str_mv PUBMED;30440633
SCOPUS;2-s2.0-85056638520
dc.identifier.uri.none.fl_str_mv http://hdl.handle.net/10784/26744
dc.identifier.doi.none.fl_str_mv 10.1109/EMBC.2018.8512529
url https://eafit.fundanetsuite.com/Publicaciones/ProdCientif/PublicacionFrw.aspx?id=8480
http://hdl.handle.net/10784/26744
identifier_str_mv 05891019
1557170X
PUBMED;30440633
SCOPUS;2-s2.0-85056638520
10.1109/EMBC.2018.8512529
dc.language.iso.eng.fl_str_mv eng
language eng
dc.relation.uri.none.fl_str_mv https://www.scopus.com/inward/record.uri?eid=2-s2.0-85056638520&doi=10.1109%2fEMBC.2018.8512529&partnerID=40&md5=9e7b2d279646efeef082154f24b16240
dc.rights.none.fl_str_mv Institute of Electrical and Electronics Engineers Inc.
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.rights.local.spa.fl_str_mv Restricted access
rights_invalid_str_mv Institute of Electrical and Electronics Engineers Inc.
Restricted access
http://purl.org/coar/access_right/c_16ec
dc.publisher.none.fl_str_mv Institute of Electrical and Electronics Engineers Inc.
publisher.none.fl_str_mv Institute of Electrical and Electronics Engineers Inc.
dc.source.none.fl_str_mv IEEE Engineering in Medicine and Biology Society Conference Proceedings
institution Universidad EAFIT
repository.name.fl_str_mv Repositorio Institucional Universidad EAFIT
repository.mail.fl_str_mv repositorio@eafit.edu.co
_version_ 1814110530598600704