Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings
Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. On...
- Authors:
- Ariza-Jimenez L.
Quintero O.L.
Pinel N.
- Resource type:
- Publication date:
- 2018
- Institution:
- Universidad EAFIT
- Repository:
- Repositorio EAFIT
- Language:
- eng
- OAI Identifier:
- oai:repository.eafit.edu.co:10784/26744
- Online access:
- https://eafit.fundanetsuite.com/Publicaciones/ProdCientif/PublicacionFrw.aspx?id=8480
http://hdl.handle.net/10784/26744
- Keywords:
- algorithm
cluster analysis
DNA sequence
genomics
metagenome
metagenomics
Algorithms
Cluster Analysis
Genomics
Metagenome
Metagenomics
Sequence Analysis, DNA
- Rights
- License
- Institute of Electrical and Electronics Engineers Inc.
id |
REPOEAFIT2_c4d0960df45eeff8e557cbd714e6d25f |
---|---|
oai_identifier_str |
oai:repository.eafit.edu.co:10784/26744 |
network_acronym_str |
REPOEAFIT2 |
network_name_str |
Repositorio EAFIT |
repository_id_str |
|
dc.title.eng.fl_str_mv |
Unsupervised fuzzy binning of metagenomic sequence fragments on three-dimensional Barnes-Hut t-Stochastic Neighbor Embeddings |
dc.creator.fl_str_mv |
Ariza-Jimenez L. Quintero O.L. Pinel N. |
dc.contributor.department.spa.fl_str_mv |
Universidad EAFIT. Departamento de Ciencias |
dc.contributor.author.none.fl_str_mv |
Ariza-Jimenez L. Quintero O.L. Pinel N. |
dc.contributor.researchgroup.spa.fl_str_mv |
Biodiversidad, Evolución y Conservación |
dc.subject.eng.fl_str_mv |
algorithm; cluster analysis; DNA sequence; genomics; metagenome; metagenomics; Algorithms; Cluster Analysis; Genomics; Metagenome; Metagenomics; Sequence Analysis, DNA |
description |
Shotgun metagenomic studies attempt to reconstruct population genome sequences from complex microbial communities. In some traditional genome demarcation approaches, high-dimensional sequence data are embedded into two-dimensional spaces and subsequently binned into candidate genomic populations. One such approach combines the Barnes-Hut approximation with the t-Stochastic Neighbor Embedding algorithm (BH-SNE) to reduce the dimensionality of pentamer profiles of DNA sequence data, and then demarcates groups using Gaussian mixture models within human-imposed boundaries. We found that genome demarcation from three-dimensional BH-SNE embeddings consistently results in more accurate binnings than demarcation from 2-D embeddings. We further addressed the lack of a priori information on the number of populations by developing an unsupervised binning approach based on the Subtractive and Fuzzy c-means (FCM) clustering algorithms combined with internal clustering validity indices. Lastly, we addressed the shared membership of individual data objects in a mixed community by using the FCM algorithm to assign a degree of membership to each object, and we discriminated between confidently binned and uncertain sequence data objects for subsequent biological interpretation. Binning metagenome sequence fragments according to thresholds on the degree of membership opens the door to identifying horizontally transferred elements and other genomic regions of uncertain assignment in which biologically meaningful information resides. The reported approach improves the unsupervised genome demarcation of populations within complex communities, increases the confidence in the coherence of the binned elements, and enables the identification of evolutionary processes ignored in hard-binning approaches in shotgun metagenomic studies. © 2018 IEEE. |
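The membership-threshold idea in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuzzy_c_means`, `split_by_membership`), the fuzzifier `m=2.0`, the threshold `0.8`, and the toy 3-D points standing in for BH-SNE embedding coordinates are all assumptions made for illustration. The sketch also uses random initial memberships and a fixed cluster count, whereas the record describes Subtractive clustering and internal validity indices for that step.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, memberships).

    memberships[i][k] is the degree to which point i belongs to cluster k;
    each row sums to 1. Initialisation is random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / wsum for d in range(dim)]
            )
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # stop once memberships no longer change
            break
    return centers, u


def split_by_membership(memberships, threshold=0.8):
    """Separate confidently binned objects from uncertain ones.

    Returns ({cluster index: [point indices]}, [uncertain point indices]).
    The threshold value is hypothetical, not taken from the source.
    """
    confident, uncertain = {}, []
    for i, row in enumerate(memberships):
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] >= threshold:
            confident.setdefault(best, []).append(i)
        else:
            uncertain.append(i)
    return confident, uncertain


# Toy stand-in for 3-D embedding coordinates: two tight groups and one
# point lying exactly halfway between them.
POINTS = [
    (0.0, 0.0, 0.0), (0.2, 0.0, 0.0), (0.0, 0.2, 0.0),
    (5.0, 5.0, 5.0), (4.8, 5.0, 5.0), (5.0, 4.8, 5.0),
    (2.5, 2.5, 2.5),
]

centers, memberships = fuzzy_c_means(POINTS, n_clusters=2)
confident, uncertain = split_by_membership(memberships, threshold=0.8)
print("confident bins:", confident)
print("uncertain objects:", uncertain)
```

In this toy case the halfway point has roughly equal membership in both clusters, so it falls below the threshold and is set aside rather than forced into a bin, which is the behaviour the description contrasts with hard binning.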
publishDate |
2018 |
dc.date.issued.none.fl_str_mv |
2018-01-01 |
dc.date.available.none.fl_str_mv |
2021-03-23T19:52:09Z |
dc.date.accessioned.none.fl_str_mv |
2021-03-23T19:52:09Z |
dc.type.eng.fl_str_mv |
publishedVersion info:eu-repo/semantics/publishedVersion article info:eu-repo/semantics/article |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_6501 http://purl.org/coar/resource_type/c_2df8fbb1 |
dc.type.local.spa.fl_str_mv |
Article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
https://eafit.fundanetsuite.com/Publicaciones/ProdCientif/PublicacionFrw.aspx?id=8480 |
dc.identifier.issn.none.fl_str_mv |
05891019 1557170X |
dc.identifier.other.none.fl_str_mv |
PUBMED;30440633 SCOPUS;2-s2.0-85056638520 |
dc.identifier.uri.none.fl_str_mv |
http://hdl.handle.net/10784/26744 |
dc.identifier.doi.none.fl_str_mv |
10.1109/EMBC.2018.8512529 |
dc.language.iso.eng.fl_str_mv |
eng |
language |
eng |
dc.relation.uri.none.fl_str_mv |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85056638520&doi=10.1109%2fEMBC.2018.8512529&partnerID=40&md5=9e7b2d279646efeef082154f24b16240 |
dc.rights.none.fl_str_mv |
Institute of Electrical and Electronics Engineers Inc. |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_16ec |
dc.rights.local.spa.fl_str_mv |
Restricted access |
rights_invalid_str_mv |
Institute of Electrical and Electronics Engineers Inc. Restricted access http://purl.org/coar/access_right/c_16ec |
dc.publisher.none.fl_str_mv |
Institute of Electrical and Electronics Engineers Inc. |
publisher.none.fl_str_mv |
Institute of Electrical and Electronics Engineers Inc. |
dc.source.none.fl_str_mv |
IEEE Engineering in Medicine and Biology Society Conference Proceedings |
institution |
Universidad EAFIT |
repository.name.fl_str_mv |
Repositorio Institucional Universidad EAFIT |
repository.mail.fl_str_mv |
repositorio@eafit.edu.co |