Interpretable Deep Embeddings for Single-cell RNA-seq Clustering Analysis via Gene Attention
- Authors:
- Forigua Díaz, Cristhian David
- Resource type:
- Undergraduate thesis
- Publication date:
- 2022
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- English
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/63770
- Online access:
- http://hdl.handle.net/1992/63770
- Keywords:
- scRNA-seq
- Autoencoder
- Gene attention
- Interpretability
- Clustering
- Biology
- Engineering
- Rights
- openAccess
- License
- Attribution-ShareAlike 4.0 International
- Summary:
- This work presents a ZINB model-based denoising autoencoder that offers interpretable deep embeddings through a gene attention mechanism for single-cell RNA-seq clustering. Our method performs a dimensionality reduction into a latent space that embeds semantic information from gene expression inputs and uses the latent representations for further clustering into cell groups. Our gene attention mechanism makes it possible to interpret how the autoencoder embeds the gene expression data and enables gene analysis for clustering. We perform extensive ablation experiments on the autoencoder configuration and the attention mechanism. We test our method on six scRNA-seq datasets with different cell types. The results indicate that our method is competitive with previous approaches. In particular, it outperforms previous methods on the 10XPBMC and Worm Neuron Cells datasets. Functional enrichment analysis of the genes highlighted by the attention vectors offers insight into how the network processes the gene expression data. The gene analysis shows a correspondence between what the network learns and the cell types in the datasets.
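The summary describes the autoencoder only as "ZINB model-based" and does not give its loss or attention equations. As an illustration, the sketch below shows one common zero-inflated negative binomial (ZINB) reconstruction loss and one way to rank genes by an attention vector, written in plain Python. Several parts are assumptions and are not taken from the thesis: the parameterization (per-gene mean `mu`, dispersion `theta`, dropout probability `pi`), the softmax normalization of attention scores, and every helper name (`nb_log_pmf`, `zinb_log_pmf`, `zinb_nll`, `top_attended_genes`). The toy counts and scores are made up. Treat this as a hypothetical sketch, not the author's implementation.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-10  # guards against log(0) when a predicted mean or probability is zero


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu and dispersion theta."""
    mu = mu + EPS
    return (
        math.lgamma(x + theta)
        - math.lgamma(theta)
        - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural (dropout) zero."""
    log_nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero can come either from dropout or from the NB component itself.
        return math.log(pi + (1.0 - pi) * math.exp(log_nb) + EPS)
    return math.log(1.0 - pi + EPS) + log_nb


def zinb_nll(
    counts: Sequence[int],
    mus: Sequence[float],
    thetas: Sequence[float],
    pis: Sequence[float],
) -> float:
    """Mean negative log-likelihood over genes (assumed reconstruction loss)."""
    terms = [zinb_log_pmf(x, m, t, p) for x, m, t, p in zip(counts, mus, thetas, pis)]
    return -sum(terms) / len(terms)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended_genes(
    genes: Sequence[str], scores: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank genes by softmax attention weight and keep the k largest."""
    weights = softmax(scores)
    ranked = sorted(zip(genes, weights), key=lambda gw: gw[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy values: observed counts for four genes in one cell plus hypothetical decoder outputs.
    counts = [0, 3, 0, 12]
    mus = [0.5, 2.8, 4.0, 10.0]
    thetas = [1.0, 2.0, 1.5, 5.0]
    pis = [0.3, 0.05, 0.6, 0.01]
    print(f"ZINB NLL: {zinb_nll(counts, mus, thetas, pis):.4f}")

    genes = ["CD3E", "MS4A1", "LYZ", "NKG7"]
    attention_scores = [0.2, 1.7, -0.4, 0.9]  # made-up scores, not learned values
    print(top_attended_genes(genes, attention_scores, k=2))
```

The zero branch is what separates ZINB from a plain negative binomial. An observed zero is explained by a mixture of dropout and genuinely low expression, so a model is not heavily penalized for predicting a nonzero mean where sparse scRNA-seq data shows zeros. In a setup like the one summarized above, the top-ranked genes from such an attention vector would be the inputs to a functional enrichment analysis.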