MLExplore.js : exploring high-dimensional data by interacting and interpreting t-SNE and K-Means


Authors: Peña Lozano, Fabián Camilo

Publication date: 2019

Institution: Universidad de los Andes

Repository: Séneca: repositorio Uniandes

Language: eng

Description
Summary: In Exploratory Data Analysis (EDA), Machine Learning (ML) is an alternative for understanding large, high-dimensional data. Dimensionality Reduction (DR) algorithms such as t-SNE produce two- or three-dimensional embeddings that aim to preserve the local and global structure of the data. On the other hand, clustering algorithms such as K-Means seek to achieve a similar goal by producing a cluster membership for each data instance. In general terms, when using these kinds of algorithms, non-expert ML users can draw wrong conclusions if an appropriate set of hyper-parameters for fitting the algorithm is not selected. Similarly, groups of attributes and data instances could contain, for instance, high levels of noise that significantly affect the embedding and the cluster formation. To address this, MLExplore.js is presented: a web-based tool for exploring high-dimensional tabular data that implements the t-SNE and K-Means algorithms running in the browser. Because this tool is targeted at domain-expert users, some concepts and recommendations for designing user-centric ML systems are drawn from the Interactive ML and Interpretable ML sub-fields. Like some other ML-based EDA tools, MLExplore.js allows users to explore the hyper-parameter space while interactively seeing how these changes affect the model results. In addition, it lets users observe how the model changes when they perform attribute selection and data navigation. This enables domain-expert users to perform cluster-oriented DR task sequences such as verifying clusters, naming clusters, and matching clusters and classes. To demonstrate its usage, one case study exploring a real-world dataset is presented.
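
As a rough illustration of why the choice of hyper-parameters matters for a clustering algorithm, the following is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is not the implementation used by MLExplore.js, which runs in the browser; the function names `sq_dist` and `kmeans`, their parameters, and the toy data are assumptions made purely for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Illustrative K-Means (Lloyd's algorithm).

    Returns (labels, centroids, inertia), where inertia is the sum of
    squared distances from each point to its assigned centroid.
    """
    rng = random.Random(seed)
    # Hypothetical initialisation choice: pick k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Toy data (an assumption): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

for k in (1, 2, 4):
    labels, _, inertia = kmeans(points, k)
    print(f"k={k} labels={labels} inertia={inertia:.2f}")
```

Running the loop with different values of `k` shows how the cluster memberships and the inertia shift with a single hyper-parameter, which is the kind of change the abstract describes users exploring interactively.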
