MLExplore.js : exploring high-dimensional data by interacting and interpreting t-SNE and K-Means
- Authors:
- Peña Lozano, Fabián Camilo
- Resource type:
- Publication date:
- 2019
- Institution:
- Universidad de los Andes
- Repository:
- Séneca: repositorio Uniandes
- Language:
- eng
- OAI Identifier:
- oai:repositorio.uniandes.edu.co:1992/44047
- Online access:
- http://hdl.handle.net/1992/44047
- Keywords:
- Information visualization - Research - Case studies
Machine learning (Artificial intelligence) - Research - Case studies
Visual analytics - Research - Case studies
Engineering
- Rights
- openAccess
- License
- http://creativecommons.org/licenses/by-nc-nd/4.0/
Summary: In Exploratory Data Analysis (EDA), Machine Learning (ML) is an alternative for understanding large, high-dimensional data. Dimensionality Reduction (DR) algorithms such as t-SNE produce two- or three-dimensional embeddings that aim to preserve the local and global structure of the data. On the other hand, clustering algorithms such as K-Means seek a similar goal by producing a cluster membership for each data instance. In general terms, when using these kinds of algorithms, non-expert ML users can draw wrong conclusions if an appropriate set of hyper-parameters for fitting the algorithm is not selected. Similarly, groups of attributes and data instances could contain, for instance, high levels of noise that significantly affect the embedding and cluster formation. To address this, MLExplore.js is presented: a web-based tool for exploring high-dimensional tabular data that implements the t-SNE and K-Means algorithms running in the browser. Because this tool is targeted at domain-expert users, some concepts and recommendations for designing user-centric ML systems are derived from the Interactive ML and Interpretable ML sub-fields. Like some other ML-based EDA tools, MLExplore.js allows users to explore the hyper-parameter space while interactively seeing how these changes affect the model results. In addition, it shows how the model changes when users perform attribute selection and data navigation. This enables domain-expert users to perform cluster-oriented DR task sequences such as verifying clusters, naming clusters, and matching clusters and classes. To demonstrate its usage, one case study exploring a real-world dataset is presented.