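Representación de las propiedades tímbricas y dinámicas de una señal de audio a través de una matriz RGB para la sustitución sensorial del oído a la vista (Representation of the timbral and dynamic properties of an audio signal through an RGB matrix for ear-to-vision sensory substitution)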

Authors: García Gómez, Andrés

Publication date: 2021

Institution: Universidad de San Buenaventura

Repository: Repositorio USB

Language: Spanish (spa)

Description
Summary:	This thesis presents a method for visually representing the timbral and dynamic properties of an audio signal in an RGB matrix, for ear-to-vision sensory substitution in people with hearing disabilities. In the first part, audio descriptors were extracted and the different types were compared: mel-frequency cepstral coefficients (MFCCs); spectral descriptors such as the spectral centroid, spectral flatness and spectral slope, among others; and the chroma vector, which makes it possible to identify musical notes. These descriptors were obtained with the feature-extraction tools of the MATLAB R2020b Audio Toolbox and with Dan Ellis's "Chroma Feature Analysis and Synthesis" library. Building on the chroma vector, an experimental algorithm was developed, first with IIR filters and later improved with the constant-Q transform, and this method was used to produce visual representations in an RGB matrix. These representations were inspired by synesthesia, specifically sound-color synesthesia (chromesthesia), and by the analogous characteristics of hearing and vision. The resulting visualizations resemble a real-time spectrogram laid out on the chromatic scale, with colors assigned according to that scale. They appear useful for observing the temporal evolution of harmony and notes in music and for identifying basic patterns in short voice signals (logatomes). Finally, visualizations of similar voice and music signals were compared subjectively, and a further comparison was made with the VGGish neural network, feeding it both its original input (mel-scale filter bank) and the constant-Q spectrum previously obtained as an audio descriptor. The original input performed better in that comparison, but qualitatively the method proposed in this work appears to represent music and voice signals better visually.
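
The spectral descriptors named in the summary have standard textbook definitions. The sketch below computes three of them (spectral centroid, spectral flatness and spectral slope) for a single frame in plain Python. It is not the thesis's code: the thesis used the MATLAB R2020b Audio Toolbox, whose functions may use a power rather than a magnitude spectrum or a different normalisation. The sample rate, frame length and function names here are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """One-sided DFT magnitudes |X[k]| for k = 0..N//2 (naive O(N^2) DFT, for illustration only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency in Hz (the spectral 'centre of mass')."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0


def spectral_flatness(mags, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum: near 0 for a pure tone, 1 for a flat spectrum."""
    power = [m * m + eps for m in mags]  # eps avoids log(0) on empty bins
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def spectral_slope(mags, freqs):
    """Least-squares slope of magnitude versus frequency (magnitude units per Hz)."""
    n = len(mags)
    mean_f = sum(freqs) / n
    mean_m = sum(mags) / n
    num = sum((f - mean_f) * (m - mean_m) for f, m in zip(freqs, mags))
    den = sum((f - mean_f) ** 2 for f in freqs)
    return num / den


if __name__ == "__main__":
    fs, n = 8000, 64  # assumed sample rate (Hz) and frame length (samples)
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    tone = [math.cos(2 * math.pi * 1000 * i / fs) for i in range(n)]  # 1 kHz, exactly on bin 8
    impulse = [1.0] + [0.0] * (n - 1)  # flat magnitude spectrum
    for name, sig in (("tone", tone), ("impulse", impulse)):
        mags = magnitude_spectrum(sig)
        print(name, spectral_centroid(mags, freqs), spectral_flatness(mags), spectral_slope(mags, freqs))
```

A pure 1 kHz tone gives a centroid at 1000 Hz and a flatness close to 0, while an impulse (flat spectrum) gives a flatness of 1 and a slope of 0. These two extremes are what make such descriptors useful for telling tonal sounds from noise-like ones.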

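The summary describes an algorithm that derives a chroma vector with a constant-Q transform and renders it as an RGB matrix resembling a real-time spectrogram on the chromatic scale. The thesis's own code and color assignment are not given here, so the following is only a sketch under stated assumptions: a direct (naive) constant-Q transform with one Hamming-windowed bin per semitone starting at C3, an 8 kHz sample rate, octave folding into 12 pitch classes, and a hypothetical color scheme that spreads the pitch classes evenly around the hue circle using Python's `colorsys`. The constants `FS`, `F_MIN`, `N_BINS`, the hop size and all function names are illustrative, not taken from the thesis.

```python
import colorsys
import math

FS = 8000                      # assumed sample rate (Hz)
F_MIN = 130.81                 # assumed lowest bin: C3, so bin k has pitch class k % 12
BINS_PER_OCTAVE = 12           # one bin per semitone
N_BINS = 36                    # three octaves, C3..B5
Q = 1.0 / (2 ** (1.0 / BINS_PER_OCTAVE) - 1.0)  # constant ratio of centre frequency to bandwidth


def cqt_frame(signal, start):
    """Naive constant-Q magnitudes for one frame starting at `start`.

    Bin k is centred on f_k = F_MIN * 2**(k / 12) and uses a Hamming window of
    N_k = Q * FS / f_k samples, so every bin spans the same number of cycles.
    """
    mags = []
    for k in range(N_BINS):
        f_k = F_MIN * 2 ** (k / BINS_PER_OCTAVE)
        n_k = int(round(Q * FS / f_k))
        re = im = 0.0
        for i, x in enumerate(signal[start:start + n_k]):
            w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_k - 1))  # Hamming window
            phase = 2 * math.pi * f_k * i / FS                        # analysis frequency f_k
            re += w * x * math.cos(phase)
            im -= w * x * math.sin(phase)
        mags.append(math.hypot(re, im) / n_k)
    return mags


def chroma(cqt_mags):
    """Fold constant-Q bins into 12 pitch classes (0 = C ... 11 = B), scaled so the maximum is 1."""
    c = [0.0] * 12
    for k, m in enumerate(cqt_mags):
        c[k % 12] += m
    peak = max(c)
    return [v / peak for v in c] if peak > 0 else c


def pitch_class_color(pc):
    """Hypothetical colour scheme: pitch classes spread evenly around the hue circle (C = red)."""
    return colorsys.hsv_to_rgb(pc / 12.0, 1.0, 1.0)


def rgb_matrix(signal, hop=2000):
    """12 x T matrix of (r, g, b) triples in [0, 1]: row = pitch class, column = frame,
    brightness = chroma energy of that pitch class in that frame."""
    longest = int(round(Q * FS / F_MIN))  # window length of the lowest (longest) bin
    columns = []
    start = 0
    while start + longest <= len(signal):
        c = chroma(cqt_frame(signal, start))
        columns.append([tuple(v * ch for ch in pitch_class_color(pc)) for pc, v in enumerate(c)])
        start += hop
    return [[col[pc] for col in columns] for pc in range(12)]


if __name__ == "__main__":
    a4 = [math.sin(2 * math.pi * 440.0 * i / FS) for i in range(FS)]  # 1 s of A4
    m = rgb_matrix(a4)
    brightest = max(range(12), key=lambda pc: max(m[pc][0]))
    print(len(m), len(m[0]), brightest)
```

For a one-second A4 sine, the row for pitch class 9 (A) is the brightest in every column, and its semitone neighbours are partly lit by spectral leakage. Folding octaves into 12 rows keeps the display compact but discards register; a real-time implementation would also replace the per-bin loop with a more efficient constant-Q algorithm.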