Dual-Layer anti-spoofing and voice tone recognition system for deepfake audio in English

Authors:
Daza Díaz, Paula Cecilia
Torres Ramírez, Sofia
Resource type:
Undergraduate thesis
Publication date:
2024
Institution:
Universidad de los Andes
Repository:
Séneca: repositorio Uniandes
Language:
eng
OAI Identifier:
oai:repositorio.uniandes.edu.co:1992/75424
Online access:
https://hdl.handle.net/1992/75424
Keywords:
Voice Spoofing Detection
Personalized Voice Recognition
Vocal Tone Identification
Voice Conversion
Text-to-Speech
Engineering
Rights:
embargoedAccess
License:
Attribution-NonCommercial-NoDerivatives 4.0 International
Description:
Summary: This research develops a personalized voice spoofing detection model targeting audio generated with deepfake techniques such as text-to-speech (TTS) and voice conversion (VC). The growing sophistication of generative artificial intelligence models has made it possible to create falsified audio that is nearly indistinguishable from genuine speech to the human ear, posing a significant risk to individual security and privacy. In response, the project proposes a dual-layer voice spoofing detection system. The first layer detects whether the audio is real or spoofed; the second layer identifies whether the voice tone belongs to the legitimate user or to someone else, adding a further level of personalization and security. For the first layer, an existing pre-trained voice spoofing detection model is fine-tuned with user-specific data. For the second layer, the same model is repurposed: instead of being trained for spoof detection, it is trained to recognize the user's specific vocal tone, using real voice samples from multiple individuals with different vocal tones. To ease adoption, we developed a user-friendly tool. Users provide just a few minutes of voice recordings, and the tool automatically generates deepfake audio and analyzes voice tones using the techniques described above. The falsified audio, together with the genuine recordings, is used to train the detection models, adapting them to the user's voice and tone. The dual-layer approach thus offers a personalized defense of the user's vocal identity against deepfake spoofing, verifying both that the audio is authentic and that the tone of voice belongs to the legitimate user.
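The dual-layer decision described in the summary could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `dual_layer_check`, `spoof_score`, `tone_score`, and `Verdict`, as well as the 0.5 thresholds, are hypothetical, and the two scoring callables stand in for the fine-tuned models. The record does not say whether the layers run in sequence or in parallel, so the sketch simply evaluates both.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Placeholder type for a waveform or feature vector (assumption).
Audio = Sequence[float]


@dataclass
class Verdict:
    is_genuine: bool      # layer 1: audio judged real rather than spoofed
    is_target_user: bool  # layer 2: vocal tone judged to be the enrolled user's
    accepted: bool        # True only when both layers pass


def dual_layer_check(
    audio: Audio,
    spoof_score: Callable[[Audio], float],  # hypothetical: P(audio is genuine)
    tone_score: Callable[[Audio], float],   # hypothetical: P(tone is the user's)
    spoof_threshold: float = 0.5,           # illustrative value, not from the source
    tone_threshold: float = 0.5,            # illustrative value, not from the source
) -> Verdict:
    """Combine the two layers: accept only if both scores clear their thresholds."""
    is_genuine = spoof_score(audio) >= spoof_threshold
    is_target_user = tone_score(audio) >= tone_threshold
    return Verdict(is_genuine, is_target_user, is_genuine and is_target_user)


if __name__ == "__main__":
    # Stub scorers standing in for the two fine-tuned models.
    sample = [0.0, 0.1, -0.1]
    print(dual_layer_check(sample, lambda a: 0.9, lambda a: 0.2))
```

With the stub scorers above, the sample passes the spoof check but fails the tone check, so it is rejected: genuine audio from a different speaker does not count as the legitimate user.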