Avoiding speaker variability in pronunciation verification of children's disordered speech

This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two different score normalization techniques: Traditional Test...


Authors:
Saz, Oscar; Lleida, Eduardo; Rodríguez-Dueñas, William R.
Resource type:
Article
Publication date:
2009
Institution:
Universidad del Rosario
Repository:
Repositorio EdocUR - U. Rosario
Language:
eng
OAI Identifier:
oai:repository.urosario.edu.co:10336/28304
Online access:
https://doi.org/10.1145/1640377.1640388
https://repository.urosario.edu.co/handle/10336/28304
Keywords:
Pronunciation evaluation
Children speech
Speech disorders
Rights
License
Restricted (access for specific groups)
id EDOCUR2_f8e10745fabc680583da4059f137e9c1
oai_identifier_str oai:repository.urosario.edu.co:10336/28304
network_acronym_str EDOCUR2
network_name_str Repositorio EdocUR - U. Rosario
repository_id_str
spelling 8a33594b-a7e0-416c-b950-82fb57cfb116
0bb16363-d9bc-486b-a1a7-39ef38ebe340
79805589600
2020-08-28T15:47:55Z
2020-08-28T15:47:55Z
2009-11
This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two different score normalization techniques: traditional Test normalization (T-norm), and a novel N-best-based normalization that outperforms the first by normalizing to the log-likelihood score of the first alternative phoneme in an unconstrained N-best list. When performing speaker adaptation, using all the adaptation data from the speaker improves the performance of these systems, measured in Equal Error Rate (EER), compared to the speaker-independent systems; but this can be outperformed by more precise models that adapt only to the correctly pronounced phonetic units as labeled by a set of human experts. The best EER obtained in all experiments is 15.63%, when using both elements: score normalization and speaker adaptation. Finally, the possibility of automating a more precise adaptation without human intervention is proposed and discussed.
application/pdf
https://doi.org/10.1145/1640377.1640388
ISBN: 978-1-60558-690-8
https://repository.urosario.edu.co/handle/10336/28304
eng
Association for Computing Machinery
WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction; ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts (November, 2009)
WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction
ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts, ISBN: 978-1-60558-690-89 (2009); pp. 1-5
https://dl.acm.org/doi/10.1145/1640377.1640388
Restricted (access for specific groups)
http://purl.org/coar/access_right/c_16ec
WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction
ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts (November, 2009)
instname:Universidad del Rosario
reponame:Repositorio Institucional EdocUR
Pronunciation evaluation
Children speech
Speech disorders
Avoiding speaker variability in pronunciation verification of children's disordered speech
Evitar la variabilidad del hablante en la verificación de la pronunciación del habla desordenada de los niños
article
Article
http://purl.org/coar/version/c_970fb48d4fbd8a85
http://purl.org/coar/resource_type/c_6501
Saz, Oscar
Lleida, Eduardo
Rodríguez-Dueñas, William R.
10336/28304
oai:repository.urosario.edu.co:10336/28304
2021-10-15 06:06:16.81
https://repository.urosario.edu.co
Repositorio institucional EdocUR
edocur@urosario.edu.co
dc.title.spa.fl_str_mv Avoiding speaker variability in pronunciation verification of children's disordered speech
dc.title.TranslatedTitle.spa.fl_str_mv Evitar la variabilidad del hablante en la verificación de la pronunciación del habla desordenada de los niños
dc.subject.keyword.spa.fl_str_mv Pronunciation evaluation
Children speech
Speech disorders
topic Pronunciation evaluation
Children speech
Speech disorders
description This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two different score normalization techniques: traditional Test normalization (T-norm), and a novel N-best-based normalization that outperforms the first by normalizing to the log-likelihood score of the first alternative phoneme in an unconstrained N-best list. When performing speaker adaptation, using all the adaptation data from the speaker improves the performance of these systems, measured in Equal Error Rate (EER), compared to the speaker-independent systems; but this can be outperformed by more precise models that adapt only to the correctly pronounced phonetic units as labeled by a set of human experts. The best EER obtained in all experiments is 15.63%, when using both elements: score normalization and speaker adaptation. Finally, the possibility of automating a more precise adaptation without human intervention is proposed and discussed.
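The two score normalizations and the EER metric named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no formulas, so the function names, the choice of cohort scores for T-norm, the reading of "first alternative phoneme" as the highest-ranked N-best entry that differs from the expected phoneme, and the threshold sweep used for the EER are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def t_norm(score, cohort_scores):
    """T-norm (assumed form): standardize a log-likelihood score against
    the scores the same segment obtains under a cohort of competing models."""
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (score - mean(cohort_scores)) / sigma


def nbest_norm(expected, expected_loglik, nbest):
    """N-best normalization (assumed form): subtract the log-likelihood of
    the first alternative phoneme in the N-best list from the score of the
    expected phoneme. `nbest` is a list of (phoneme, loglik), best first."""
    for phoneme, loglik in nbest:
        if phoneme != expected:
            return expected_loglik - loglik
    raise ValueError("N-best list contains no alternative phoneme")


def equal_error_rate(correct_scores, incorrect_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.
    A unit is accepted as correctly pronounced when score >= threshold."""
    best_gap, best_rate = None, None
    for t in sorted(set(correct_scores) | set(incorrect_scores)):
        # False rejection: correct pronunciations scored below the threshold.
        frr = sum(s < t for s in correct_scores) / len(correct_scores)
        # False acceptance: mispronunciations scored at or above it.
        far = sum(s >= t for s in incorrect_scores) / len(incorrect_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (far + frr) / 2
    return best_rate


if __name__ == "__main__":
    # Hypothetical log-likelihoods for an expected phoneme "s".
    print(nbest_norm("s", -42.0, [("s", -42.0), ("z", -45.5), ("t", -50.0)]))
    print(t_norm(4.0, [1.0, 3.0]))
    print(equal_error_rate([2.0, 1.5, 0.5, -0.2], [-1.0, -0.5, 0.7, -2.0]))
```

In this sketch, a positive N-best-normalized score means the expected phoneme outscored its closest competitor, and a negative one means a competitor was preferred, which is the kind of evidence a verification threshold could then act on.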
publishDate 2009
dc.date.created.spa.fl_str_mv 2009-11
dc.date.accessioned.none.fl_str_mv 2020-08-28T15:47:55Z
dc.date.available.none.fl_str_mv 2020-08-28T15:47:55Z
dc.type.eng.fl_str_mv article
dc.type.coarversion.fl_str_mv http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.coar.fl_str_mv http://purl.org/coar/resource_type/c_6501
dc.type.spa.spa.fl_str_mv Article
dc.identifier.doi.none.fl_str_mv https://doi.org/10.1145/1640377.1640388
dc.identifier.issn.none.fl_str_mv ISBN: 978-1-60558-690-8
dc.identifier.uri.none.fl_str_mv https://repository.urosario.edu.co/handle/10336/28304
url https://doi.org/10.1145/1640377.1640388
https://repository.urosario.edu.co/handle/10336/28304
identifier_str_mv ISBN: 978-1-60558-690-8
dc.language.iso.spa.fl_str_mv eng
language eng
dc.relation.citationTitle.none.fl_str_mv WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction; ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts (November, 2009)
dc.relation.ispartof.spa.fl_str_mv WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction
ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts, ISBN: 978-1-60558-690-89 (2009); pp. 1-5
dc.relation.uri.spa.fl_str_mv https://dl.acm.org/doi/10.1145/1640377.1640388
dc.rights.coar.fl_str_mv http://purl.org/coar/access_right/c_16ec
dc.rights.acceso.spa.fl_str_mv Restricted (access for specific groups)
rights_invalid_str_mv Restricted (access for specific groups)
http://purl.org/coar/access_right/c_16ec
dc.format.mimetype.none.fl_str_mv application/pdf
dc.publisher.spa.fl_str_mv Association for Computing Machinery
dc.source.spa.fl_str_mv WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction
ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts (November, 2009)
institution Universidad del Rosario
dc.source.instname.none.fl_str_mv instname:Universidad del Rosario
dc.source.reponame.none.fl_str_mv reponame:Repositorio Institucional EdocUR
repository.name.fl_str_mv Repositorio institucional EdocUR
repository.mail.fl_str_mv edocur@urosario.edu.co
_version_ 1814167521715027968