Avoiding speaker variability in pronunciation verification of children's disordered speech
This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two different score normalization techniques: Traditional Test...
- Authors:
- Saz, Oscar; Lleida, Eduardo; Rodríguez-Dueñas, William R.
- Resource type:
- Article
- Publication date:
- 2009
- Institution:
- Universidad del Rosario
- Repository:
- Repositorio EdocUR - U. Rosario
- Language:
- eng
- OAI Identifier:
- oai:repository.urosario.edu.co:10336/28304
- Online access:
- https://doi.org/10.1145/1640377.1640388
https://repository.urosario.edu.co/handle/10336/28304
- Keywords:
- Pronunciation evaluation
Children speech
Speech disorders
- Rights
- License
- Restricted (access for specific groups)
id |
EDOCUR2_f8e10745fabc680583da4059f137e9c1 |
---|---|
oai_identifier_str |
oai:repository.urosario.edu.co:10336/28304 |
network_acronym_str |
EDOCUR2 |
network_name_str |
Repositorio EdocUR - U. Rosario |
repository_id_str |
|
spelling |
8a33594b-a7e0-416c-b950-82fb57cfb116; 0bb16363-d9bc-486b-a1a7-39ef38ebe340; 79805589600; 2020-08-28T15:47:55Z; 2020-08-28T15:47:55Z; 2009-11; Saz, Oscar; Lleida, Eduardo; Rodríguez-Dueñas, William R.; 10336/28304; oai:repository.urosario.edu.co:10336/28304; 2021-10-15 06:06:16.81; https://repository.urosario.edu.co; Repositorio institucional EdocUR; edocur@urosario.edu.co |
dc.title.spa.fl_str_mv |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
dc.title.TranslatedTitle.spa.fl_str_mv |
Evitar la variabilidad del hablante en la verificación de la pronunciación del habla desordenada de los niños |
title |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
spellingShingle |
Avoiding speaker variability in pronunciation verification of children's disordered speech Pronunciation evaluation Children speech Speech disorders |
title_short |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
title_full |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
title_fullStr |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
title_full_unstemmed |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
title_sort |
Avoiding speaker variability in pronunciation verification of children's disordered speech |
dc.subject.keyword.spa.fl_str_mv |
Pronunciation evaluation Children speech Speech disorders |
topic |
Pronunciation evaluation Children speech Speech disorders |
description |
This paper addresses the problem of speaker variability in pronunciation verification for the speech therapy of children and young adults in Computer-Aided Pronunciation Training (CAPT) tools. The baseline system evaluates two score normalization techniques: traditional Test normalization (T-norm), and a novel N-best-based normalization that outperforms the former by normalizing to the log-likelihood score of the first alternative phoneme in an unconstrained N-best list. With speaker adaptation, using all the adaptation data from the speaker improves performance, measured as Equal Error Rate (EER), over the speaker-independent systems; however, this is in turn outperformed by more precise models that adapt only to the phonetic units labeled as correctly pronounced by a set of human experts. The best EER obtained across all experiments is 15.63%, achieved by combining both elements: score normalization and speaker adaptation. Finally, the possibility of automating a more precise adaptation without human intervention is proposed and discussed. |
publishDate |
2009 |
dc.date.created.spa.fl_str_mv |
2009-11 |
dc.date.accessioned.none.fl_str_mv |
2020-08-28T15:47:55Z |
dc.date.available.none.fl_str_mv |
2020-08-28T15:47:55Z |
dc.type.eng.fl_str_mv |
article |
dc.type.coarversion.fl_str_mv |
http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.coar.fl_str_mv |
http://purl.org/coar/resource_type/c_6501 |
dc.type.spa.spa.fl_str_mv |
Article |
dc.identifier.doi.none.fl_str_mv |
https://doi.org/10.1145/1640377.1640388 |
dc.identifier.issn.none.fl_str_mv |
ISBN: 978-1-60558-690-8 |
dc.identifier.uri.none.fl_str_mv |
https://repository.urosario.edu.co/handle/10336/28304 |
url |
https://doi.org/10.1145/1640377.1640388 https://repository.urosario.edu.co/handle/10336/28304 |
identifier_str_mv |
ISBN: 978-1-60558-690-8 |
dc.language.iso.spa.fl_str_mv |
eng |
language |
eng |
dc.relation.citationTitle.none.fl_str_mv |
WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction; ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts (November, 2009) |
dc.relation.ispartof.spa.fl_str_mv |
WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction; ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts, ISBN: 978-1-60558-690-89 (2009); pp. 1-5 |
dc.relation.uri.spa.fl_str_mv |
https://dl.acm.org/doi/10.1145/1640377.1640388 |
dc.rights.coar.fl_str_mv |
http://purl.org/coar/access_right/c_16ec |
dc.rights.acceso.spa.fl_str_mv |
Restricted (access for specific groups) |
rights_invalid_str_mv |
Restricted (access for specific groups) http://purl.org/coar/access_right/c_16ec |
dc.format.mimetype.none.fl_str_mv |
application/pdf |
dc.publisher.spa.fl_str_mv |
Association for Computing Machinery |
dc.source.spa.fl_str_mv |
WOCCI '09: Proceedings of the 2nd Workshop on Child, Computer and Interaction; ICMI-MLMI '09: International Conference On Multimodal Interfaces/Workshop On Machine Learning For Multimodal Interfaces, Cambridge, Massachusetts (November, 2009) |
institution |
Universidad del Rosario |
dc.source.instname.none.fl_str_mv |
instname:Universidad del Rosario |
dc.source.reponame.none.fl_str_mv |
reponame:Repositorio Institucional EdocUR |
repository.name.fl_str_mv |
Repositorio institucional EdocUR |
repository.mail.fl_str_mv |
edocur@urosario.edu.co |
_version_ |
1814167521715027968 |
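
The description field names two score normalizations (T-norm and an N-best-based one) and reports results as Equal Error Rate. The following is a minimal Python sketch of those three ideas, not the authors' implementation. All function names, argument names, and the threshold sweep are hypothetical, and reading "normalizing to" the first alternative phoneme as a log-likelihood difference is an assumption made here for illustration.

```python
from statistics import mean, stdev


def t_norm(score, cohort_scores):
    """T-norm: z-normalize a raw score against scores from a cohort of other models."""
    return (score - mean(cohort_scores)) / stdev(cohort_scores)


def nbest_norm(target, target_loglik, nbest):
    """Normalize by the first alternative phoneme in a best-first N-best list.

    Assumption: "normalizing to" is read here as a log-likelihood difference.
    `nbest` is a list of (phoneme, log_likelihood) pairs, best hypothesis first.
    """
    for phoneme, loglik in nbest:
        if phoneme != target:
            return target_loglik - loglik
    raise ValueError("N-best list holds no alternative phoneme")


def equal_error_rate(correct_scores, incorrect_scores):
    """Sweep thresholds; return the error rate where false accepts and
    false rejects are closest to equal (their average at that threshold)."""
    best_gap, eer = None, None
    for thr in sorted(set(correct_scores) | set(incorrect_scores)):
        # A unit is accepted as correctly pronounced when its score >= thr.
        frr = sum(s < thr for s in correct_scores) / len(correct_scores)
        far = sum(s >= thr for s in incorrect_scores) / len(incorrect_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

In this sketch, `equal_error_rate` takes the normalized scores of units that experts labeled correct and of units labeled mispronounced, so either normalization can be plugged in before the evaluation step.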