Articulatory Feature Recognition Using Dynamic Bayesian Networks
Joe Frankel
Centre for Speech Technology Research
Edinburgh University
An ongoing project at Edinburgh University aims to
build a speech recognition system in which a set of multi-level discrete
articulatory features (AF), rather than phones, mediates between words and
acoustic observations. The motivation for such an approach is to use a state
representation which is tailored toward characterizing the variation present in
natural speech, variation which arises from the asynchronous, overlapping nature
of speech production.
In this talk I describe work to date which has largely focused on developing
articulatory feature recognition using dynamic Bayesian networks (DBN). A DBN
approach allows us to build a model which incorporates dependencies between
feature streams. The model is initialized by training on AF labels derived
from a time-aligned phone segmentation; an embedded training scheme is then
applied to learn a set of asynchronous feature changes in a data-driven
manner. I will present the results of AF recognition experiments on the OGI
Numbers corpus, which demonstrate that modelling inter-feature dependencies
improves performance and that the embedded training scheme reduces the
dependence on phone-derived articulatory feature labels. Finally, I will discuss
future directions and recent developments.
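
As a rough illustration of the initialization step, the sketch below expands a time-aligned phone segmentation into frame-level AF label streams. The phone-to-feature table, the three feature names, the 10 ms frame shift, and the function name are all assumptions made for this example; they are not the feature inventory or tooling used in the work described.

```python
# Hypothetical phone-to-feature table. The feature names and values are
# illustrative assumptions, not the inventory used in the talk.
PHONE_TO_AF = {
    "sil": {"manner": "silence", "place": "silence", "voicing": "silence"},
    "s": {"manner": "fricative", "place": "alveolar", "voicing": "voiceless"},
    "ih": {"manner": "vowel", "place": "high-front", "voicing": "voiced"},
    "k": {"manner": "stop", "place": "velar", "voicing": "voiceless"},
}

FEATURES = ("manner", "place", "voicing")


def af_labels_from_alignment(segments, frame_shift=0.01):
    """Expand (start_sec, end_sec, phone) segments into per-frame AF labels.

    Returns a dict mapping each feature name to a list with one label per
    frame. The frame shift of 10 ms is an assumed default.
    """
    streams = {feature: [] for feature in FEATURES}
    for start, end, phone in segments:
        # Convert both boundaries to frame indices first so that rounding
        # errors do not accumulate across segments.
        first = round(start / frame_shift)
        last = round(end / frame_shift)
        for feature in FEATURES:
            streams[feature].extend([PHONE_TO_AF[phone][feature]] * (last - first))
    return streams


# Example alignment (made-up times): silence, then "s", then "ih".
segments = [(0.0, 0.05, "sil"), (0.05, 0.13, "s"), (0.13, 0.20, "ih")]
labels = af_labels_from_alignment(segments)
```

In labels built this way, every feature stream changes value at exactly the same frames, because each change is inherited from a phone boundary. That synchrony is the limitation the embedded training scheme is meant to relax.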
This talk is co-sponsored with the Natural Language and Speech Processing (NLaSP)
Colloquium, described at
http://www-nlp.stanford.edu/events.shtml
Date: Wed., April 26
Time: 4:15-5:30 PM
Place: Room 200-205 (History Corner)