Seminar on Computational Learning and Adaptation


 Articulatory Feature Recognition Using Dynamic Bayesian Networks

Joe Frankel
Centre for Speech Technology Research
Edinburgh University
 

An ongoing project at Edinburgh University aims to build a speech recognition system in which a set of multi-level discrete articulatory features (AF), rather than phones, mediates between words and acoustic observations. The motivation for this approach is to use a state representation tailored toward characterizing the variation present in natural speech, variation which arises from the asynchronous, overlapping nature of speech production.

In this talk I describe work to date, which has largely focused on developing articulatory feature recognition using dynamic Bayesian networks (DBN). A DBN approach allows us to build a model which incorporates dependencies between feature streams. The model is initialized through training on AF labels derived from a time-aligned phone segmentation; an embedded training scheme is then applied, through which a set of asynchronous feature changes is learned in a data-driven manner. I will present the results of AF recognition experiments on the OGI Numbers corpus, which demonstrate that modelling inter-feature dependencies improves performance and that the embedded training scheme reduces the dependence on phone-derived articulatory feature labels. Finally, I will discuss future directions and recent developments.

This talk is co-sponsored with the Natural Language and Speech Processing (NLaSP) Colloquium, described at http://www-nlp.stanford.edu/events.shtml
 


Date: Wed., April 26

Time: 4:15-5:30PM 

Place: Room 200-205 (History Corner)

