Seminar on Computational Learning and Adaptation


  Dialogue Act Tagging With Unlabeled Data?

Anand Venkataraman
Speech Technology and Research Laboratory
SRI International
Menlo Park, CA
anand@speech.sri.com

Labeling utterances with their dialog act category (such as Question, Answer, or Back-channel) is an important step toward speech understanding, yet training such taggers usually requires large amounts of data labeled by linguistic experts. We investigated the use of unlabeled data for training HMM-based dialog act taggers. Three techniques were found to be effective for bootstrapping a tagger from very small amounts of labeled data:

  1. iterative relabeling and retraining on unlabeled data;
  2. a dialog grammar to model dialog act context; and
  3. a model of the prosodic correlates of dialog acts.
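
As an illustration only, techniques 1 and 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the SRI system: dialog acts are the HMM's hidden states, a bigram over acts serves as the dialog grammar, add-one-smoothed per-act word unigram models serve as emission likelihoods, and Viterbi decoding tags each conversation. The prosodic model (technique 3) is omitted. The function names (train, viterbi, self_train), the smoothing choices, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-conversation symbol for the dialog grammar


def train(convs):
    """Estimate add-one-smoothed models (assumption) from tagged conversations.

    convs: list of conversations, each a list of (words, act) pairs.
    Returns (acts, log_trans, log_emit).
    """
    trans = defaultdict(Counter)  # trans[prev][act]: bigram dialog grammar counts
    words = defaultdict(Counter)  # words[act][w]: per-act unigram word counts
    vocab = set()
    for conv in convs:
        prev = START
        for utt, act in conv:
            trans[prev][act] += 1
            words[act].update(utt)
            vocab.update(utt)
            prev = act
    acts = sorted(words)
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def log_trans(prev, act):
        c = trans[prev]
        return math.log((c[act] + 1) / (sum(c.values()) + len(acts)))

    def log_emit(utt, act):
        c, n = words[act], sum(words[act].values())
        return sum(math.log((c[w] + 1) / (n + v)) for w in utt)

    return acts, log_trans, log_emit


def viterbi(conv, acts, log_trans, log_emit):
    """Most likely act sequence for a non-empty list of utterances (word lists)."""
    # best[a] = (log score, path) of the best partial path ending in act a
    best = {a: (log_trans(START, a) + log_emit(conv[0], a), [a]) for a in acts}
    for utt in conv[1:]:
        new = {}
        for a in acts:
            score, path = max(
                (s + log_trans(p, a), pth) for p, (s, pth) in best.items()
            )
            new[a] = (score + log_emit(utt, a), path + [a])
        best = new
    return max(best.values())[1]


def self_train(labeled, unlabeled, iterations=3):
    """Iterative relabeling and retraining on unlabeled conversations."""
    model = train(labeled)
    for _ in range(iterations):
        # Tag every unlabeled conversation with the current model...
        pseudo = [list(zip(conv, viterbi(conv, *model))) for conv in unlabeled]
        # ...then retrain on the labeled data plus the automatic labels.
        model = train(labeled + pseudo)
    return model


# Hypothetical toy data, for illustration only.
labeled = [
    [(["where", "are", "you"], "Question"), (["at", "the", "base"], "Answer"),
     (["okay"], "Back-channel")],
    [(["what", "is", "your", "position"], "Question"),
     (["near", "the", "base"], "Answer")],
]
unlabeled = [[["where", "is", "the", "base"], ["at", "the", "truck"], ["okay"]]]

model = self_train(labeled, unlabeled)
print(viterbi([["where", "are", "you"], ["at", "the", "base"]], *model))
# -> ['Question', 'Answer']
```

Keeping the dialog grammar as a separate transition model is what lets context help: a weakly worded utterance following a Question is pulled toward Answer by the bigram, even when its words alone are ambiguous.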

On the SPINE dialog corpus, the combined use of prosodic information and unlabeled data reduces the tagging error by 12% to 16%, compared to baseline systems using word information only.

In this talk, I will describe the principles behind these techniques, the framework used to test them, and experimental results on the SPINE corpus.

This is joint work with Luciana Ferrer, Andreas Stolcke, and Liz Shriberg.



Date: Thursday, May 15

Time: 4:15-5:30PM

Place: Cordura 100

