Dialogue Act Tagging With Unlabeled Data?
Anand Venkataraman
Speech Technology and Research Laboratory
SRI International
Menlo Park, CA
anand@speech.sri.com
Labeling utterances with their dialogue act category (such as Question, Answer, or Back-channel) is an important step toward speech understanding, yet training such taggers usually requires large amounts of data labeled by linguistic experts. We investigated the use of unlabeled data for training HMM-based dialogue act taggers. Three techniques were found to be effective for bootstrapping a tagger from very small amounts of labeled data.
On the SPINE dialogue corpus, the combined use of prosodic information and unlabeled data reduces the tagging error by 12% to 16% compared to baseline systems using word information only.
In this talk, I will describe the principles behind the techniques, the framework used to test them, and experimental results on the SPINE corpus.
This is joint work with Luciana Ferrer, Andreas Stolcke, and Liz Shriberg.
Date: Thursday, May 15
Time: 4:15-5:30PM
Place: Cordura 100