Active Learning with Multiple Views
Ion Muslea
University of Southern California
Despite the practical success of machine learning in many real-world domains, labeling the training data is time-consuming, tedious, and error-prone. In this talk, I focus on reducing the need for labeled data in multi-view learning tasks. The key characteristic of multi-view tasks is that the target concept can be learned independently within different views (i.e., disjoint sets of features, each of which is sufficient to learn the concept). For instance, a robot can avoid obstacles based on sonar, laser, or vision sensors; similarly, a Web page can be classified based either on the words in the document or on the words in the HTML hyperlinks pointing to it.
In order to reduce the need for labeled data, I use active learning
algorithms that detect and ask the user to label only the most
informative examples in a domain. I introduce a family of multi-view
active learners that are based on the idea of learning from
mistakes. More precisely, they query examples on which the views
predict different labels: if two views disagree, at least one of
them must be wrong. I also show that existing multi-view
learners perform unreliably if the views are inadequate. To cope with
this problem, I introduce two complementary solutions. First, by
interleaving bootstrapping and active learning, I obtain a novel
multi-view learner that behaves robustly over a wide spectrum of
domains that have inadequate views. Second, I introduce a
meta-learning algorithm that is first trained on several solved
learning tasks and then predicts whether or not the views are
"sufficiently adequate" for solving a new, unseen learning task. I
evaluate these three novel algorithms on a variety of real-world
domains, from information extraction and text classification to
advertisement removal and discourse tree parsing. The empirical
results show that my algorithms consistently outperform existing
state-of-the-art learners.
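The disagreement-based querying idea described above can be illustrated with a minimal sketch. It assumes two views per example and uses a toy nearest-centroid base learner purely as a stand-in; the function names (`train_nearest_centroid`, `contention_points`, `query_round`) and the data are hypothetical and are not taken from the talk.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, Vector]      # (features in view 1, features in view 2)
Classifier = Callable[[Vector], int]


def train_nearest_centroid(points: List[Vector], labels: List[int]) -> Classifier:
    """Toy base learner (an assumption): predict the label of the closest class centroid."""
    centroids = {}
    for label in set(labels):
        members = [p for p, y in zip(points, labels) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def predict(x: Vector) -> int:
        def sq_dist(c: Vector) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab]))

    return predict


def contention_points(h1: Classifier, h2: Classifier,
                      unlabeled: List[Example]) -> List[int]:
    """Indices of unlabeled examples on which the two view classifiers disagree."""
    return [i for i, (v1, v2) in enumerate(unlabeled) if h1(v1) != h2(v2)]


def query_round(labeled: List[Tuple[Example, int]],
                unlabeled: List[Example]) -> List[int]:
    """Train one classifier per view, then return the candidate queries."""
    labels = [y for _, y in labeled]
    h1 = train_nearest_centroid([ex[0] for ex, _ in labeled], labels)
    h2 = train_nearest_centroid([ex[1] for ex, _ in labeled], labels)
    return contention_points(h1, h2, unlabeled)


# Hypothetical data: one feature per view, two labeled examples.
labeled = [(([0.0], [0.0]), 0), (([1.0], [1.0]), 1)]
unlabeled = [([0.1], [0.2]), ([0.9], [0.1]), ([0.8], [0.9])]
print(query_round(labeled, unlabeled))  # [1]: only the second example splits the views
```

In a full active-learning loop, the user would label one of the returned examples, it would move to the labeled set, and both view classifiers would be retrained before the next round.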
Date: Thursday, January 9
Time: 4:15-5:30PM
Place: Cordura 100