Concept Drift:
Tracking Distributions and Recurrent Inductive Transfer
George Forman
Data Mining and Machine Learning Group
Hewlett Packard Labs
Palo Alto, CA 94304
http://www.hpl.hp.com/personal/George_Forman
Most machine learning research in classification assumes the
training set is an i.i.d. random sample from the target population. However, in
many real-world situations the class distribution changes over time, which
erodes the effectiveness of classifiers and calibrated probability estimators.
I define three subtypes of concept drift and describe my recent research on
two of them.
The first part of the talk focuses on the problem of estimating the number of
positives in a test set despite inaccurate classification. An empirical
evaluation on a text classification benchmark shows that a mixture model is
surprisingly effective even when positives are very scarce in the training set,
a common case in information retrieval.
The second part of the talk focuses on concept drift with recurrent themes.
Empirical results for Reuters2000 show the difficulty of the problem, and show
the success of a new model involving inductive transfer from past classifiers,
when sufficient past training data is available.
Date: Wed., Jan 11
Time: 4:15-5:30 PM
Place: Cordura 100