Seminar on Computational Learning and Adaptation


Concept Drift:
Tracking Distributions and Recurrent Inductive Transfer

George Forman
Data Mining and Machine Learning Group
Hewlett Packard Labs
Palo Alto, CA 94304
http://www.hpl.hp.com/personal/George_Forman

Most machine learning research in classification assumes the training set is an i.i.d. random sample from the target population.  However, in many real-world situations the class distribution changes over time, which erodes the effectiveness of classifiers and calibrated probability estimators.  I define three subtypes of concept drift and describe my recent research on two of them.

The first part of the talk focuses on the problem of estimating the number of positives in a test set despite inaccurate classification. An empirical evaluation on a text classification benchmark shows that a mixture model is surprisingly effective even when positives are very scarce in the training set -- a common case in information retrieval.

The second part of the talk focuses on concept drift with recurrent themes.  Empirical results for Reuters2000 show both the difficulty of the problem and the success of a new model involving inductive transfer from past classifiers, provided sufficient past training data is available.


Date: Wed., Jan 11

Time: 4:15-5:30 PM

Place: Cordura 100

