Most data-mining projects tend to be at least 70% data massaging,
transformation, and feature engineering, yet most of the attention in KDD
talks is usually given to the modeling process. In this talk I will
present experiences with a "real-world" consulting project done at IBM
on car insurance, which was dominated by issues of data representation,
cleaning, and so forth. The task was to predict retail customer
attrition for a large US car-insurance company. Beyond the
cleaning issues, the project was challenging because attrition
analysis is akin to survival analysis: it is unclear what class label
to assign to customers who have not yet attritted but may do so in the
future. The talk is intended to give a flavor of the
kinds of problems encountered in a real-world data-mining project, and
I will discuss how to set up such a project to maximize its
chances of success.
Date: Thurs., November 5; Time: 4:15-5:30PM; Place: Gates 104