Seminar on Computational Learning and Adaptation


  A stability based method for discovering structure in clustered data

Asa Ben-Hur
Dept. of Biochemistry, Stanford
http://tx.technion.ac.il/~asa

Most clustering algorithms produce a clustering of a dataset whether or not the data actually has cluster structure. To address this issue, we present a method for assessing the presence of structure in clustered data. The method is based on the idea that a "good" clustering should be stable under perturbations of the data. We characterize stability using a similarity measure between a reference clustering and clusterings obtained from sub-samples of the data. High similarities indicate a stable clustering pattern. We argue that stability is a desirable feature of a clustering solution and that it implies the existence of cluster structure. The proposed method can be used with any clustering algorithm; it provides a rational means of choosing the number of clusters and of settling other aspects of the clustering procedure, including feature selection. We show results on several datasets using a hierarchical clustering algorithm, and use the method to demonstrate that keeping only a few leading principal components enhances cluster structure.
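
The sub-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation: the single-linkage routine, the Jaccard similarity on co-membership pairs, the 80% sub-sample fraction, and all function names below are assumptions chosen for illustration, since the abstract fixes neither the similarity measure nor the linkage rule.

```python
import random
from itertools import combinations

def single_linkage(points, k):
    """Naive agglomerative single-linkage clustering; returns one label per point."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Squared Euclidean distance between points a and b.
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b]))

    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the two closest clusters
        del clusters[j]

    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def jaccard(labels_a, labels_b):
    """Jaccard similarity of the co-membership pairs of two labelings (assumed measure)."""
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        a = labels_a[i] == labels_a[j]
        b = labels_b[i] == labels_b[j]
        both += int(a and b)
        either += int(a or b)
    return both / either if either else 1.0

def stability(points, k, n_subsamples=10, fraction=0.8, seed=0):
    """Mean similarity between a reference clustering and sub-sample clusterings."""
    rng = random.Random(seed)
    reference = single_linkage(points, k)
    scores = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(len(points)), int(fraction * len(points)))
        sub = single_linkage([points[i] for i in idx], k)
        # Compare on the sub-sampled points only.
        scores.append(jaccard([reference[i] for i in idx], sub))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated Gaussian blobs of 15 points each.
_rng = random.Random(1)
points = [(_rng.gauss(0, 0.5), _rng.gauss(0, 0.5)) for _ in range(15)]
points += [(_rng.gauss(10, 0.5), _rng.gauss(10, 0.5)) for _ in range(15)]

for k in (2, 3, 4):
    print(k, round(stability(points, k), 3))
```

Under these assumptions, a number of clusters whose mean similarity stays close to 1 across sub-samples would be read as stable, while a drop in similarity at larger k suggests the extra clusters do not reflect structure in the data.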



Date: Thursday, January 30

Time: 4:15-5:30PM

Place: Cordura 100

