A stability based method for discovering structure in clustered data
Asa Ben-Hur
Dept. of Biochemistry, Stanford
http://tx.technion.ac.il/~asa
Most clustering algorithms provide a clustering of a dataset regardless of whether the data actually has cluster structure. To address this issue, we present a method for assessing the presence of structure in clustered data. The method is based on the idea that a "good" clustering should be stable under perturbations of the data. We characterize stability using a similarity measure between a reference clustering and clusterings obtained from sub-samples of the data. High similarities indicate a stable clustering pattern. We argue that stability is a desirable feature of a clustering solution that implies the existence of cluster structure. The proposed method can be used with any clustering algorithm; it provides a rational means of choosing the number of clusters and of settling other aspects of the clustering procedure, including feature selection. We show results on several datasets using a hierarchical clustering algorithm, and use the method to demonstrate that keeping only a few leading principal components enhances cluster structure.
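The sub-sampling idea in the abstract can be illustrated with a small sketch. This is not the speaker's implementation, and several choices here are assumptions the abstract does not specify: single-linkage clustering stands in for "a hierarchical clustering algorithm", the similarity measure is a Jaccard coefficient over pairs of points placed in the same cluster, and the sub-sample fraction (0.8) and number of sub-samples (20) are arbitrary. The names two_blobs, single_linkage, jaccard and stability are hypothetical helpers.

```python
import random
from itertools import combinations


def two_blobs(n_per_blob=20, seed=0):
    """Toy data (assumption): two well-separated 2-D blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in centers for _ in range(n_per_blob)]


def single_linkage(points, k):
    """Single-linkage agglomerative clustering cut at k clusters.

    Merges the closest pairs first (union-find over sorted pair distances)
    and stops as soon as k clusters remain. Returns one label per point.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    edges = sorted((dist2(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    n_clusters = n
    for _, i, j in edges:
        if n_clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_clusters -= 1
    return [find(i) for i in range(n)]


def jaccard(labels_a, labels_b):
    """Jaccard similarity of two labelings of the same points.

    Counts pairs of points that are co-clustered in both labelings, divided
    by pairs co-clustered in at least one. Invariant to label renaming.
    """
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b
        either += same_a or same_b
    return both / either if either else 1.0


def stability(points, k, n_subsamples=20, fraction=0.8, seed=0):
    """Mean similarity between a reference clustering of the full data and
    clusterings of random sub-samples, compared on the sub-sampled points."""
    rng = random.Random(seed)
    reference = single_linkage(points, k)
    n = len(points)
    m = int(fraction * n)
    scores = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)
        sub_labels = single_linkage([points[i] for i in idx], k)
        ref_labels = [reference[i] for i in idx]
        scores.append(jaccard(ref_labels, sub_labels))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = two_blobs()
    for k in range(2, 6):
        print(k, round(stability(data, k), 3))
```

Under these assumptions, one would compare the stability score across candidate values of k and prefer values where the score stays close to 1. The same loop could be rerun on data projected onto a few leading principal components to compare stability before and after the projection.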
Date: Thursday, January 30
Time: 4:15-5:30PM
Place: Cordura 100