Seminar on Computational Learning and Adaptation


  Large Conjunctive Clusters and Bicliques

Nina Mishra

Stanford University / HP Labs

We propose a new formulation of the clustering problem that differs from previous work in several ways. First, the goal is to explicitly output a collection of simple and meaningful conjunction of attributes that define the clusters. Second, the clusters might overlap. Third, the clusters might not cover all points. Finally, a point may be assigned to a cluster description even if it only satisfies most, and not necessarily all, of the attributes in the conjunction.

We give a graph-theoretic formulation of our clustering problem where the input is a bipartite graph and the goal is to find a collection of large bicliques in the graph. Identifying one largest conjunctive cluster is equivalent to finding a maximum edge biclique. Since this problem is NP-hard [Peeters00] and there is evidence that it is difficult to approximate [Feige02], we solve a relaxed version where the objective is to find a large subgraph that is close to being a biclique. We give a randomized algorithm that finds a relaxed biclique with almost as many edges as the maximum biclique. We then extend this algorithm to identify a good collection of large relaxed bicliques. A key property of these algorithms is that they require a sample of data points of size independent of both the number of data points and the number of attributes. The running time of the algorithm is linear in the number of attributes.



Date: Wednesday, February 4

Time: 4:15-5:30PM

Place: Cordura 100


Return to the seminar schedule