Seminar on Computational Learning and Adaptation


  Text Segmentation with Probabilistic Latent Semantic Analysis

Thorsten Brants
Palo Alto Research Center
brants@parc.com

Probabilistic Latent Semantic Analysis, which has recently received considerable attention, is a statistical technique that uses a latent class model or aspect model. Rather than directly modeling conditional probabilities between two random variables, it introduces a latent variable together with independence assumptions given the value of the latent variable. After a short introduction to this approach, we will report on current research at PARC using the method. The application problem is topic-based text segmentation, that is, identification of boundaries between parts of a document that bear on different topics. We will show experimental results that compare Probabilistic Latent Semantic Analysis to other techniques and discuss the conditions under which it appears to offer a significant advantage.
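As a rough illustration of the latent-variable idea described in the abstract, the sketch below computes the probability of each word given a document by summing over latent classes, P(w|d) = sum over z of P(w|z) P(z|d), instead of storing P(w|d) directly. This is only a minimal sketch of the standard aspect-model decomposition and not the implementation used in the PARC work; the function name, the variable names, and the toy probabilities are all hypothetical.

```python
# Minimal aspect-model sketch (hypothetical names and toy numbers).
# Words and documents are assumed conditionally independent given the
# latent class z, so P(w|d) is a mixture over the latent classes.

def word_given_doc(p_w_given_z, p_z_given_d):
    """Return the list P(w|d) = sum_z P(w|z) * P(z|d) for every word w.

    p_w_given_z: one row per latent class, each row a distribution over words.
    p_z_given_d: distribution over latent classes for a single document.
    """
    n_words = len(p_w_given_z[0])
    return [
        sum(p_z * p_w_given_z[z][w] for z, p_z in enumerate(p_z_given_d))
        for w in range(n_words)
    ]

# Toy example: two latent classes ("topics") over a three-word vocabulary.
p_w_given_z = [
    [0.5, 0.5, 0.0],    # class 0 favors words 0 and 1
    [0.0, 0.25, 0.75],  # class 1 favors word 2
]
p_z_given_d = [0.5, 0.5]  # the document mixes both classes equally

print(word_given_doc(p_w_given_z, p_z_given_d))  # → [0.25, 0.375, 0.375]
```

In a full model the two tables would be estimated from data, typically with the EM algorithm; here they are fixed by hand purely to show the decomposition.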

This talk describes joint work with Francine Chen at PARC and Ioannis Tsochantaridis at Brown University.



Date: Thursday, April 18

Time: 4:15-5:30PM

Place: Cordura 100

