Seminar on Computational Learning and Adaptation
Text Segmentation with Probabilistic Latent Semantic Analysis
Thorsten Brants
Palo Alto Research Center
brants@parc.com
Probabilistic Latent Semantic Analysis, which has recently received
considerable attention, is a statistical technique that uses a latent
class model or aspect model. Rather than directly modeling conditional
probabilities between two random variables, it introduces a latent
variable together with independence assumptions given the value of the
latent variable. After a short introduction to this approach, we will
report on current research at PARC using the method. The application
problem is topic-based text segmentation, that is, identification of
boundaries between parts of a document that bear on different topics.
We will show experimental results that compare Probabilistic Latent
Semantic Analysis to other techniques and discuss the conditions under
which it appears to offer a significant advantage.
This talk describes joint work with Francine Chen at PARC and Ioannis
Tsochantaridis at Brown University.
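
For readers unfamiliar with the aspect model mentioned in the abstract, the following sketch shows the form in which it is commonly written. The symbols are assumptions introduced here for illustration and do not appear in the announcement: d denotes a document, w a word, and z the latent class. The talk itself may use a different parameterization.

```latex
% Aspect model: d and w are conditionally independent given the latent class z.
\begin{align}
  P(d, w) &= \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z) \\
  % Equivalent form conditioned on the document (by Bayes' rule on P(d \mid z)):
  P(w \mid d) &= \sum_{z} P(w \mid z)\, P(z \mid d)
\end{align}
```

In this form the direct dependence between d and w is replaced by a mixture over z, which is the independence assumption the abstract refers to.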
Date: Thursday, April 18
Time: 4:15-5:30 PM
Place: Cordura 100