Seminar on Computational Learning and Adaptation
Sources of Success for Information Extraction Methods
Joseph Smarr
Symbolic Systems Program
Stanford University
jsmarr@stanford.edu
This talk examines Boosted Wrapper Induction (BWI) as an exemplar of
recent rule-based information extraction (IE) techniques. We report
on results from a variety of tasks (including extraction from
several natural text document collections) to provide a systematic
analysis of how each of BWI's algorithmic components, particularly
boosting, contributes to its performance over comparable methods.
We show that the benefit of boosting comes from its ability to
re-weight examples in order to learn specific rules (resulting in
high precision) combined with its ability to continue to learn rules
after all positive examples have been covered (resulting in high
recall). We also propose a new quantitative measure for the regularity
of an extraction task, and show that it is a strong indicator of IE
performance. Finally, we investigate the impact of exploiting
grammatical and semantic information for IE in natural text domains,
and we show that even limited grammatical information can improve both
the regularity and performance of natural text extraction tasks.
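
As a rough illustration of the two boosting effects described in the abstract (re-weighting examples, and continuing to learn rules for a fixed number of rounds), here is a minimal sketch of a generic confidence-rated boosting loop over simple rules. It is not BWI itself: the feature names, the `boost_rules` function, the rule-scoring heuristic, and the smoothing constant `eps` are all assumptions made for this example.

```python
import math

def boost_rules(examples, candidate_rules, rounds, eps=1e-3):
    """Generic confidence-rated boosting over rules (illustrative, not BWI).

    examples: list of (feature_set, label) with label +1 or -1.
    candidate_rules: dict mapping a rule name to a predicate on a feature set.
    """
    n = len(examples)
    weights = [1.0 / n] * n          # start with uniform example weights
    learned = []
    # Run a fixed number of rounds; learning does not stop once every
    # positive example is covered by some rule.
    for _ in range(rounds):
        best = None
        for name, rule in candidate_rules.items():
            w_pos = sum(w for w, (x, y) in zip(weights, examples)
                        if rule(x) and y == 1)
            w_neg = sum(w for w, (x, y) in zip(weights, examples)
                        if rule(x) and y == -1)
            # Assumed scoring heuristic: favor rules whose covered weight
            # is mostly positive.
            score = math.sqrt(w_pos) - math.sqrt(w_neg)
            if best is None or score > best[0]:
                best = (score, name, w_pos, w_neg)
        _, name, w_pos, w_neg = best
        # Confidence of the chosen rule, smoothed by eps to avoid log(0).
        conf = 0.5 * math.log((w_pos + eps) / (w_neg + eps))
        learned.append((name, conf))
        rule = candidate_rules[name]
        # Re-weight: covered examples the rule gets right lose weight,
        # covered examples it gets wrong gain weight.
        weights = [w * math.exp(-y * conf) if rule(x) else w
                   for w, (x, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learned, weights

# Toy data with hypothetical features of the token before a field boundary.
examples = [
    ({"cap", "after_colon"}, 1),
    ({"cap"}, 1),
    ({"cap"}, -1),
    ({"lower"}, -1),
]
candidate_rules = {
    "after_colon": lambda x: "after_colon" in x,
    "cap": lambda x: "cap" in x,
}
learned, weights = boost_rules(examples, candidate_rules, rounds=2)
```

In this toy run, the first rule chosen is the most specific one, and the positive example it covers ends up with a smaller share of the weight, so later rounds concentrate on the examples that remain hard.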
Date: Thursday, October 25
Time: 4:15-5:30 PM
Place: Cordura 100