Seminar on Computational Learning and Adaptation


  Learning Structure from Sequences, with Applications in a Digital Library

Ian H. Witten
Department of Computer Science
University of Waikato
Hamilton, New Zealand
ihw@cs.waikato.ac.nz
www.cs.waikato.ac.nz/~ihw

The services that digital libraries provide to users can be greatly enhanced by automatically gleaning certain kinds of information from the full text of the documents they contain. This talk will review recent work that applies novel techniques of machine learning (broadly interpreted) to extract information from plain text. We describe three areas of research: hierarchical phrase browsing, including efficient methods for inferring a phrase hierarchy from a large corpus of text; text mining using adaptive compression techniques, giving a new approach to word segmentation, generic entity extraction, and acronym extraction; and keyphrase extraction and its application in a digital library.



Date: Thursday, November 29

Time: 4:15-5:30 PM

Place: Cordura 100

