Learning Semantic Mappings between Disparate Representations
Alon Y. Halevy
University of Washington
(Currently on sabbatical at Stanford)
Integration of data from multiple sources is one of the longest-standing problems facing the data management and AI communities. In addition to being a key problem in large-scale science projects, data integration is central to data management in large enterprises, coordination among government agencies, and query answering on the World Wide Web. At a fundamental level, the challenge in data integration is reconciling the semantics of disparate data sets, each expressed with different database structures. In this talk I will describe the role of machine learning in addressing this problem. Specifically, I will describe an approach to semantic reconciliation that learns from a large corpus of database schemas and mappings between them. Such a corpus can be used to find common representation patterns and their variations, which in turn makes it possible to recognize those patterns in unseen schemas. I will describe the application of this approach to schema mapping and to similarity search in collections of Web services. Finally, I will argue that this approach poses many exciting challenges for machine learning research.
This talk describes joint work with Jayant Madhavan, Luna Dong, AnHai Doan (UIUC), Phil Bernstein (Microsoft Research), and Pedro Domingos.
Date: Wednesday, February 2, 2005
Time: 4:15-5:30 PM
Place: Gates 104