The Continued Saga of DB-IR Integration


Presenters: Ricardo Baeza-Yates and Mariano Consens

Ricardo Baeza-Yates holds a PhD in Computer Science from the University of Waterloo, Canada. He is currently an ICREA Research Professor at Universitat Pompeu Fabra in Barcelona, Spain, on sabbatical leave from the Dept. of Computer Science, Universidad de Chile, where he is the director of the Center for Web Research. His fields of research are information retrieval, algorithms, and information visualization. He is co-author of the book Modern Information Retrieval, published in 1999 by Addison-Wesley; co-author of the 2nd edition of the Handbook of Algorithms and Data Structures, Addison-Wesley, 1991; and co-editor of Information Retrieval: Algorithms and Data Structures, Prentice-Hall, 1992. He has over 100 other publications.

Mariano Consens's research interests are in the areas of Data Management Systems and the Web, with a current focus on XML searching, autonomic systems, and pervasive computing. He has over 25 publications and two patents, including journal publications selected from best conference papers. Mariano received his PhD and MSc degrees in Computer Science from the University of Toronto. Consens has been a faculty member in Information Engineering at the MIE Department, University of Toronto, since 2003. Before that, he was research faculty at the School of Computer Science, University of Waterloo, from 1994 to 1999. In addition, he has been active in the software industry as a founder and CTO of several startups.


The world of data has developed from two main points of view: the structured relational data model and the unstructured text model. The two distinct cultures of databases and information retrieval now have a natural meeting place in the Web with its semi-structured XML model. As web-style searching becomes a ubiquitous tool, the need to integrate these two viewpoints becomes even more important. This tutorial will provide an overview of the different issues and approaches put forward by the IR and DB communities and survey the DB-IR integration efforts. Both earlier proposals and recent ones (in the context of XML in particular) will be discussed. A variety of application scenarios for DB-IR integration will be covered, including examples of current industrial tools.

The tutorial will consist of two parts: the first will cover the problem space (basic concepts, requirements, models) and the second the solution space (approaches and techniques).


Copyright 2004, ACM. All rights reserved.