Introduction to Web IR


Presenters: Andrei Broder and Prabhakar Raghavan

Andrei Broder is an IBM Distinguished Engineer and the CTO of the Institute for Search and Text Analysis in IBM Research. From 1999 until 2002 he was Vice President for Research and Chief Scientist at the AltaVista Company. He graduated summa cum laude from the Technion, the Israel Institute of Technology, and obtained a Ph.D. in Computer Science from Stanford University. Broder was the SIGIR keynote speaker in 2003 and is co-winner of the Best Paper award at WWW6 (for his work on duplicate elimination of web pages) and at WWW9 (for his work on mapping the web). He has published more than seventy papers and has been awarded seventeen patents. He serves as chair of the IEEE Technical Committee on Mathematical Foundations of Computing.

Prabhakar Raghavan is Head of Research at Yahoo! and Consulting Professor of Computer Science at Stanford University. His research interests include semi-structured retrieval, text mining, and randomized algorithms. He is Editor-in-Chief of the Journal of the ACM and a Fellow of the ACM and of the IEEE. He has given several tutorials on web search, including at SIGIR. Raghavan holds a Ph.D. from the University of California at Berkeley and an undergraduate degree in electrical engineering from IIT Madras.


This tutorial provides an introduction to the main concepts, issues, and techniques of web-based information retrieval. Topics covered include the differences between conventional and web IR, the evolution of web search technology, crawling and corpus construction, duplicate detection, link analysis, and the economic models behind commercial web search.

The presentation will be as self-contained as possible: the only prerequisites are a basic understanding of elementary IR concepts, algorithms and data structures, linear algebra, and probability theory. Participants will receive an extensive bibliography.


Copyright 2004, ACM. All rights reserved.