Statistical Language Models for Information Retrieval


Presenter: ChengXiang Zhai

ChengXiang Zhai is an Assistant Professor of Computer Science at the University of Illinois at Urbana-Champaign. He received a Ph.D. in Computer Science from Nanjing University in 1990, and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. His research interests broadly include information retrieval, natural language processing, machine learning, and bioinformatics. His most recent work, including his dissertation, is centered on developing formal retrieval frameworks and applying statistical language models to text retrieval, especially in directions such as personalized search and semi-structured information retrieval. He was the IR program co-chair for ACM CIKM 2004. He is a recipient of the 2004 NSF CAREER award and the SIGIR 2004 best paper award.


Statistical language models have recently been successfully applied to many information retrieval problems. A great deal of recent work has shown that statistical language models not only lead to superior empirical performance, but also facilitate parameter tuning and open up possibilities for modeling non-traditional retrieval problems. In general, statistical language models provide a more principled way of modeling various kinds of retrieval problems.

The purpose of this tutorial is to systematically review recent progress in applying statistical language models to information retrieval, with an emphasis on the underlying principles and framework, empirically effective language models, and language models developed for non-traditional retrieval tasks. Tutorial attendees can expect to learn the major principles and methods of applying statistical language models to information retrieval and the outstanding problems in this area, and to obtain comprehensive pointers to the research literature.

The tutorial should appeal both to people working on information retrieval who are interested in applying more advanced language models and to those who have a background in statistical language models and wish to apply them to information retrieval. Attendees are assumed to know basic probability and statistics.

Copyright 2004, ACM. All rights reserved.