The 30th Annual International ACM SIGIR Conference
23-27 July 2007, Amsterdam


Workshops take place at the University of Amsterdam, Building A, on Friday July 27th.

3A - Learning to Rank for Information Retrieval

Torsten Joachims
Hang Li
Tie-Yan Liu
Chengxiang Zhai

The task of "learning to rank" has emerged as an active and growing area of research both in information retrieval and machine learning. The goal is to design and apply methods to automatically learn a function from training data, such that the function can sort objects (e.g., documents) according to their degrees of relevance, preference, or importance as defined in a specific application.

The relevance of this task for IR is without question, because many IR problems are by nature ranking problems. Improved algorithms for learning ranking functions promise improved retrieval quality and less of a need for manual parameter adaptation. In this way, many IR technologies can be potentially enhanced by using learning to rank techniques.

The main purpose of this workshop, held in conjunction with SIGIR 2007, is to bring together IR researchers and ML researchers working on or interested in these technologies, and to let them share their latest research results, express their opinions on related issues, and discuss future directions.

3B - Web Information Seeking and Interaction

Kerry Rodden
Ian Ruthven
Ryen White

The World Wide Web has provided access to a diverse range of information sources and systems. People engaging with this rich network of information may need to interact with different technologies, interfaces, and information providers in the course of a single search task. These systems may offer different interaction affordances and require users to adapt their information-seeking strategies. Not only is this challenging for users, but it also presents challenges for the designers of interactive systems, who need to make their own system useful and usable to broad user groups. The popularity of Web browsing and Web search engines has given rise to distinct forms of information-seeking behaviour, and new interaction styles, but we do not yet fully understand these or their implications for the development of new systems.

Web information seeking and interaction (i.e., the interaction of users with Web-based content and applications during information-seeking activities) is a topic that unites many strands of academic and commercial research, from studies of information-seeking behaviour to the design and construction of large-scale interactive systems. Designing components to support this interaction (and evaluating these components) is particularly challenging given the scale of the Web, the diversity of the user population, the diversity in tasks being undertaken, and the dynamic nature of the information.

This workshop is intended to act as a focal point for researchers and practitioners whose work is related to Web information seeking and interaction, to enable them to share experiences and collaborate.

3C - Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection

Benno Stein
Moshe Koppel
E. Stamatatos

The workshop shall bring together experts and prospective researchers around the exciting and future-oriented topic of plagiarism analysis, authorship identification, and high similarity search. This topic is receiving increasing attention, in part because information about nearly any subject can be found on the World Wide Web. At first sight, plagiarism, authorship, and near-duplicates may pose very different challenges; however, they are closely related in several technical respects.

The workshop addresses researchers, users, and practitioners from different fields: data mining and machine learning, document and knowledge management, semantic technologies, computer linguistics, social sciences, and information retrieval in general. We solicit contributions dealing with theoretical and practical questions of the development, use, and evaluation of theories and tools related to the workshop theme. Contributions will be peer-reviewed by at least two experts from the related field.

3D - Improving Web retrieval for non-English queries

Fotis Lazarinis
Jesus Vilares Ferro
John Tait

Over 60% of the online population are non-English speakers, and it is probable that the number of non-English speakers is growing faster than the number of English speakers. Recent studies showed that non-English queries and unclassifiable queries have nearly tripled since 1997. Most search engines were originally engineered for English. They do not take full account of inflectional semantics or, for example, diacritics or the use of capitals.

The main conclusion from the literature is that searching with non-English and non-Latin-based queries results in lower success and requires additional user effort to achieve acceptable recall and precision. Further, international search engines (like Yahoo and Google) are relatively weaker with monolingual non-English queries.

New tools and resources are needed to support researchers in non-English retrieval. New methodologies need to be proposed that will help identify problems in existing search engines. New teaching strategies should be developed to help users become more efficient in formulating their queries.

3E - Multimedia Information Retrieval

Roelof van Zwol
Stefan Rueger
Mark Sanderson
Yosi Mass

This fifth workshop on Multimedia Information Retrieval will take place in Amsterdam on July 27, 2007 in conjunction with SIGIR 2007. This full-day workshop will have a special focus on "New Challenges in Audio Visual Search". With the rising popularity of rich media services such as Flickr, YouTube, and Jumpcut, new challenges in large scale multimedia information retrieval have emerged that rely not only on meta-data but also on content-based information retrieval combined with the collective knowledge of users and geo-referenced meta-data that is captured during the creation process. For the future, it is envisioned that multimedia search in mobile environments or on P2P networks will take off on a large scale.

This workshop follows four previous SIGIR workshops on multimedia information retrieval (1998, 1999, 2003, 2005), and aims to address and explore new challenges in multimedia information retrieval by bringing both researchers and practitioners together. We encourage submission and participation in this workshop not only from the core Information Retrieval community but also from researchers in databases, multimedia, and image processing, thus cross-fertilizing information retrieval research.

3F - Large scale Distributed Systems for Information Retrieval

Flavio Junqueira
Vassilis Plachouras
Ivana Podnar Zarko
Fabrizio Silvestri

The Web is growing and the demand for fast, accurate search grows accordingly. To fulfill such a demand, information retrieval (IR) systems have to be capable of accommodating growth as well as processing a large number of queries. As the volume of data and the number of users are not trivial, there is a need to investigate systems that are truly able to perform well under such conditions.

The main goal of this workshop is to bring together researchers interested in the design and implementation of IR systems to discuss ongoing work in the area along with future directions. In particular, it will focus on scalability and efficiency issues in large-scale distributed systems for IR. Equally interesting to this venue are novel applications for large-scale distributed IR architectures. Applications such as P2P Search and Community-based P2P file sharing leverage the resources contributed by the participant peers, whereas in Grid systems applications can potentially use the large amount of processing capacity available to provide more sophisticated services. Thus, new paradigms or ways of leveraging all the resources of large-scale systems are of high interest to this venue.

3G - Information retrieval and applications of graphical models

Juan F. Huete
Juan M. Fernandez-Luna
Benjamin Piwowarski

Probabilistic models constitute an important kind of Information Retrieval (IR) model. They have been long and widely used, and offer a principled way of managing the uncertainty that naturally appears in many elements within this field. Nowadays, the dominant approach for managing probability within the field of Artificial Intelligence is based on the use of Bayesian Networks, and these have also been used within IR as extensions of classical probabilistic models.

The main goal of this workshop is to provide a common space where researchers in general, and young researchers in particular, can present their innovative applications of Graphical Models (GMs) to the wide problem space of IR, opening a new discussion forum. Graphical Models include Bayesian Networks, possibilistic networks, Markov networks, dependence graphs, influence diagrams, probability trees, decision trees, and Fisher Kernel Discriminants, among others.

3H - Focused Retrieval (Question Answering, Passage Retrieval, Element Retrieval)

Andrew Trotman
Shlomo Geva
Jaap Kamps

Standard document retrieval finds atomic documents, and leaves it to the end-user to then locate the relevant information inside the document. Focused retrieval tries to remove the onus on the end-user, by providing more direct access to relevant information. That is, focused retrieval addresses information retrieval, not document retrieval. Focused retrieval is becoming increasingly important in all areas of information retrieval and exists in many forms, including Question Answering, Passage Retrieval, and Element Retrieval.

3K - Searching Spontaneous Conversational Speech

Franciska de Jong
Douglas Oard
Roeland Ordelman
Stephan Raaijmakers

Nearly a decade ago, we learned from the TREC Spoken Document Retrieval (SDR) track that searching speech was a "solved problem." Three factors were key to this success: (1) broadcast news has a "story" structure that resembles written documents, (2) the redundancy present in human language meant that search effectiveness held up well over a reasonable range of transcription accuracy, and (3) sufficiently accurate Large-Vocabulary Continuous Speech Recognition (LVCSR) systems could be built for the planned speech of news announcers.

The long-term trend in speech recognition research has been toward transcription of progressively more challenging sources. Over the last few years, LVCSR for spontaneous conversational speech has improved to a degree where transcription accuracy comparable to what was previously found to be effective for broadcast news can now be achieved for a diverse range of sources. This has inspired a renaissance in research on search and browse technology for spoken word collections in communities focused on: (1) archived cultural heritage materials (e.g., interviews and parliamentary debates), (2) discussion venues (e.g., business meetings and classroom instruction), and (3) broadcast conversations (e.g., in-studio talk shows and call-in programs). Test collections are being developed in individual projects around the world, and some comparative evaluation activity for speech search technology has developed over this period. The time now seems right to look more broadly across these research communities for potential synergies that can help to shape the information retrieval research agenda of each of these communities by sharing ideas and resources.