New Directions For IR Evaluation

Our goal is to bring together researchers from information retrieval and related research communities (e.g., recommender systems, text data mining, computer-supported cooperative work, and online communities) to explore new directions for evaluation-guided research on the development of systems to support information access. We propose to organize this workshop around two key questions:

1) What opportunities exist to foster important new research through the creation of test collections for genres that have not previously been available?

2) What types of task models should new test collections be designed to support?

Test collections developed at venues such as TREC, NTCIR, CLEF and TDT have historically focused strongly (although not exclusively) on news, in part because it has proven practical to assemble large collections of news stories, and in part because good performance in that domain seems to also yield good performance on a broader range of information access tasks. Recent years have seen this focus expand to include, for example, a Web track (at TREC) and patent retrieval (at NTCIR). Similarly, the TREC video track is now working with non-news content. Our goal for this workshop is to focus on the domain of on-line conversations, to see whether there is sufficient interest in the IR community to study the genre, and to propose ways in which such a study could be facilitated through the creation of standard test collections.

RIA and "Where can IR go from here?"

In the summer of 2003, NIST organised a 6-week workshop called Reliable Information Access or RIA. The RIA workshop brought together seven different top research IR systems and investigated how the systems were getting their results, and why the systems failed on some topics and succeeded on others. The SIGIR 2004 workshop will focus on discussing the implications of lessons learned from RIA: how they affect our understanding of what is currently happening in research systems, and which areas of IR research they suggest warrant immediate concentrated work. The main goal of the workshop is to end up with a list of concrete research proposals suggested by the lessons from the RIA workshop and the RIA databases. The proposals will run the gamut from undergraduate projects through SIGIR papers up to Ph.D. thesis areas.

Peer-to-Peer IR

Peer-to-peer (P2P) systems have emerged as a popular way to share huge volumes of data. However, retrieval methods for P2P systems are still in their infancy. This workshop will focus on new methods of resource representation, resource selection, and data fusion in peer-to-peer networks. The workshop particularly encourages papers that address heterogeneous peer-to-peer networks (e.g. a variety of data types and service providers), as well as papers about methods that cope with partial and uncertain information. However, more broadly, papers are solicited on any topic related to IR in peer-to-peer networks.

This workshop will discuss current retrieval methods of P2P systems, as well as the adaptation of distributed IR methods for P2P systems. Thus, it will involve both researchers from the P2P area interested in IR methods and IR researchers aiming to extend their methods to P2P systems.
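As a rough illustration of the three steps named above (resource representation, resource selection, and data fusion), the following minimal sketch assumes a toy network of three peers. The peer names, documents, and scoring functions are all hypothetical and chosen only to keep the example short; real P2P retrieval methods are considerably more involved.

```python
from collections import Counter

# Hypothetical peers, each holding a few short documents (invented data).
PEERS = {
    "peer_a": ["jazz piano recordings", "jazz trumpet solos"],
    "peer_b": ["protein folding data", "gene sequence archive"],
    "peer_c": ["piano sheet music", "classical piano concertos"],
}

def represent(docs):
    """Resource representation: term counts over a peer's documents."""
    return Counter(word for doc in docs for word in doc.split())

def select_peers(query, descriptions, n=2):
    """Resource selection: keep up to n peers whose descriptions match the query."""
    terms = query.split()
    scores = {peer: sum(desc[t] for t in terms) for peer, desc in descriptions.items()}
    ranked = sorted(scores, key=lambda p: scores[p], reverse=True)
    return [p for p in ranked[:n] if scores[p] > 0]

def search_peer(query, docs):
    """Local retrieval on one peer: score = number of distinct query terms in the document."""
    terms = set(query.split())
    results = [(len(terms & set(doc.split())), doc) for doc in docs]
    return [(s, d) for s, d in results if s > 0]

def fuse(result_lists):
    """Data fusion: scale each peer's scores by that peer's maximum, then merge and sort."""
    merged = []
    for results in result_lists:
        top = max((s for s, _ in results), default=0)
        merged.extend((s / top, d) for s, d in results)
    return [d for _, d in sorted(merged, key=lambda pair: pair[0], reverse=True)]

def p2p_search(query, peers=PEERS, n=2):
    """Run representation, selection, local search, and fusion in sequence."""
    descriptions = {peer: represent(docs) for peer, docs in peers.items()}
    chosen = select_peers(query, descriptions, n)
    return fuse([search_peer(query, peers[p]) for p in chosen])

results = p2p_search("jazz piano")
```

Scaling by each peer's maximum score is only one simple way to make scores from different peers comparable. It also shows why fusion is hard: a weak match on one peer can end up tied with a strong match on another.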

Information Retrieval in Context

There is a growing realisation that relevant information will be accessible increasingly across media and genres, across languages and across modalities. The retrieval of such information will depend on time, place, history of interaction, task in hand, and a range of other factors that are not given explicitly but are implicit in the interaction and ambient environment, namely the context. IR research is now conducted in multi-media, multi-lingual, and multi-modal environments, but largely out of context. However, such contextual data can be used effectively to constrain retrieval of information thereby reducing the complexity of the retrieval process. To achieve this, context models for different modalities will need to be developed so that they can be deployed effectively to enhance retrieval performance. Thus truly context-aware and -dependent retrieval will become feasible.

This workshop will explore a variety of theoretical frameworks, characteristics and research approaches to focus on an agenda of activities to be recommended for future interactive IR (IIR) research.

Search and Discovery in Bioinformatics

There are now numerous bioinformatics resources containing research literature, genetic sequences, protein sequences and protein structures. Large-scale efforts are underway to organise these rapidly growing resources and create effective tools for supporting access. Although there is increasing interest among members of the information science community in bioinformatics content, few forums exist where challenges in this area are addressed from the perspective of IR researchers.

A major goal of the workshop is to attract researchers interested in various critical facets of bioinformatics information retrieval research. Another goal is to provide opportunities for sharing of current findings, discussing major challenges, and developing networks of collaborators.

Integration of Information Retrieval and Databases (IR + DB)

Information retrieval (IR) is associated by many with 'document retrieval' because of its past (and still ongoing) focus on text document retrieval. Database (DB) research is associated with (object-) relational data modelling, SQL, transaction-based processing, and many more aspects of databases. Whereas DB research has been driven for years by structured languages and the idea of data modelling and abstraction, IR has focused on measuring retrieval quality for large (mostly text) collections. Nowadays, multimedia and XML collections are a driving force for the integration of IR and DB approaches. IR yields the methods for relevance-based ranking, while DB research provides methods for dealing with structured and, increasingly, semi-structured data.

The purpose of this workshop is to bring together researchers of the DB and IR fields, facilitating exchange on the progress in developing and applying IR+DB (or DB+IR) approaches. Applications include data warehousing, web retrieval, heterogeneous collections, semantically rich information systems, and others. The workshop covers theoretical as well as pragmatic research.
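To make the division of labour described above concrete, here is a minimal sketch in which a structured predicate is handled by SQL and the surviving rows are ranked by a simple relevance score. The table, its rows, and the term-frequency scoring are assumptions made for illustration; they do not represent any particular IR+DB system.

```python
import sqlite3

def build_db():
    """Create an in-memory table of hypothetical records with a structured field and free text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, year INTEGER, abstract TEXT)")
    conn.executemany(
        "INSERT INTO papers (id, year, abstract) VALUES (?, ?, ?)",
        [
            (1, 2003, "ranking xml documents with language models"),
            (2, 2001, "xml query languages and xml storage"),
            (3, 2004, "transaction processing in relational systems"),
            (4, 2004, "xml retrieval and xml ranking evaluation"),
        ],
    )
    return conn

def search(conn, query, min_year):
    """DB side: filter on a structured predicate. IR side: rank survivors by term frequency."""
    rows = conn.execute(
        "SELECT id, abstract FROM papers WHERE year >= ?", (min_year,)
    ).fetchall()
    terms = query.split()
    scored = []
    for paper_id, abstract in rows:
        words = abstract.split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, paper_id))
    # Highest score first; ties broken by id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [paper_id for _, paper_id in scored]

conn = build_db()
ranked_ids = search(conn, "xml ranking", min_year=2003)
```

Record 2 matches the query terms well but is excluded by the year predicate, while record 3 passes the predicate but has no matching terms. The result therefore depends on both the exact structured condition and the graded relevance score.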

XML and Information Retrieval

The focus of this workshop will be on issues related to the application of IR methods to XML data for querying, retrieval, navigation, etc. We believe that we have come a long way since the first workshop in 2000, when XML was entirely dominated by the DB community. However, there is still room for more efforts in this field, in particular from the IR point of view. The third workshop will bring together researchers and practitioners interested in XML and IR. We will review the progress that has been made since the two previous workshops. More specifically, recent technologies, models and new efforts will be discussed.

Geographical IR

Information technology for handling geographic information has been based largely on the highly structured map-based representations of space used in most geographical information systems (GIS). Relatively little effort has been expended on developing the facilities required to access less structured, textual information, in which geographical context may be given by place names and associated terminology for spatial relationships. Such geographical text is commonly found in web documents, but conventional search engines treat geographical terms no differently from other search terms. As a consequence, documents will only be retrieved if they contain exact matches with the geographical terminology in the query expression. Documents that refer to alternative versions of the query place name, or to places that are in the vicinity (either nearby or even within the query place), are unlikely to be found.
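The limitation of exact matching can be illustrated with a small sketch that expands a query place name with alternative names and nearby places before matching. The mini-gazetteer below is hypothetical, its coordinates are approximate, and the 10 km radius is an arbitrary assumption. Containment (places within the query place) is not modelled here.

```python
import math

# Hypothetical mini-gazetteer: name -> (alternative names, latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Cardiff": (["Caerdydd"], 51.48, -3.18),
    "Penarth": ([], 51.44, -3.17),
    "Edinburgh": ([], 55.95, -3.19),
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def expand_place(place, radius_km=10.0):
    """Return the place, its alternative names, and nearby places with their alternatives."""
    alternatives, lat, lon = GAZETTEER[place]
    terms = [place, *alternatives]
    for other, (other_alts, other_lat, other_lon) in GAZETTEER.items():
        if other != place and haversine_km(lat, lon, other_lat, other_lon) <= radius_km:
            terms.extend([other, *other_alts])
    return terms

def matches(document, place, radius_km=10.0):
    """True if the document mentions the place, an alternative name, or a nearby place."""
    text = document.lower()
    return any(term.lower() in text for term in expand_place(place, radius_km))

expanded = expand_place("Cardiff")
```

With exact matching alone, a document mentioning only "Penarth" or "Caerdydd" would be missed for the query "Cardiff". After expansion, both are found, while a document about Edinburgh is still excluded.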

In recent years, a variety of work has looked at the potential of indexing and retrieving unstructured text from the web using geospatial location. The purpose of this workshop is to bring together the growing community of researchers and practitioners working in the field of geographic information retrieval to discuss progress within the field and future research strands. We aim to produce a high quality publication setting out the state of the art in Geographic Information Retrieval and suggesting a research agenda for the coming years.

Mathematical Formal Methods

This workshop aims to promote discussion and interaction among those with theoretical and applied research interests in the mathematical/formal aspects of Information Retrieval, drawn from a potentially large spectrum of different IR fields. It also aims to be a forum for the presentation of both theoretical and applied results (e.g., foundational issues; description and/or integration of models; retrieval applications; mathematical/formal techniques, properties and structures in IR; existing and/or new theories and theoretical aspects).


Semantic Web

Tim Berners-Lee created the vision of a Semantic Web that enables automated information access and use, based on machine-processable semantics of data. Information retrieval can benefit from building ontologies and other semantic structures, making it possible to have a better understanding of the application domains and user queries. Indeed, Semantic Web technologies could be a basis for intelligent retrieval. On the other hand, the Information Retrieval field has a lot to bring to the Semantic Web community, drawing on 30 years of research and development in the context of very large collections of documents. This workshop aims to bring researchers from the two communities together.

The workshop will explore a variety of theoretical frameworks, applications, techniques and research approaches centred on how IR and the Semantic Web can achieve mutual benefit and high impact.

Information Retrieval For Question Answering

Open domain question answering has become a very active research area over the past few years, due in large measure to the stimulus of the TREC Question Answering track. This track addresses the task of finding exact answers to natural language (NL) questions (e.g. "How tall is the Eiffel Tower?") from large text collections. Finding exact answers requires processing texts at a level of detail that cannot be carried out at retrieval time for very large text collections. This limitation has led many researchers to propose, broadly, a two-stage approach to the QA task. In stage one, a subset of query-relevant texts is selected from the whole collection. In stage two, this subset is subjected to detailed processing for answer extraction. To date, stage one has received limited explicit attention, despite its obvious importance - performance at stage two is bounded by performance at stage one. The goal of this workshop is to correct this situation, and, hopefully, to draw the attention of IR researchers to the specific challenges raised by QA.
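The two-stage approach described above can be sketched in a few lines. The following is a toy illustration only: the three documents are invented, stage one uses plain term overlap, and stage two uses a single regular expression that looks for a number followed by a unit. None of this reflects any particular TREC QA system.

```python
import re
from collections import Counter

# Toy collection; the documents are invented for illustration only.
DOCS = [
    "The Eiffel Tower is 330 metres tall and stands in Paris.",
    "Paris is the capital of France.",
    "The tower of Pisa leans because of soft ground.",
]

STOPWORDS = {"how", "is", "the", "of", "a", "in", "and", "what"}

def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]

def stage_one(question, docs, k=2):
    """Stage one: rank documents by term overlap with the question and keep the top k."""
    q = Counter(tokens(question))
    scored = []
    for doc in docs:
        d = Counter(tokens(doc))
        score = sum(min(q[t], d[t]) for t in q)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def stage_two(candidates):
    """Stage two: return the first 'number + unit' expression found in the candidates."""
    pattern = re.compile(r"\b(\d+(?:\.\d+)?)\s*(metres|meters|feet|m|ft)\b")
    for doc in candidates:
        match = pattern.search(doc)
        if match:
            return match.group(0)
    return None

def answer(question, docs, k=2):
    """Run both stages: retrieve candidate texts, then extract an answer from them."""
    return stage_two(stage_one(question, docs, k))

result = answer("How tall is the Eiffel Tower?", DOCS)
```

The bound mentioned above is visible in this sketch: if stage one fails to return the first document, stage two has nothing to extract from and returns None, however good the extraction pattern is.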

