IRLIST Digest   ISSN 1064-6965
December 4, 1995
Volume XII, Number 47   Issue 284

**********************************************************

I. QUERIES
   1. Device to capture screen during interaction
II. JOBS
   1. U. North Carolina: Professor, SILS
III. NOTICES
   A. Publications
      1. Special Topic Issue: IP&M
   B. Meetings
      1. Text Retrieval Conference '96
      2. Int'l. Conf. on Practical Aspects of Knowledge Mgmt. '96

**********************************************************

I. QUERIES

I.1.
Fr: William Hersh
Re: Device to capture screen during interaction

I am planning to undertake some research where we want to monitor the user's entire interaction with the retrieval system. That is, we want to capture and transcribe each key stroke, menu selection, mouse click, etc. We have tried using logging software in the past, but just capturing key presses and mouse clicks does not show us the interaction with the system. We have also videotaped interactions, but the camera can be obtrusive and the quality variable.

It seems like the ideal situation would be to capture the screen onto videotape directly. That is, hook a line into the video out signal and record it to a VCR during the user interaction. That way, you can transcribe the events from the videotape.

Is there a technology to do this? Has anyone ever tried it? Any thoughts would be appreciated.

Bill Hersh

**********************************************************

II. JOBS

II.1.
Fr: Bob Losee
Re: U. North Carolina: Professor, SILS

Cary C. Boshamer Professorship, School of Information and Library Science, University of North Carolina, Chapel Hill

The University of North Carolina at Chapel Hill invites applications and nominations for the Cary C. Boshamer Professorship in the School of Information and Library Science.
The school seeks an individual with an outstanding national and international reputation in the field of information and library science, broadly defined, and an exemplary record in teaching, research, and service, commensurate with the level of distinguished professor. Please send nominations and applications to: Robert M. Losee, Personnel Committee, School of Information and Library Science, CB# 3360, 100 Manning Hall, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-3360; Phone number 919-962-8366; Fax number 919-962-8071; E-mail address: losee.ils@mhs.unc.edu. Salary is competitive. Applications should include a curriculum vitae and the names of four references. Preference will be given to those candidates who apply by February 15, 1996. Applications will be accepted until the position is filled. Preferred starting date is July 1996. The University of North Carolina at Chapel Hill is an Affirmative Action, Equal Opportunity Employer. ********************************************************** III. NOTICES III.A.1. Fr: James Allan Re: Special topic issue of Information Processing and Management Special topic issue of Information Processing and Management on Methods and Tools for the Automatic Construction of Hypermedia Guest editors Maristella Agosti and James Allan The impressive popularity of the World Wide Web has created a corresponding demand for on-line data organized as a hypertext or hypermedia document collection. To date, most of that organization has been done by hand, a daunting--if not impossible--task for very large or very volatile collections such as newswire services or archival collections. To help organize such collections as well as those that are smaller or less dynamic, techniques are required for automatically constructing hypertext or hypermedia, or for providing user-assistance in that process. 
In July of 1995, a workshop was held in Seattle in conjunction with SIGIR '95 on Information Retrieval and the Automatic Construction of Hypertext. The workshop highlighted the importance and difficulty of this problem and indicated the value of more in-depth study. To help consolidate quality research on this problem, Information Processing and Management has created a special issue to address this topic. This issue is intended to address different approaches for the automatic transformation of collections of "flat" textual and multi-media documents to produce a structured hypertext/hypermedia base. Submitted papers should address tools and methods capable of producing an informative hypertext/hypermedia collection of documents that can be searched and browsed by content. Techniques for automatically augmenting the links in an existing hypertext/hypermedia are also relevant. A list of topics that may be addressed in submitted papers includes: automatically creating static and dynamic links, automatically assigning types to links, strategies for updating an existing hypertext/hypermedia base, evaluation of the quality of hypertext/hypermedia collections, usability by the final user of the resulting hypertext/hypermedia, and so forth. *** The deadline for paper submissions is Monday, 5 February 1996. *** If you intend to submit a paper, please send an e-mail message to both guest editors informing them of your intention, preferably by 15 January 1996; their e-mail addresses are listed below. Sending such a message is not necessary, but it will help the editors prepare for the review process. Instructions for submitting a paper: The following are invited: (i) full length papers reporting original work generally of up to 4000 words; (ii) brief communications of original work or work in progress of up to 1200 words; (iii) book reviews and critical literature reviews. 
Two copies of the manuscript must be sent to each guest editor (4 copies in all) at the addresses listed below. Include a title page with the article's title, the contact author's name, affiliation, address, telephone and FAX numbers, and e-mail address, if available. For more details refer to the ``Instructions to contributors'' included in each issue of Information Processing and Management. Papers submitted to this Special Issue are subject to the usual IPM peer review.

Maristella Agosti
Dept. of Electronics and Informatics
University of Padua
Via Gradenigo, 6/a
35131 Padova
Italy
agosti@ipdunivx.unipd.it

James Allan
Center for Intelligent IR
Computer Science Dept.
University of Massachusetts
Amherst, MA 01003
USA
allan@cs.umass.edu

**********

III.B.1.
Fr: Donna Harman
Re: Text Retrieval Conference

CALL FOR PARTICIPATION

TEXT RETRIEVAL CONFERENCE
January 1996 - November 1996

Conducted by: National Institute of Standards and Technology (NIST)
Sponsored by: Advanced Research Projects Agency Software and Intelligent Systems Technology Office (ARPA/SISTO)

The Text REtrieval Conference (TREC) has had a very successful four years and we would like to invite you to submit a proposal for participation in year five (TREC-5). The goal of this conference is to encourage research in text retrieval from large document collections by providing a large test collection, uniform scoring procedures and a forum for organizations interested in comparing their results. Both adhoc queries against archival data collections and routing (filtering or dissemination) queries against incoming data streams are being tested, in addition to at least 6 focussed "tracks" (see below). The conference has grown from 24 participating systems in 1992 to 36 participating systems in 1995, with proceedings published each year, and is now the major experimental effort in the field.
Dissemination of TREC work and results other than in the (publicly available) conference proceedings is welcomed, but the conditions of participation preclude specific advertising claims based on TREC results.

Participants will be expected to work with approximately a million documents (2 gigabytes of data), retrieving lists of ranked documents that could be considered relevant to each of 100 topics (50 routing and 50 adhoc topics). NIST will distribute the data and will collect and analyze the results. As before, the workshop will be open only to participating systems that submit results and to government sponsors.

SCHEDULE:

Jan. 8, 1996 -- deadline for participation applications
Feb. 1 -- acceptances announced, and permission forms for data distributed to new participants. The TIPSTER training documents come as 3 CD-ROMs containing about 3 gigabytes of data, in addition to 250 training topics and relevance judgments available via an ftp site.
Mar. 1 -- program committee decisions on which groups present talks at the workshop; also on format of TREC-5 workshop
April 1 -- NIST target date for availability of two new disks to be used for the adhoc task
May 1 -- list of routing topics distributed
June 1 -- routing queries due at NIST; test data for routing distributed to groups after routing queries received by NIST
June 15 -- 50 new test topics for adhoc test distributed
Aug. 1 -- results from 50 routing queries and 50 adhoc topics due at NIST
Aug. 15 -- results from the tracks due at NIST
Oct. 1 -- relevance judgments and individual evaluation scores due back to participants
Nov. 20-22 -- TREC-5 conference at NIST in Gaithersburg, Md.

TASK DESCRIPTION:

Below is a brief summary of the task description. For more details, and samples of the topics and documents, see the online version of the TREC-3 proceedings (http://potomac.ncsl.nist.gov/trec).
MAIN TASKS (adhoc and routing)

Participants will receive 3 gigabytes of data for use in training of their systems, including development of appropriate algorithms or knowledge bases. The 250 topics used in the first four TREC workshops, and the relevance judgments for these topics will also be available via ftp. The topics are in the form of a formatted user need statement. Queries can either be constructed automatically from this topic description, or can be manually constructed.

Two types of retrieval operations will be tested: a routing or filtering operation against new data, and an adhoc query operation against archival data. Fifty of the topics (selected from the 250 topics distributed for training) will be used by each group participating in the routing test to create formalized queries to be used for retrieval against new test data. Fifty new test topics (251-300) will be used as adhoc queries against the data to be distributed in April (about 2 gigabytes on CD-ROMs).

Results from both types of queries (routing and adhoc) will be submitted to NIST as the ranked top 1000 documents retrieved for each query. Scoring techniques including traditional recall/precision measures will be run for all systems and individual results will be returned to each participant.

TRACK TASKS:

The goal of the tracks is to investigate areas tangential to the main tasks, or to investigate areas that are more focussed than the main tasks. A very brief summary of each of the 5 tracks run in TREC-4 is given below. All of these tracks will also be run in TREC-5, plus possibly several new tracks. The exact definition of the tracks in TREC-5 is still being defined by interested participants, and details of each track should be obtained from the designated contact person.

*Interactive track -- investigating searching as an interactive task by examining the process as well as the outcome.
Contact person: Steve Robertson (ser@is.city.ac.uk) *Multilingual track -- working with non-English test collections. Two languages will be investigated in TREC-5, Chinese and Spanish. About 250 megabytes of Chinese and 25 topics will be used. The Spanish test collection built in TREC-3 and TREC-4 (250 megabytes of text and 50 topics) will be made available, but the design of the Spanish task may differ based on participation. Contact person: Ross Wilkinson (ross@kbs.citri.edu.au) *Multiple database merging -- investigation of techniques for merging results from the various TREC subcollections (as opposed to treating the collections as a single entity). Contact person: Ellen Voorhees (ellen@learning.scr.siemens.com) *Data corruption -- examining the effects of corrupted data (modelled on an OCR environment) by using corrupted versions of the TREC data. Contact person: Paul Kantor (kantor@zodiac.rutgers.edu) *Filtering -- evaluating routing systems on retrieving an unranked set of documents optimizing a specific effectiveness measure. Contact person: David Lewis (lewis@research.att.com) Groups may participate in either or both of the main tasks, plus any of the tracks. There is very strong encouragement, however, to participate in the main tasks, particularly those that serve as baselines for the various tracks. CONFERENCE FORMAT: The conference itself will be used as a forum both for presentation of results (including failure analyses and system comparisons), and for more lengthy system presentations describing retrieval techniques used, experiments run using the data, and other issues of interest to researchers in information retrieval. As there is a limited amount of time for these presentations, the program committee will determine which groups are asked to speak and which groups will present in a poster session. 
Additionally, some organizations may not wish to describe their proprietary algorithms, and these groups may choose to participate in a different manner (see Category C). To allow a maximum number of participants, the following three categories have been established.

Category A: Full participation: Participants will be expected to work with the full data set, and to present full details of system algorithms and various experiments run using the data, either in a talk or in a poster session.

Category B: Exploratory groups: Because small groups with novel retrieval techniques might like to participate but may have limited research resources, a category has been set up to work with only a subset of the data. This subset will consist of about 1/2 gigabyte of training data (and all training topics), and 1/4 gigabyte of test data (and all test topics). Participants in this category will be expected to follow the same schedule as category A, except with less data. New participants are encouraged to work in category B unless they have experience with such large data sets.

Category C: Evaluation only: Participants in this category will be expected to work on the full data set, submit results for common scoring and tabulation, and present their results in a poster session. They will not be expected to describe their systems in detail but will be expected to report on time and effort statistics.

Data (Test Collection):

The test collection (documents, topics, and relevance judgments) will be an extension of the collection (English only) used for the ARPA TIPSTER project. The training collection was assembled from Linguistic Data Consortium text, and a signed User Agreement will be required from all participants. The documents are an assorted collection of newspapers (including the Wall Street Journal), newswires, journals, technical abstracts and email newsgroups.
The new data to be distributed in April will be somewhat more varied than the training data, but will be in the same format (SGML). All documents will be typical of those seen in a real-world situation (i.e. there will not be arcane vocabulary, but there may be missing pieces of text or typographical errors). The relevance judgments against which each system's output will be scored will be made by experienced relevance assessors based on the output of all TREC participants using a pooled relevance methodology. RESPONSE FORMAT AND SUBMISSION DETAILS: By Jan. 8, 1996 organizations wishing to participate should respond to the call for participation by submitting a summary of their text retrieval approach, not to exceed two pages in length. The summary should include the strengths and significance of their approach to text retrieval, and highlight differences between their approach and other retrieval approaches. Groups that have participated in TREC-4 need to provide only two paragraphs, one describing their methods in TREC-4 and a second describing their plans for TREC-5. In addition to the system summary, each organization should indicate in which category they wish to participate (category A, B, or C). Groups new to TREC should briefly describe their abilities to handle this large amount of data. Please specify which main tasks and which tracks your group plans to participate in, and the person to whom correspondence should be directed. A full regular address, telephone number, and an email address should be given. EMAIL IS THE ONLY METHOD OF COMMUNICATION in TREC. The proposal should be in ascii so that it can easily be distributed to the program committee--detailed diagrams are not necessary. ALL RESPONSES SHOULD BE SUBMITTED BY JAN. 8, 1996 to the Program Chair, Donna Harman: harman@potomac.ncsl.nist.gov FOR COMPLETE INFORMATION, CONTACT THE CHAIR AT THE ABOVE ADDRESS. 
SELECTION OF PARTICIPANTS: All participants must be able to demonstrate their ability to work with the data collection (either the full collection or the subset). The program committee will be looking for as wide a range of text retrieval approaches as possible, and will select the best representatives of these approaches as speakers at the conference. ********** III.B.2. Fr: Ulrich Reimer Re: Int'l. Conf. on Practical Aspects of Knowledge Management -- CALL FOR WORKSHOP PROPOSALS -- International Conference on Practical Aspects of Knowledge Management October 30 - 31, 1996, in Basel, Switzerland In the following, we first give a general description of the concept and the underlying ideas of the conference. Subsequently, the call for workshop proposals can be found. The conference is supported by SGAICO (Swiss Group for Artificial Intelligence and Cognitive Science, being part of SI and member of ECCAI). GENERAL DESCRIPTION: It is more and more acknowledged that knowledge is one of the most important assets of organisations. Especially in industrialised countries with expensive but well-educated employees, products and services must be outstanding in terms of innovation, flexibility, and creativity. A prerequisite for being able to face current and future challenges is the systematic management of the knowledge assets. Studies indicate that no more than 20 percent of the knowledge available in organisations is really being used (Bilanz 1/95). What tremendous effects would it have if we could make use of 30 percent or more! Although the technology for an improved management of the key success factor "knowledge" is ready to use, we can notice that organisations are widely ignorant of its availability. Against this background, the conference has two major goals, each one being associated with a special track. 
Track 1: Provide Strategic Information on Knowledge Management (KM)
===================================================================

The conference acts as a communications medium to top executives for making clear
- what KM is, and
- how KM can improve organisational structures and processes.

To this end, we have a one-day special program for decision makers that gives high-level, strategic information about both of the above points. The program includes:

Invited Talks: by well-known experts of their field. They will give overviews of current KM technology available and will clearly work out how it can affect organisational change.

VIP Round-Table Discussions: where the top executives have the opportunity to discuss innovative ideas about KM in an informal setting with the invited speakers and the other participants, can formulate their doubts and reservations, and will get answers from the experts.

Track 2: Technology Transfer
============================

This two-day conference track aims at bridging the gap between application experts who will actually set up KM tools in an organisation and the technology experts who are familiar with the technology underlying the KM tools. Thus, the conference will make sure that those people who will actually establish the KM technology in an organisation become fully aware of its potential and get a clear idea what is currently possible (and what is not).

Participants of Track 2 will bring back into their organisations detailed and well-founded information about KM and its feasibility. Thus, the information that can be obtained by the participants of Track 1 and Track 2 is complementary to each other. The main focus of Track 2 will be on workshops.

Workshops: Each workshop is dedicated to a specific subarea considered relevant for KM. Contributions are sought that show how a certain approach solves a clearly outlined, practical problem.
The reviewing process will take care that only those contributions will be accepted which give clear statements about the main issues and are comprehensible by those participants who are no experts in the field. To avoid toy problems and purely academic approaches being discussed in a workshop, every paper must clearly describe the (real-world) problem being tackled and what the added value of solving that problem is. The call for papers for a workshop should contain a description of two or more representative, practical problems which ought to be used by the authors of a workshop paper whenever the approach discussed permits this. Through the workshops we obtain a valuable compilation of what is currently possible and what not, what is easy and what is still difficult, and what is out of reach. Technology experts (and via them, decision makers) will thus obtain more information about how to design and install efficient KM at her or his company. A considerable part of the workshop time will be assigned to compare the presented approaches. This will be of great value from an application point of view as well as a scientific viewpoint. Exhibition and Demonstrations: There will be demonstrations of products as well as of running but not yet commercial systems which can be useful as tools for various aspects of KM. CALL FOR WORKSHOP PROPOSALS: We are looking for people with strong theoretical and practical background willing to organise a workshop on a topic relevant to the overall conference theme of knowledge management technologies. In order to share the work and to cover a wider range of practical and theoretical aspects it might be useful to have two organisers per workshop. Thus, feel free to submit a proposal together with a colleague of yours. 
Workshop format: To help workshop contributors find practically relevant problems and to achieve a high degree of consistency of the contributions, each workshop will send out a call for papers with a description of two or more representative, concrete tasks. Each task covers a real-world problem being related to knowledge management, and is formulated in a way that allows a great variety of solutions. Papers are sought which describe how one (or more) of these tasks can be solved by a certain approach. The description should be detailed enough for participants who are no experts in the field to understand the central points. It may be the case that a certain approach cannot be applied to one of the exemplary tasks formulated in the call for papers because the approach is designed to take advantage of the properties of a certain subclass of tasks and these properties are not given with the exemplary tasks. In such a case a paper may introduce its own problem as long as it is a practically relevant one. Workshop contributions may either present an innovative approach, or an innovative application of existing technology, or describe an impressive success story. Evaluation criteria will be: - practical relevance (of the problem dealt with and the added value achieved) - originality of the approach and/or solution - clarity of argumentation (why and how does the presented approach solve the selected task?) - how comprehensible is the contribution by participants who are no experts in the field? For the conference proceedings a full paper with a maximum of 12 pages (A4 .....) must be provided. The best papers will be selected for a book on the state of the art in Knowledge Management. An award will be given to the most significant workshop contribution. 
SUBMISSION DETAILS FOR WORKSHOP PROPOSALS:

Please send

AT ONCE:
- a short confirmation that you plan to submit a workshop proposal

BY FEBRUARY 1, 1996:
- a short description (1-2 pages) which shows the theoretical and practical aspects of your work and illustrates how it contributes to practical aspects of knowledge management and organisational change
- a publication list, indicating your most relevant papers
- a workshop outline (3-5 pages) which
  * points out the area of KM technologies the workshop will deal with
  * indicates the expected benefits of applying the technology in an organisation
  * comprises a description of at least two representative, practically relevant tasks

TO THE FOLLOWING ADDRESS:

Ulrich Reimer
Rentenanstalt / Swiss Life
Informatik-Forschungsgruppe
Postfach
CH-8022 Zuerich, Switzerland
email: reimer@swssai.uu.ch
Tel.: +41-1-7114061
Fax: +41-1-7115007

You are invited to contact the above address with any questions and for further clarification. A web page is currently being set up.

Possible workshop themes are (feel free to create others):
- Knowledge Retrieval (automatic content analysis, information filters, supporting the user's query formulation, search aids, automatic information condensation, ...)
- Knowledge Integration: integrating different, heterogeneous sources of knowledge into one information system
- Data-Mining: automated extraction of knowledge from data
- Supporting business process reengineering by knowledge-based tools
- Using knowledge-based systems to enhance business processes
- Cooperation of intelligent agents (supporting distributed human/machine problem solving, workflow management, ...)

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.
Send subscription requests and submissions to: NCGUR@UCCMVSA.UCOP.EDU Editorial Staff: Clifford Lynch calur@uccmvsa.ucop.edu Nancy Gusack ncgur@uccmvsa.ucop.edu The IRLIST Archives is now set up for anonymous FTP, as well as via the LISTSERV. Using anonymous FTP via the host dla.ucop.edu, the files will be found in the directory pub/irl, stored in subdirectories by year (e.g., /pub/irl/1993). Using LISTSERV, send the message INDEX IR-L to LISTSERV@UCOP.EDU. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCOP.EDU. You will receive the issues for the entire month you have requested. These files are not to be sold or used for commercial purposes. Contact Nancy Gusack for more information on IRLIST. THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.
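
For readers who script their archive requests, the LOGYYMM naming and the per-year FTP directories described above can be sketched roughly as follows. This is only an illustration and not part of the IRLIST service: the function names are hypothetical, and the only details taken from the notice above are the GET IR-L LOGYYMM message format and the /pub/irl/<year> directory layout.

```python
from datetime import date


def listserv_get_command(mailed: date) -> str:
    """Build the LISTSERV message requesting the month an issue was mailed.

    Hypothetical helper: follows the LOGYYMM pattern from the notice, where
    YY is the two-digit year and MM is the two-digit numeric month.
    """
    return f"GET IR-L LOG{mailed.year % 100:02d}{mailed.month:02d}"


def ftp_year_directory(mailed: date) -> str:
    """Build the anonymous-FTP directory for the year an issue was mailed.

    Hypothetical helper: follows the /pub/irl/<year> layout from the notice.
    """
    return f"/pub/irl/{mailed.year}"


if __name__ == "__main__":
    issue_date = date(1995, 12, 4)  # mailing date of this issue
    print(listserv_get_command(issue_date))
    print(ftp_year_directory(issue_date))
```

The message text would still be mailed to LISTSERV@UCOP.EDU by hand or by a mail client, and the reply contains every issue for the requested month rather than a single issue.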