IRLIST Digest ISSN 1064-6965
December 1, 1992
Volume IX, Number 44
Issue 140

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. Text Retrieval Conference
   B. Publications Announcements
      1. Computer Clubs
II. QUERIES
   B. Requests for Information
      1. Voice Recognition Theory
      2. Compiler Information
      3. Human-Computer Interaction
IV. PROJECT WORK
   C. Abstracts
      1. IR-Related Dissertation Abstracts

**********************************************************

I. NOTICES

I.A.1.
Fr: Donna Harman X3569
Re: Text Retrieval Conference

TEXT RETRIEVAL CONFERENCE
January 1993 - August 1993

Conducted by: National Institute of Standards and Technology (NIST)
Sponsored by: Defense Advanced Research Projects Agency Software and Intelligent Systems Technology Office (DARPA/SISTO)

A new conference for examination of text retrieval methodologies (TREC) was held in November 1992 at Gaithersburg, Md. The goal of this conference was to encourage research in text retrieval from large document collections by providing a large test collection, uniform scoring procedures and a forum for organizations interested in comparing their results. Both ad-hoc queries against archival data collections and routing (filtering or dissemination) queries against incoming data streams were tested. The conference was a workshop open only to the 24 participating systems and government sponsors; however, the proceedings will be published by NIST in the spring of 1993.

This announcement serves as a call for participation from groups interested in participating in the second year of this workshop. Participants will be expected to work with approximately million documents (2 gigabytes of data), retrieving lists of documents that could be considered relevant to each of 100 topics (50 routing and 50 ad-hoc topics). NIST will distribute the data and will collect and analyze the results.
As before, the workshop will be open only to participating systems and government sponsors. There will be some minimal support distributed to selected participants in an effort to maximize the number of participants and to attract the widest possible variety of technical approaches and system architectures. This funding is intended only as a supplement to other support. Non-U.S. as well as U.S. participants are eligible for this funding.

Schedule:

Dec. 5, 1992 -- deadline for applications, including funding requests
Jan. 1, 1993 -- acceptances announced, and training data distributed to new participants (including 2 CD-ROMs containing about 2 gigabytes of data, and 100 training topics and relevance judgments)
April 1, 1993 -- third gigabyte of data distributed via CD-ROM, after routing queries (see below) received at NIST
May 15, 1993 -- 50 test topics distributed
June 1, 1993 -- results from 50 routing queries and 50 test topics due at NIST
July 30, 1993 -- relevance judgments and individual evaluation scores due back to participants
Aug. 30-Sept. 1, 1993 -- TREC conference at NIST in Gaithersburg, Md.

TASK DESCRIPTION:

Participants will receive 2 gigabytes of data to use for training of their systems, including development of appropriate algorithms or knowledge bases. The 100 topics used in the first TREC conference, and the relevance judgments for these topics, will also be sent. The topics are in the form of a highly-formatted user need statement (see attachment 1). Queries can either be constructed automatically from this topic description, or can be manually constructed. Participants are strongly encouraged to submit at least one run where queries are automatically constructed.

Two types of retrieval operations will be tested: a routing or filtering operation against new data, and an ad-hoc query operation against archival data.
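
The distinction between the two operations can be illustrated with a minimal Python sketch. It is not part of the TREC specification: the term-count scoring function, the function names, the sample documents and the topic number are all assumptions chosen for illustration. In the ad-hoc case a new query is run against a fixed archive; in the routing case a query fixed in advance is run against documents that were not available when the query was written.

```python
from collections import Counter


def score(query_terms, text):
    """Toy score (an assumption): total occurrences of the query terms in the text.

    Query terms are expected in lower case; the text is lower-cased here.
    """
    counts = Counter(text.lower().split())
    return sum(counts[term] for term in query_terms)


def rank(query_terms, collection, top_x):
    """Return the ids of the top_x documents with a non-zero score, best first."""
    scored = [(score(query_terms, text), doc_id) for doc_id, text in collection.items()]
    matching = [(s, doc_id) for s, doc_id in scored if s > 0]
    matching.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by document id
    return [doc_id for _, doc_id in matching[:top_x]]


def adhoc_search(query_terms, archive, top_x):
    """Ad-hoc: a new query is run against a fixed, archival collection."""
    return rank(query_terms, archive, top_x)


def route(standing_queries, incoming, top_x):
    """Routing: queries fixed in advance are run against newly arrived documents."""
    return {topic: rank(terms, incoming, top_x) for topic, terms in standing_queries.items()}


# Hypothetical sample data, not drawn from the TREC collection.
archive = {
    "A1": "oil prices rise",
    "A2": "computer chess program",
    "A3": "oil spill cleanup oil",
}
incoming = {
    "N1": "new chess computer wins",
    "N2": "weather report",
}

print(adhoc_search(["oil"], archive, top_x=2))                   # ['A3', 'A1']
print(route({"51": ["chess", "computer"]}, incoming, top_x=5))  # {'51': ['N1']}
```

Both functions share the same ranking step; what differs is which side is held fixed, the document collection or the query.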
Fifty of the topics (numbers 51-100) initially distributed as training topics will be used by each participating group to create formalized routing or filtering queries to be used for retrieval against a third gigabyte of data. Fifty new test topics will be used against the 2 gigabytes of training data as ad-hoc queries. Results from both types of queries (routing and ad-hoc) will be submitted to NIST as the top X documents (X to be determined at a later date) retrieved for each query. Participants creating queries both automatically and manually may submit both sets for evaluation. Scoring techniques including traditional recall/precision measures will be run for all systems and individual results will be returned to each participant.

CONFERENCE FORMAT:

The conference itself will be used as a forum both for presentation of results (including failure analyses and system comparisons), and for more lengthy system presentations describing retrieval techniques used, experiments run using the data, and other issues of interest to researchers in information retrieval. As there is a limited amount of time for these presentations, the program committee will determine which groups are asked to speak and which groups will present in a poster session. Additionally, some organizations may not wish to describe their proprietary algorithms, and these groups may choose to participate in a different manner (see Category C). To allow a maximum number of participants, the following three categories have been established.

CATEGORY A: FULL PARTICIPATION:

Participants will be expected to work with the full data set, and to present full details of system algorithms and various experiments run using the data, either in a talk or in a poster session. In addition to algorithms and experiments, some information on time and effort statistics should be provided.
This includes time for data preparation (such as indexing, building a manual thesaurus, building a knowledge base), time for construction of manual queries, query execution time, etc. More details on the desired content of the presentation will be provided later.

CATEGORY B: EXPLORATORY GROUPS:

Because small groups with novel retrieval techniques might like to participate but may have limited research resources, a category has been set up to work with only a subset of the data. This subset (see data description below) will consist of about 1/2 gigabyte of training data (and all training topics), and 1/4 gigabyte of test data (and all test topics). Participants in this category will be expected to follow the same schedule as Category A, except with less data, and will be expected to present full details of system algorithms, experiments, and time and effort statistics either in a poster session or in a talk.

CATEGORY C: EVALUATION ONLY:

Participants in this category will be expected to work on the full data set, submit results for common scoring and tabulation, and present their results in a poster session, including the time and effort statistics described in Category A. They will not be expected to describe their systems in detail. It is not anticipated that any supplemental funding will be available for this category.

DATA (TEST COLLECTION):

The test collection (documents, topics, and relevance judgments) will be the same collection (English only) being used for the DARPA TIPSTER project. The collection is being assembled from Linguistic Data Consortium text, and an LDC User Agreement will be required from all participants. The documents will be an assorted collection of newspapers (including the Wall Street Journal), newswires, journals, technical abstracts and email newsgroups. The test set will be of approximately the same composition as the training set, and all documents will be typical of those seen in a real-world situation (i.e.
there will not be arcane vocabulary, but there may be missing pieces of text or typographical errors). The format of the documents is relatively clean and easy to use as is (see attachment 2). Most of the documents will consist of a text section only, with no titles or other categories. The relevance judgments against which each system's output will be scored will be made by experienced relevance assessors based on the output of all TREC participants using a pooled relevance methodology.

RESPONSE FORMAT AND SUBMISSION DETAILS:

By Dec. 5, 1992, organizations wishing to participate should respond to the call for participation by submitting a summary of their text retrieval approach and a system architecture description, not to exceed five pages in total. The summary should include the strengths and significance of their approach to text retrieval, and highlight differences between their approach and other retrieval approaches. These summaries will serve as the basis for published proceedings. Opportunity to revise the summaries and add explanations of the results will be provided before publication.

Each organization should indicate in which category they wish to participate. Please indicate clearly the persons responsible for the summary statement and to whom correspondence should be directed. A full regular address, telephone number, and an email address should be given. EMAIL IS THE PREFERRED METHOD OF COMMUNICATION, although it is realized that diagrams and figures will need to be sent by regular mail or FAX. It is expected that ALL participants have some access to email, as conference communications will be done via email.

Those organizations wishing to apply for funding to supplement their own resources must provide a second statement (not to exceed two pages). This statement should include an estimate of the amount of funding available from other sources to support participation in this work, and a specification of the amount of funding desired.
Please clearly indicate whether the organization is interested in participating in TREC even if no funding is available.

All responses should be submitted by Dec. 5, 1992 to the Program Chair, Donna Harman:

harman@magi.ncsl.nist.gov

Donna Harman, NIST, Building 225/A216, Gaithersburg, Md. 20899
FAX: 301-975-2128

AS NOTED ABOVE, EMAIL IS THE DESIRED FORM OF COMMUNICATION.

**********

I.B.1.
Fr: Erna.Gumuliauskaite@IF.KTU.lt
Re: Computer Clubs

A computer club founded by Kaunas Technological University and Kaunas Magnus Dux University in Lithuania is seeking contacts with other computer clubs. Our main interests are object-oriented programming and related topics.

Please contact Erinija
Erna.Gumuliauskaite@IF.KTU.LT

**********************************************************

II. QUERIES

II.B.1.
Fr: Jefferey Lundstrom
Re: Voice Recognition Theory

Does anyone out there have any information on Computer Voice recognition Theory? I have a friend that is writing a paper on it and she needs some info! Please let me know if you can help!!! Thanks in advance!!!

Jeff Lundstrom

**********

II.B.3.
Fr: Nelliud Torres
Re: Compiler question

I am an Operating System Specialist at the University of Puerto Rico. We use VAX/VMS in our computer center. VAX computers use Command Procedures (DCL), which are operating system instructions (the same as Batch programs on a PC). Once the Command Procedure is running, the computer first checks for any syntax error and then executes it. I want to be able to validate the syntax of every instruction before the computer executes the DCL program. The only language I have available to do that is COBOL. My intention is to create a program that validates the syntax and maybe translates the instructions to BASIC language instructions if it is possible. I am interested in any information regarding methods, data structures (Stack, Queues, Multilist, etc.), possible considerations to take, books related to the topic, articles, etc. Any information will be appreciated.
Thank you very much.

Nelliud D. Torres

**********

II.B.4.
Fr: Teri O'Connell
Re: Human-Computer Interfaces

Does anyone have any information on designing human-computer interfaces to information retrieval systems for persons with special needs? By special needs, I primarily mean physical handicaps, for example, poor eyesight. However, I would also appreciate information on designing for the mentally handicapped. I am currently focusing on the design of graphical user interfaces. Any pointers, references, etc. would be most appreciated. Thanks!

Teri O'Connell
oconnell_teri@po.gis.prc.com

**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M. Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract.

Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided.

Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG92-01569.
AU RAO, USHA.
TI HYPERTEXT FUNCTIONALITY: A THEORETICAL FRAMEWORK.
IN Rutgers The State University of New Jersey - Newark Ph.D. 1991 281 pages.
SO DAI V52(11), SecA, pp3756.
DE Information Science. Computer Science.
AB Design metaphors for Hypertext systems are quite diverse. On the one hand, we have Hypertext systems that utilize general and unspecific metaphors, resulting in problems due to the ambiguity of the human language. On the other hand, the majority of Hypertext systems are application-dependent and have been tailored to very specific situations. With this latter approach, individuals and organizations will find themselves in a position of having to utilize different Hypertext systems to accomplish different tasks. Hypertext should be treated as a general-purpose tool with approaches to handling nodes, link and retrieval, that fits within the context of any application and conveys common meanings to the users. To accomplish this we need a comprehensive framework for Hypertext based on a cognitive model that allows for the representation of the complete range of human intellectual abilities. A theoretical framework useful for understanding the functionality of Hypertext systems in terms of their ability to satisfy cognitive requirements is proposed. This framework, derived from a re-interpretation of Guilford's Structure of Intellect model, consists of six generic node types and twelve generic link types. Sixteen diverse Hypertext systems were reviewed to test whether the current range of applications could be classified within the proposed framework. The results obtained support our claim that the node and link types in any current or proposed Hypertext system can be mapped into this morphology. The six node and twelve link types were tested for both exhaustiveness and consistency. Node testing consisted of subjects classifying a given node into its appropriate category.
Link testing consisted of subjects classifying a link type between a given pair of node types. In both cases the exhaustiveness hypothesis was upheld. In most cases the consistency hypothesis was upheld. In situations giving rise to confusion in the classification scheme, rules and procedures were developed to resolve the confusion. While usability was not a direct objective, the results of the experiment give every indication that the morphology is directly usable.

AN University Microfilms Order Number ADGNN-59811.
AU HOWARTH, LYNNE CHRISTINE.
TI THE IMPACT OF AUTOMATION ON OPERATIONS AND STAFFING CONFIGURATIONS IN CATALOGUING DEPARTMENTS IN PUBLIC LIBRARIES: A STUDY OF FOUR PUBLIC LIBRARY SYSTEMS IN THE MUNICIPALITY OF METROPOLITAN TORONTO, ONTARIO, 1970-1986.
IN University of Toronto (Canada) Ph.D. 1990, 258 pages.
SO DAI V52(11), SecA, pp3758.
DE Library Science.
IS ISBN: 0-315-59811-5.
AB There has been little systematic, empirical research assessing the extent to which automation in public libraries has met managerial goals for improving productivity, decreasing staff, and lowering operating costs. This study investigated two research questions in this regard. First, did subscribing to a bibliographic utility for cataloguing support and/or implementing an in-house automated circulation system with an in-house bibliographic data base result in (1) an increase in productivity vis a vis numbers of titles catalogued and volumes processed, (2) a decrease in the number of cataloguing department staff, and (3) a reduction in the rate of rise of cataloguing departmental costs when compared with the pre-automation period. Second, what changes in productivity, staffing, and cost variables occurred when tasks associated with implementing an automated circulation system were added to the workflow of a cataloguing department already subscribing to the products and services of a bibliographic utility.
A retrospective, longitudinal study of cataloguing departments of four public libraries in Metropolitan Toronto, Ontario, from 1970 to 1986, was conducted. Data describing numbers of titles catalogued and volumes processed, numbers of staff, and cataloguing expenses were collected from reports maintained by each library, verified through follow-up interviews with cataloguing managers, and subsequently analysed. Results suggested that managerial goals were more successfully achieved during both periods of automation as compared to the manual environment. Comparing the automation periods of subscribing to a bibliographic utility, and implementing an automated circulation system, managerial goals were realized to a greater extent during the latter period. During the period associated with subscribing to a bibliographic utility, productivity (titles catalogued) rose, though not significantly, for three of four libraries, rates of rise of departmental costs stabilized in two libraries, and numbers of staff increased in all libraries. During the period concurrent with implementing an automated circulation system, two of the three observed increases in productivity (titles catalogued) were significant, rates of rise of departmental costs declined in two libraries, and numbers of staff decreased or virtually stabilized in all libraries. These findings and their implications were discussed.

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
Clifford Lynch   calur@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
Nancy Gusack     ncgur@uccmvsa.bitnet
Mary Engle       meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.
To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU. You will receive the issues for the entire month you have requested. These files are not to be sold or used for commercial purposes. Contact Nancy Gusack or Mary Engle for more information on IRLIST. THE OPINIONS EXPRESSED IN IRLIST DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF CALIFORNIA. AUTHORS ASSUME FULL RESPONSIBILITY FOR THE CONTENTS OF THEIR SUBMISSIONS TO IRLIST.
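
The LOGYYMM naming convention described above can be expressed as a small helper. This is a minimal Python sketch: the function name and its parameters are hypothetical, and only the command format (GET IR-L LOGYYMM) comes from the instructions above.

```python
def listserv_get_command(year, month, list_name="IR-L"):
    """Build the LISTSERV message body that requests one month of back issues.

    The helper itself is an illustration, not part of LISTSERV; it only
    formats the GET <list> LOGYYMM message described in the digest.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # YY is the two-digit year, MM the two-digit numeric month.
    return f"GET {list_name} LOG{year % 100:02d}{month:02d}"


print(listserv_get_command(1992, 12))  # GET IR-L LOG9212
```

The resulting line would be sent as the message to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU, as described above.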