Index to Program (ASCII Version of 2-May-97)

[Search on these *'d strings to reach the appropriate parts of the program. The material in this online program is identical to that in the printed program.]

1. *Cover* : Cover page. Corporate sponsors.
2. *Introduction*: Description of conference. Registration times. Home page.
3. *Personnel*: Organizing committee and program committee.
4. *Program* : Timetable of events and presentations.
5. *Location* : Geography and transportation details.
6. *Registration* : Registration details and registration form.
7. *Tutorials* : Descriptions of tutorials.
8. *Workshops* : Descriptions of workshops.
---------------------------------------------------------
*Cover*

20th International ACM SIGIR Conference on
Research and Development in Information Retrieval

SIGIR 97

DoubleTree Hotel, Philadelphia, PA, USA
July 27 -- July 31, 1997

ADVANCED PROGRAM

Sponsored by ACM.
In co-operation with: BCS-IRSG (UK), GI (Germany), IPSJ (Japan).
---------------------------------------------------------
---------------------------------------------------------
The sponsorship of the following companies is gratefully acknowledged:

AT&T Labs - Research
IBM
Infonautics
Institute for Scientific Information (ISI)
Lexis-Nexis
West Information Publishing Group
---------------------------------------------------------
---------------------------------------------------------
*Introduction*

SIGIR '97 is the twentieth conference in the premier series of research conferences on information retrieval. SIGIR is the major forum for the presentation of new research results, and for the demonstration of new systems and techniques, in information retrieval. The conference attracts a broad range of professionals including theoreticians, developers, publishers, researchers, educators, and designers of systems, interfaces, information bases, and related applications.
In 1997, SIGIR is collocated with DL '97, the Second ACM International Conference on Digital Libraries, which will be held July 23-26, 1997 in Philadelphia.

The SIGIR '97 home page is:
http://www.acm.org/sigir/conferences/sigir97/index.html

SIGIR '97 Registration Times:

Sunday, July 27:     8:00AM - 6:30PM
Monday, July 28:     7:30AM - 12:00PM
Tuesday, July 29:    8:30AM - 10:00AM
Wednesday, July 30:  8:30AM - 10:00AM
Thursday, July 31:   8:30AM - 9:30AM
---------------------------------------------------------
*Personnel*

CONFERENCE CHAIR

Ellen Voorhees
NIST; Building 225 / Room A-216
Gaithersburg, MD 20899, USA
sigir97@nist.gov
Ph. +1 301 975-3761; Fax: +1 301 840-1357

PROGRAM CHAIRS

For North and South America: Nicholas J. Belkin (Rutgers Univ.); belkin@scils.rutgers.edu
For Europe and Africa: Peter Willett (Univ. of Sheffield); p.willett@sheffield.ac.uk
For Asia and the Pacific: Arcot Desai Narasimhalu (National Univ. of Singapore); desai@iss.nus.sg

ORGANIZING COMMITTEE

Treasurer: Paul B. Kantor (Rutgers Univ.); kantorp@cs.rutgers.edu
Tutorials & Panels Chair: Susan Dumais (Bellcore); std@bellcore.com
Workshops Chair: James Allan (Univ. of Massachusetts); allan@cs.umass.edu
Posters Chair: K. L. Kwok (CUNY); kwok@post.cs.qc.edu
Demonstrations Chair: Chris Buckley (Sabir Research); chrisb@sabir.com
Sponsorship Chair: Linda L. Hill (UCSB); lhill@alexandria.ucsb.edu
Volunteers Chair: Eric Brown (IBM); brown@watson.ibm.com
Publicity Chair: David D. Lewis (AT&T); lewis@research.att.com
-----------------
PROGRAM COMMITTEE

IJsbrand Jan Aalbersberg, Philips, The Netherlands
Maristella Agosti, Univ. of Padua, Italy
Micheline Beaulieu, City Univ., UK
Peter Bruza, QUT, Australia
Chris Buckley, Cornell Univ., USA
Forbes Burkowski, Univ. of Waterloo, Canada
James Callan, Univ. of Massachusetts, USA
Raman Chandrasekar, NCST, India
Yves Chiaramella, CLIPS-IMAG, France
Hsinchun Chen, Univ. of Arizona, USA
Mark Chignell, Univ. of Toronto, Canada
Ken Church, AT&T, USA
W. Bruce Croft, Univ.
of Massachusetts, USA
Susan Dumais, Bellcore, USA
Leo Egghe, Limburgs Univ. Centrum, Belgium
David Ellis, Univ. of Sheffield, UK
Ed Fox, Virginia Tech, USA
Jim French, Univ. of Virginia, USA
Hans-Peter Frei, UBILAB, Switzerland
Norbert Fuhr, Univ. Dortmund, Germany
Gregory Grefenstette, Rank Xerox, France
Donna Harman, NIST, USA
David Harper, Robert Gordon Univ., UK
Marti Hearst, Xerox, USA
Bill Hersh, Oregon Health Sciences Univ., USA
Haym Hirsh, Rutgers Univ., USA
David Hull, Rank Xerox, France
Peter Ingwersen, Royal School of Librarianship, Denmark
Tetsuya Ishikawa, Univ. of Library and Info. Sci., Japan
Kalervo Jarvelin, University of Tampere, Finland
Paul Kantor, Rutgers University, USA
Haruo Kimoto, NTT, Japan
Judith Klavans, Columbia University, USA
Shmuel Klein, Bar-Ilan Univ., Israel
Robert Korfhage, Univ. of Pittsburgh, USA
Robert Krovetz, NEC, USA
K. L. Kwok, Queens College, CUNY, USA
Dik Lee, HKUST, Hong Kong
Joon Ho Lee, KRDIC, Korea
David Lewis, AT&T, USA
Elizabeth Liddy, Syracuse Univ., USA
Dario Lucarella, CRA-ENEL, Italy
Kathy McKeown, Columbia Univ., USA
Elke Mittendorf, ETH Zentrum, Switzerland
Alistair Moffat, Univ. of Melbourne, Australia
Sung Hyun Myaeng, Chungnam National Univ., Korea
Jan Pedersen, Verity, USA
Annelise Pejtersen, National Laboratory, Denmark
Keith van Rijsbergen, Glasgow University, UK
Ellen Riloff, Univ. of Utah, USA
Stephen Robertson, City Univ., UK
Airi Salminen, Univ. of Jyvaskyla, Finland
Tefko Saracevic, Rutgers Univ., USA
Peter Schauble, ETH Zentrum, Switzerland
Fabrizio Sebastiani, IEI-CNR, Italy
Alan Smeaton, Dublin City Univ., Ireland
Phil Smith, Ohio State Univ., USA
Craig Stanfill, Ab Initio, USA
Ulrich Thiel, GMD IPSI, Germany
Richard Tong, Sageware, USA
Howard Turtle, West Info. Pub. Gp., USA
Ross Wilkinson, RMIT, Australia
Mei-Mei Wu, National Taiwan Normal Univ., Taiwan
Emmanuel Yannakoudakis, Athens Univ.
of Economics, Greece
---------------------------------------------------------
---------------------------------------------------------
*Program*

SIGIR '97 Program

Saturday, July 26

9:00AM-4:00PM    Pre-Conference Workshop
-----------------
Sunday, July 27

8:30AM-12:30PM   Morning Tutorials
1:30PM-5:30PM    Afternoon Tutorials
6:00PM-8:00PM    Salton Award Presentation and Lecture, to be followed by a celebratory Reception.
-----------------
Monday, July 28

7:30AM-8:30AM    Newcomers' breakfast

9:00AM-10:30AM   Session 1: Opening Session
Welcome: Ellen Voorhees, National Institute of Standards and Technology, USA
Keynote Address: George Miller, Princeton University

10:30AM-11:00AM  Break

11:00AM-12:30PM  Session 2: Relevance Feedback
Session Chair: Sung Hyun Myaeng, Chungnam National University

Fast and Effective Query Refinement
B. Velez, R. Weiss, *M. A. Sheldon, D. K. Gifford
MIT; *Lotus Development Corporation

On Relevance Weights with Little Relevance Information
S. Robertson, S. Walker
City Univ., London

Learning Routing Queries in a Query Zone
A. Singhal, *M. Mitra, **C. Buckley
AT&T Labs; *Cornell Univ.; **Sabir Research

12:30PM-2:00PM   Lunch

2:00PM-3:30PM    Chinese Language Retrieval
Session Chair: Mei-Mei Wu, National Taiwan Normal University

Comparing Representations in Chinese Information Retrieval
K.L. Kwok
Queens College, CUNY

Chinese Text Retrieval Without Using a Dictionary
A. Chen, J. He, L. Xu, F. C. Gey, J. Meggs
Univ. of California, Berkeley

Pat-tree-based Keyword Extraction for Chinese Information Retrieval
*L-F. Chien, *,**T-I. Huang, *M-C. Chien
*Academia Sinica; **Univ. of Southern California

3:30PM-4:00PM    Break

4:00PM-5:30PM    Session 3: Classification Methods
Session Chair: David Lewis, AT&T Labs

Almost-Constant-Time Clustering of Arbitrary Corpus Subsets
C. Silverstein, *J. O. Pedersen
Stanford Univ.; *Verity

Feature Selection, Perceptron Learning, and a Usability Case Study for Text Categorization
H.-T. Ng, *W.-B. Goh, *K.-L.
Low
Defence Science Organization, Singapore; *Ministry of Defence, Singapore

A Comparison of Projections for Efficient Document Clustering
H. Schuetze, C. Silverstein
Xerox PARC

6:00PM-9:00PM    Demonstrations and posters session, with reception
-----------------
Tuesday, July 29

9:00AM-10:30AM   Session 4: Cross-Language Retrieval

Phrasal Translation and Query Expansion Techniques for Cross-language Information Retrieval
L. Ballesteros, W. B. Croft
Univ. of Massachusetts, Amherst

QUILT: Implementing a Large-scale Cross-language Text Retrieval System
M. Davis, W. C. Ogden
NMSU

Cross Language Speech Retrieval
P. Sheridan, M. Wechsler, P. Schauble
ETH, Zurich

10:30AM-11:00AM  Break

11:00AM-12:30PM  Session 5: **Parallel Sessions**

Session 5.1: Formal Models
Session Chair: Norbert Fuhr, University of Dortmund

Dempster-Shafer's Theory of Evidence Applied to Structured Documents: Modelling Uncertainty
M. Lalmas
Univ. of Glasgow

Computationally Tractable Probabilistic Modeling of Boolean Operators
W. R. Greiff, W. B. Croft, *H. Turtle
Univ. of Massachusetts; *West Publishing Co.

A Method for Monolingual Thesauri Merging
M. Sintichakis, P. Constantopoulos
Univ. of Crete

Session 5.2: Natural Language Processing
Session Chair: Kathy McKeown, Columbia University

Textual Context Analysis for Information Retrieval
M. A. Stairmand
UMIST

Effective Use of Natural Language Processing Techniques for Automatic Conflation of Multi-Word Terms: The Role of Derivational Morphology, Part of Speech Tagging, and Shallow Parsing
J. L. Klavans, *E. Tzoukermann, **C. Jacquemin
Columbia Univ.; *Bell Labs; **IRIN/IUT

Guessing Morphology from Terms and Corpora
C. Jacquemin
IRIN/IUT

12:30PM-2:00PM   Lunch

2:00PM-3:30PM    Session 6: Text Structures
Session Chair: Peter Ingwersen, Royal School of Librarianship

Optimal Demand-oriented Topology for Hypertext Systems
S. Aaronson
Clarkson Univ.

Passage Retrieval Revisited
M. Kaszkiel, J.
Zobel
RMIT

Exploration of Text Collections with Hierarchical Feature Maps
D. Merkl
RMIT

3:30PM-4:00PM    Break

4:00PM-4:15PM    Presentations

4:15PM-5:30PM    Session 7 (Panel Session)
Real Life Information Retrieval: Commercial Search Engines
Moderator: Mike Lesk, Bellcore
Panelists:
Doug Cutting, Excite
Jan Pedersen, Verity
Terry Noreault, OCLC
Matt Koll, PLS

6:00PM           Conference Banquet at Franklin Institute Science Museum (Take your banquet ticket!)
-----------------
Wednesday, July 30

9:00AM-10:00AM   Session 8: **Parallel Sessions**

Session 8.1: User Issues 1
Session Chair: Tefko Saracevic, Rutgers University

Users' Perception of the Performance of a Filtering System
R. Fidel, *M. Crandall
Univ. of Washington; *Boeing

Time, Relevance and Interaction Modelling for Information Retrieval
M. D. Dunlop
Univ. of Glasgow

Session 8.2: Asian Languages
Session Chair: Haruo Kimoto, NTT

How to Read Less and Know More - Approximate OCR for Thai
D. Cooper
Southeast Asian Software Research Center

Overlapping Statistical Word Indexing: A New Indexing Method for Japanese Text
Y. Ogawa, T. Matsuda
Ricoh

10:00AM-10:30AM  Break

10:30AM-11:30AM  Session 9: **Parallel Sessions**

Session 9.1: User Issues 2
Session Chair: Mun Kew Leong, Institute of Systems Science, National University of Singapore

Effectiveness of a Graphical Display of Retrieval Results
A. Veerasamy, Heikes
Georgia Tech

Cat-a-Cone: An Interactive Interface for Specifying Searches and Viewing Retrieval Results using a Large Category Hierarchy
M. A. Hearst, *C. Karadi
Xerox PARC; *Stanford Univ.

Session 9.2: Combination Techniques
Session Chair: Paul Kantor, Rutgers University

A Probabilistic Model for Distributed Information Retrieval
C. Baumgarten
Dresden Univ. of Technology

Analyses of Multiple Evidence Combination
J. H.
Lee
Korea Research and Development Information Center

11:30AM-1:30PM   ACM SIGIR Annual Meeting and Lunch

1:30PM-3:00PM    Session 10: Image Retrieval
Session Chair: Ed Fox, Virginia Tech

Image Retrieval by Appearance
S. Ravela, R. Manmatha
Univ. of Massachusetts, Amherst

Using Semantic Contents and WordNet in Image Retrieval
Y. A. Aslandogan, *C. Thier, C. T. Yu, J. Zou, **N. Rishe
Univ. of Illinois at Chicago; *Tribune Media Services; **Florida Intl. Univ.

Image Retrieval by Hypertext Links
V. Harmandas, M. Sanderson, M. D. Dunlop
Univ. of Glasgow

3:00PM-3:30PM    Break

3:30PM-5:00PM    Session 11: Query Expansion
Session Chair: Ross Wilkinson, RMIT

Automatic Feedback Using Past Queries: Social Searching?
L. Fitzpatrick, M. Dent
Open Text Corp.

Exploiting Clustering and Phrases for Context-based Information Retrieval
P. Anick, S. Vaithyanathan
DEC; IBM

The Potential and Actual Effectiveness of Interactive Query Expansion
M. Magennis, C. J. van Rijsbergen
Univ. of Glasgow

5:00PM           CLOSE OF CONFERENCE
-----------------
Thursday, July 31

9:00AM-3:00PM    Post-conference Workshops
--------------------------------------------------------
--------------------------------------------------------
*Location*

Location

The DoubleTree Hotel Philadelphia is located at the intersection of Broad and Locust Streets in downtown Philadelphia. It should not be confused with the Doubletree Club Hotel Philadelphia Northeast, or the Doubletree Guest Suites at Philadelphia International Airport.

The DoubleTree Hotel Philadelphia features a pool and fitness center. It is located in the heart of center city Philadelphia, putting it within a few blocks of hundreds of restaurants and shops, as well as a variety of major historic and cultural attractions. Attractions of particular note include Independence Hall, the Liberty Bell, and other sites associated with the American Revolution and the early history of the United States. The Philadelphia Museum of Art is one of the country's finest art museums.
Transportation

Philadelphia is easily accessible by air, rail, and automobile. Philadelphia International Airport is connected to downtown Philadelphia by SEPTA (commuter rail) line R1 (get off at Market East Station, a few blocks from the hotel), as well as by car or taxi.

30th Street Station, just west of center city, is a major national rail station. Any of several SEPTA lines will get you from there to Market East Station, or you can take a taxi to the hotel.

Further information on transportation, as well as attractions, can be found at http://www.libertynet.org/phila-visitor/, or by contacting the Philadelphia Convention and Visitors Bureau, ph. +1-800-537-7676, tourism@libertynet.org.
------------------------------------------------------------------
------------------------------------------------------------------
*Registration*

Registration Information

Full conference registration (including members of cooperating organizations and non-members) includes attendance at all technical sessions, Proceedings, conference banquet, lunch at the SIGIR Annual General Meeting, and two receptions. Tutorials and workshops require a separate fee. Additional copies of the Proceedings at $50.00 and the tutorial notes will be on sale at the conference.

The ACM member rate is available to members of ACM, SIGIR, BCS-IRSG, GI, and IPSJ. The student rate is available to full-time students and only when sufficient evidence is submitted along with the registration form. (For information on joining ACM and/or SIGIR, contact ACM by phone at +1-212-626-0500, by fax at +1-212-944-1318, by email at acmhelp@acm.org, or see their web page at http://www.acm.org/membership/join.html.)

All payments must be made by check in U.S. funds or by VISA, MasterCard, or American Express. Cancellations must be received at the McLean address in writing, postmarked by July 11, 1997. Refunds will be made on cancellations postmarked by July 11, less a cancellation fee of $25.00.
Hotel reservations should be made directly with the DoubleTree Hotel Philadelphia, ph. +1-215-893-1600, fax. +1-215-893-1663. The SIGIR conference rate of $109/night plus tax is available for reservations made by June 30, 1997. Further information on Philadelphia area hotels can be found at http://www.libertynet.org/phila-visitor/, or by contacting the Philadelphia Convention and Visitors Bureau, ph. +1-800-537-7676, tourism@libertynet.org. ______________________________________________________________________ SIGIR 97 Registration Form The 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval Philadelphia, PA, July 27-31, 1997 Please use block letters or type, and check where appropriate Mr.___ Ms.___ Dr.___ Prof. ___ Other _____________ Last Name_______________________ First Name________________________ Badge Name (if different)__________________________________________ Company/Organization_______________________________________________ Address____________________________________________________________ ____________________________________________________________ City___________________________ State_______ Postal Code___________ Country__________________________ Phone__________________________ Fax_______________________________ E-mail_____________________________________________________________ Check here if this is your first SIGIR conference: ________________ (A Newcomers' breakfast will be held.) 
**Conference Registration Fees Member of ACM, BCS-IRSG, GI, or IPSJ: _____ Membership No.: __________________________ Non-member: _____ Full-time student (proof required): _____ One-day Registration: M T W (circle one) **Tutorials (check up to two): Multimedia Information Retrieval (morning) ______ Algorithmic and Cognitive Approaches for IR (morning) ______ IR Systems: Research and Design Methods (afternoon) ______ Implementation of High-performance IR Systems (afternoon) ______ Machine Learning for IR (afternoon) ______ Software Agents for IR (morning) ______ Evaluation of IR Systems (afternoon) ______ Cross-language IR (morning) ______ **Workshops Beyond Word Relations (July 31) ______ Networked Information Retrieval (July 31) ______ Summarization and Visualization (July 31) ______ Cross-Lingual IR (July 31) ______ Education and Curriculum Development (July 26) ______ **Special Events Extra banquet tickets (indicate how many) ______ Indicate here any special needs such as vegetarian meals. Please specify: __________________________________________________ __________________________________________________ **Payment Information Registration Amount $________ Tutorial Amount $________ Workshop Amount $________ Banquet Tickets $________ Total Amount Due $________ Early registrations must be postmarked by June 15 to receive the discounted fee. Registrations paid by credit card may be faxed or e-mailed. Checks should be made payable to SIGIR 97 and mailed to: SIGIR 97 P. O. Box 7270 McLean, VA 22106-7270 USA E-mail: 76631.1703@CompuServ.com Fax: +1 703-790-7237 **Credit Card Information: Card No __________________________________________ Expiration Date_____________________________________ Signature__________________________________________ Registrations must be postmarked by July 11, 1997. Please register on site after this date. Questions?? 
Call 703-356-8300
E-mail: 76631.1703@CompuServ.com

For Office Use: Recd____________ Check No._______________
Amount__________________________________
-------------------------------------------------------------
SIGIR '97 Registration Fees

To register, please complete the registration form found in the center of this booklet and return it to the address shown. To receive the early fee, registrations must be postmarked by June 15, 1997. Do not mail registrations after July 11, 1997. The following fees will apply:

                                          Early    Late
Registration
  Members of cooperating organizations    $325     $400
  Non-members                             $400     $450
  Students                                $125     $150
  One day                                 $150     $150
Tutorials
  Members or non-members
    One tutorial                          $ 90     $115
    Two tutorials                         $160     $200
  Students
    One tutorial                          $ 75     $100
    Two tutorials                         $100     $200
Workshops                                 $ 50     $ 60
Additional Banquet Tickets                $ 55     $ 55

There are opportunities for students to receive free registration in exchange for working at the conference. Contact the Volunteers Chair, Eric Brown (brown@watson.ibm.com) for details.
-----------------
Conference Banquet

The conference banquet will be held in the Franklin Institute Science Museum, at the intersection of 20th and Race Streets. In addition to housing the Benjamin Franklin National Memorial, the Museum offers a variety of hands-on science and technology exhibits. The banquet will offer a sampler of traditional foods from Philadelphia's varied neighborhoods.

Admission to the conference banquet is included in the full conference fee, but is not included in the student fee. Additional banquet tickets may be purchased for $55.
----------------------------------------------------------------
-------------------------------------------------------------
*Tutorials*

Pre-Conference Tutorials

SIGIR tutorials provide an opportunity to learn the basics of information retrieval or to learn a new or specialized area from experts in the field.
This year, eight half-day tutorials will be presented on Sunday, July 27, in parallel sessions during the morning and afternoon.

Morning Tutorials

Title: Algorithmic and Cognitive Approaches for Information Retrieval

Instructors:
Peter Ingwersen, Department of Information Retrieval Theory, Royal School of Librarianship
Peter Willett, Department of Information Studies, University of Sheffield

Time: Morning (8:30AM-12:30PM)

Course Description: This tutorial will start with an introduction to IR systems and will discuss their principal components, such as documents, queries and relevance assessments, inter alia. It will then summarise the main features of algorithmic and cognitive approaches to IR, thus providing attendees with background for the research presentations later in the conference. The algorithmic area focuses principally on the algorithms and data structures that are needed to maximise retrieval effectiveness whilst maintaining a reasonable level of retrieval efficiency. The cognitive section summarises a range of communicative and psycho-sociological studies of IR systems focusing on user-centered approaches to information systems design. Copies of the transparencies used, as well as a Tutorial Note (30 pages) reviewing the core research literature, will be distributed.

Who should attend: Introductory.

About the Instructors: Following doctoral and post-doctoral research on computer techniques for the processing of databases of chemical reactions, Peter Willett joined the staff of the Department of Information Studies, University of Sheffield as a Lecturer in Information Science in 1979 and was awarded a Personal Chair in 1991. He is a Fellow of the Institute of Information Scientists, and was the 1993 recipient of the Skolnik Award of the American Chemical Society for his contributions to chemical information science.
Professor Willett heads a research group studying computational techniques for the storage and retrieval of information in textual, chemical and biochemical databases and has over two hundred publications describing this work. His current interests include the development of non-Boolean searching techniques for textual databases, chemical structure-property correlation in drug development programmes, and the processing of 3-D chemical structure data to support research in drug discovery and protein engineering. Peter Ingwersen became lecturer at the Royal School of Librarianship, Copenhagen in 1973. Since July 1993 he has been Head of the Department of Information Retrieval Theory. In 1991 he obtained his Ph.D. degree from Copenhagen Business School, Faculty of Economics, Institute of Informatics and Management. He has lectured on information storage and retrieval, cataloguing and indexing theory and carried out experimental research on cognitive psychological aspects of processes concerned with user-intermediary-system interaction (HCI), and specialized information services and systems for industry. He has served in several Esprit projects as expert consultant and reviewer. As Conference Chairman he organized the 15th ACM-SIGIR Meeting in 1992, and the Second Conference on Conceptions of Library and Information Science (CoLIS 2) in Copenhagen, 1996. ----------------- Multimedia Information Retrieval Instructor: Norbert Fuhr, University of Dortmund Time: Morning (8:30AM-12:30PM) Course Description: The aim of this tutorial is to survey the state of the art in multimedia IR. The focus is on indexing and retrieval methods for multimedia, whereas system-oriented aspects will not be addressed. 
More specifically, the following major concepts are to be taught in the tutorial:

o basic properties of text, images, audio, video
o views on media objects: physical (layout), structural (logical), symbolic, spatial, temporal, perceptive
o modelling the structure of multimedia documents
o feature-based and semantic indexing methods for text, images, speech, video
o multimedia retrieval: classical IR models vs. logic-based approaches
o retrieval of structured documents.

Participants will receive copies of the slides plus a reading list with comments. Additional information can be found at the Web page of the MMIS course at the University of Dortmund: http://ls6-www.informatik.uni-dortmund.de/ir/teaching/courses/mmis/.

Who should attend: Introductory. Attendees should have basic knowledge in IR, but knowledge in multimedia systems is not required.

About the Instructor: Norbert Fuhr is Professor in the Computer Science Department of the University of Dortmund, Germany. He is well-known for his theoretical and experimental work on probabilistic indexing and retrieval models. His current research interests are the integration of IR and database systems, networked IR systems and multimedia IR. Concerning the latter, his research group is participating in the joint European ESPRIT project FERMI (Formalization and Evaluation of Multimedia Information Retrieval, together with the University of Glasgow, the University of Grenoble and IEI-CNR in Pisa), where logic-based models for multimedia retrieval are developed.
---------------
Title: Software Agents for Information Retrieval

Instructors: Tim Finin, James Mayfield and Charles Nicholas, University of Maryland Baltimore County

Time: Morning (8:30AM-12:30PM)

Description: This tutorial will provide an introduction to software agents and their potential applications in IR systems. The tutorial will be divided into three sections of roughly one hour each followed by a short conclusion.
The first will present concepts which underlie the software agents paradigm and illustrate them with a range of example applications. The second part will cover agent software architectures, agent communication languages, and cooperation protocols. The third segment will present examples of agent-based IR systems and discuss the techniques used in them. Course material will include copies of the slides and an annotated bibliography.

Who should attend: Introductory

About the Instructors: Dr. Timothy Finin is a Professor of Computer Science and Electrical Engineering at the University of Maryland Baltimore County. He has had over 25 years of experience in the applications of Artificial Intelligence to problems in database and knowledge base systems, intelligent information systems, expert systems, natural language processing, intelligent interfaces and robotics. He is currently working on the development of technology to support intelligent information agents. Prior to joining UMBC, he was a Technical Director at the Unisys Center for Advanced Information Technology, a member of the faculty of the University of Pennsylvania, and on the research staff of the MIT AI Lab. He holds a PhD in Computer Science from the University of Illinois. Finin is the author of over eighty research publications. He has been chair or program chair of several conferences in the area of intelligent systems and will serve as technical co-chair of Autonomous Agents-98.

Dr. Charles Nicholas is an Associate Professor of Computer Science at the University of Maryland Baltimore County. He received a Ph.D. in Computer Science from The Ohio State University in 1988. He has been at UMBC since August 1988. Nicholas served as the general chair of the fourth and fifth ACM Conferences on Information and Knowledge Management and Co-Chair of the Principles of Document Processing Workshop. His areas of interest include information retrieval, electronic document processing, and software engineering.

Dr.
James Mayfield is an Associate Professor in the UMBC CSEE Department currently on leave at the Johns Hopkins Applied Physics Laboratory. He received a Ph.D. in Computer Science from the University of California at Berkeley in 1989. Mayfield's dissertation, which was part of the Unix Consultant project, explored how a consultant system can recognize the plans and goals of its users based on their English queries, so as to more effectively address their needs. Mayfield also has extensive research experience in developing applied natural language processing systems and participated in the third, fourth and fifth ARPA-supported Message Understanding Conferences (MUC3, MUC4 and MUC5). Mayfield has organized four workshops in the area of "Natural Language text Retrieval", "Intelligent Hypertext Systems" and "Intelligent Information Agents". ------------------ Cross-Language Information Retrieval Instructor: Dr. Douglas W. Oard, University of Maryland Time: Morning (8:30AM-12:30PM) Description: Cross-language information retrieval techniques offer important functionality to multilingual systems by allowing queries formed in a single language to be used over the entire collection. Cross-language retrieval systems also offer monolingual users the potential to limit the expenditure of expensive translation resources to potentially useful documents that are identified with a cross-language selection interface. The tutorial will begin with descriptions of cross-language text retrieval applications and some examples of deployed systems. The capabilities and limitations of controlled vocabulary techniques will be described and used to motivate the subsequent discussion of free text techniques. Current research on knowledge-based approaches that exploit dictionaries will be presented in detail and research on techniques based on more sophisticated ontologies will be discussed briefly. 
Multilingual corpora provide another important source of information on the relationship between languages, and techniques based on both parallel and comparable corpora will be presented in detail. A description of cross-language selection interfaces will complete the discussion of current research on cross-language text retrieval. The tutorial will conclude with a brief discussion of the potential application of these techniques to cross-language speech retrieval, identification of open research topics on cross-language retrieval, and a brief summary of sponsored research opportunities in the United States and the European Community. Participants will be provided with copies of the slides and web links to current research projects. Who should attend: Intermediate. The tutorial is designed for researchers and practitioners familiar with the design of at least one content-based text retrieval approach (e.g., the vector space model, a probabilistic model, a thesaurus-based controlled vocabulary technique). Implementation and evaluation experience are not required. About the instructor: Dr. Oard received his Ph.D. in Electrical Engineering from the University of Maryland in 1996, working on adaptive filtering techniques for multilingual document streams. He co-chaired the March 1997 AAAI Symposium on Cross-Language Text and Speech Retrieval and is the author of several papers on cross-language text retrieval and multilingual text filtering. Additional information is available at http://www.glue.umd.edu/~oard. ----------------- Afternoon Tutorials Information Retrieval Systems: Research and Design Methods Instructors: Raya Fidel, University of Washington Philip J. Smith, Ohio State University Time: Afternoon (1:30PM-5:30PM) Course Description: The objective of the course is to familiarize IR researchers and system developers with the design and evaluation of IR systems from a cognitive ergonomics perspective. 
The initial part of the course will be organized around a conceptual framework for pursuing a research project from discovery to validation. This introduction will include a discussion of alternative research approaches and the associated data collection methods. This initial section will conclude by focusing on the analysis of verbal protocols and discourse for model building and hypothesis testing. The second part of the course will discuss the adaptation and extension of the above research methods for system development purposes. First, the collection of verbal and behavioral data as part of usability studies will be considered. Then, a number of analytical techniques will be outlined for predicting the impact of design decisions on users' cognitive processes. Particular emphasis will be placed on the use of alternative representations to assist with the generation of a predictive cognitive task analysis. This second half of the course will be centered around a series of case studies. Who should attend: Introductory About the Instructors: Raya Fidel is an Associate Professor at the Graduate School of Library and Information Science, University of Washington. She teaches courses in database design, indexing and abstracting, knowledge representation, and thesaurus construction. Her research focuses on online searching behavior. Starting with her dissertation research, Ms. Fidel has been studying how users search online information systems. Ms. Fidel was among the first researchers in library and information science to employ qualitative methods in which the investigator collects data when they occur in real life. In recognition of her work, Ms. Fidel received the Best JASIS Paper Award twice, the ASIS Award for Research in Information Science in 1994, and the New Jersey ASIS Distinguished Lectureship in 1995. Ms. Fidel is active in ASIS, and ACM SIGIR for which she served as the Conference Chair for SIGIR'95. 
Phil Smith is a Professor in the Industrial and Systems Engineering Program at The Ohio State University. He teaches courses in the areas of cognitive systems engineering, artificial intelligence, human-computer interaction and the design of cooperative problem-solving systems, intelligent tutoring systems, and intelligent information retrieval systems. His research focuses on issues concerned with the design of cooperative problem-solving systems to aid people in performing complex tasks. He has developed and field tested a number of advanced interfaces and expert systems for information retrieval, tutoring and decision support. -------------- Title: Machine Learning for Information Retrieval Instructor: David D. Lewis, AT&T Labs - Research Time: Afternoon (1:30PM-5:30PM) Description: This tutorial will discuss machine learning methods for IR tasks, including retrieval, categorization, and routing/filtering. The emphasis will be on supervised learning (i.e. learning from manually classified examples), with some attention to unsupervised methods (e.g. clustering, LSI) for representation change. The use of machine learning in commercial IR software will be touched upon, but the emphasis will be on research findings. The tutorial will attempt to clarify the links between important but sometimes confusing concepts from IR (e.g. term weighting, query expansion, relevance feedback, classification, etc.) and important but sometimes confusing concepts from machine learning (e.g. feature extraction, overfitting, generalization, classification, etc.). Who should attend: Intermediate. About the instructor: David D. Lewis is a Principal Research Staff Member at AT&T Labs. Prior to that he was a research faculty member at the University of Chicago, and did his Ph.D. in Computer Science at the University of Massachusetts at Amherst. 
He has published extensively on the application of machine learning and natural language processing to IR, and has organized several workshops in these areas. He has been heavily involved in the design of the TREC evaluations and the construction of test collections for text categorization. ----------------- Title: Evaluation of IR Systems Instructors: William R. Hersh, Oregon Health Sciences University Stephen E. Robertson, City University, London Time: Afternoon (1:30PM-5:30PM) Description: The tutorial will provide an overview and critical assessment of information retrieval system evaluation. Until now the Cranfield approach to IR with recall and precision measures has dominated retrieval testing. Developments in end-user information systems such as CD-ROMs, hypertext public access systems, and the Internet are presenting new evaluation challenges. The tutorial will start with basic research concepts and their application in IR evaluation. Approaches adopted in classic retrieval experiments will be presented and their limitations will be discussed. More recent evaluative studies conducted at City University London, Oregon Health Sciences University, and TREC will be used to illustrate efforts towards more user-centered evaluation. The final discussion will consider future directions in accommodating both system and user oriented evaluation in IR. Who should attend: Introductory. Attendees should have a general knowledge of IR systems, but no specific knowledge is required. About the instructors: Dr. William Hersh is Associate Professor and Associate Director of Health Informatics at Oregon Health Sciences University in Portland, Oregon, USA. His main research interests are in the areas of automated indexing, evaluation methodologies for end-user searching, and data extraction from the electronic medical record. 
While his evaluation work was initially focused in the medical domain, the problems encountered have led him to confront issues of evaluation more generally. He is author of the book, "Information Retrieval: A Health Care Perspective" (Springer-Verlag, 1996). Dr. Stephen Robertson is Professor of Information Systems and Co-Director of the Centre for Interactive System Research at City University in London, UK. The Centre is concerned with the design and evaluation of advanced retrieval systems and has been responsible for the development of Okapi, a system using term weighting and based on a probabilistic model and one of the leading participants in TREC. His research interests are in the use of probabilistic models to inform IR system design, and in the development of understanding of systems and of user-system interaction through experimentation and evaluation. He has published extensively in these areas. ----------------- Title: Implementation of High Performance Information Retrieval Systems Instructors: Alistair Moffat, Department of Computer Science, The University of Melbourne, Justin Zobel, Department of Computer Science, RMIT Time: Afternoon (1:30PM-5:30PM) Description: Basic IR techniques, developed and refined over more than thirty years, are well-known. However, it is only recently that these techniques have been applied to document collections containing gigabytes of text. This tutorial examines the practical problems of indexing, querying, storing, and updating gigabyte-sized text databases. It describes a variety of recently-developed techniques for coping with the scale of modern text collections, including fast indexing methods, fast query evaluation strategies, and fast text and index compression mechanisms. The public-domain software system MG will be used as an example, and participants will be given guidance on the installation and use of MG. 
The tutorial will conclude with a description of other indexing methods, in particular signature files, and an evaluation of their usefulness. Participants will be given copies of the overhead slides as well as an annotated reading list. Who should attend: Introductory-Intermediate. The tutorial is designed to cater for the needs of several audiences: -Researchers who wish to understand implementation issues for multi-gigabyte IR systems; -Practitioners who wish to understand current "best practice" in IR implementation and to learn about novel techniques for efficient resource usage in IR systems; and -Students with a research interest in IR system implementation. About the instructors: Dr. Alistair Moffat is an Associate Professor in the Department of Computer Science at the University of Melbourne. He completed a Ph.D. at the University of Canterbury in 1985. Since then Dr. Moffat has published more than 70 refereed papers in the areas of sorting and searching algorithms; text, image, and index compression; and the implementation of information retrieval systems. He is a coauthor of the 1994 book "Managing Gigabytes: Compressing and Indexing Documents and Images". Dr. Justin Zobel completed his Ph.D. at the University of Melbourne in 1991. He joined the academic staff of the Department of Computer Science at RMIT in 1990, and is now a senior lecturer and a member of the RMIT Multimedia Database Systems group. Dr. Zobel has published more than 60 papers in the areas of information retrieval, database systems, text databases, and logic programming. He has been a Program Chair of the 1997 Australasian Computer Science Conference and of the 1995 Australasian Database Conference. In collaboration, these two researchers developed the MG system, a research prototype text management system that incorporates a range of novel techniques to achieve compact storage of large amounts of data while still providing fast content-based access. 
MG has been used by the Multimedia Database Systems Group to support its activities in the TREC project over a period of more than five years. -------------------------------------------------------------- -------------------------------------------------------------- *Workshops* Research Workshops The conference will be followed by four parallel one-day workshops, as well as being preceded by a single workshop held jointly with Digital Libraries '97. Pre-Conference Workshop Education and Curriculum Development for Multimedia, Hypertext, and Information Access: Focus on DL and IR Organizer: Edward Fox, Virginia Tech When: Saturday, July 26 (9:00AM-4:00PM) This workshop is part of a series of meetings that began in 1995 to develop guidelines for curricula and courses in the broad area of "information" (Multimedia, Hypertext and Information Access). Attendees will help draft guidelines (similar to those by SIGGRAPH, SIGCHI) for curricula, courses and training programs in this area. Educators will present syllabi and describe courseware for courses or training programs about digital libraries or information retrieval. Employers will describe knowledge and skills they seek when recruiting in these areas. Researchers will explain testbeds that can be used by learners. Workshop results will be disseminated over WWW and later through ACM publications, and also will be made available through online courseware for undergraduate and graduate students. How to Apply: By June 1 send a 1 page proposal to fox@vt.edu with subject line including the phrase "DL/IR Education". State your experience and interest in the theme of the workshop. Attendees will be selected primarily on the strength of their proposal and secondarily on a first-come-first-served basis. Those submitting the strongest proposals will be asked to give short presentations in the morning session. More information is available at http://ei.cs.vt.edu/~fox/MHIA/SDL97wk.html. 
----------------- Post-Conference Workshops Beyond Word Relations Organizer: Beth Hetzler, Pacific Northwest National Laboratory When: Thursday, July 31 (9:00AM - approx 3:00PM) Many IR systems identify documents or provide a document visualization based on analysis of a particular relationship among documents - that of similar content. But there may be layers of other less apparent and less traditional relationships that would potentially be useful to the user. Building a theoretical framework for this "other" information is the subject of this workshop. The focus will be on identifying non-traditional relationships which may be valuable to analysis, and on integrating among the traditional and non-traditional. The goal of the workshop is to enhance our understanding of the linkages and associations among documents by: --Identifying semantic relationships among documents. For example, some readily apparent relationships include documents with the same subject or theme, that share a property, that reference or quote one another, that share the same purpose, or that embody a cause-and-effect relationship. --Categorizing those relationships --Identifying attributes of the relationships --Identifying areas for follow-on research, such as visualization possibilities The workshop will be structured in two pieces. The morning will include short presentations by several of the invited attendees to review relevant work and to provoke discussion. The afternoon will include a break-out session with small groups each focused on particular topics. Specific topics will be influenced by the submitted white papers. Candidates include: --Possible semantic relationships --Visualizations of relationships --Attributes of relationships --Identifying applications After the break-out sessions, each group will present a summary of results to the workshop as a whole. Follow on discussion will be used to refine the results. 
A summary of the results will be provided for publication in SIGIR Forum. Participants should send a short (2 page) white paper or an extended abstract discussing their ideas for this forum to Beth Hetzler at eg_hetzler@pnl.gov by June 1, 1997. ----------------- Networked Information Retrieval Organizers: Jamie Callan, University of Massachusetts, Amherst Chris Buckley, Cornell University and SABIR Research Norbert Fuhr, Universitaet Dortmund When: Thursday, July 31 (9:00AM - approx 3:00PM) The recent and rapid growth of the Internet and corporate intranets poses new problems for Information Retrieval. There is now a need for tools that help people navigate the network, select which collections to search, and fuse the results returned from searching multiple collections. These problems are being addressed by the international IR research community and a number of digital libraries projects around the world, such as the U.S. Digital Libraries projects, the ERCIM Digital Libraries projects and the German MEDOC project. The goal of this workshop is to bring together people from each of these areas to discuss their varying approaches to common problems. Researchers are invited to submit position papers or extended abstracts discussing novel approaches to the following problems: -Resource selection: selecting from among a set of collections or databases; -Data fusion: merging or fusing results from different collections or databases; -Browsing, summarization and visualization of distributed resources; -Archival retrieval methods for heterogeneous objects; -Metaknowledge; -Consistency; -Multilingual environments; -User interfaces; and -Architectures for networked information retrieval The workshop will consist of a series of presentations, followed by a discussion session. The presentations will be based on submitted papers or extended abstracts. The electronic proceedings from last year's workshop, organized by the same committee, are available on the web. 
Please submit a short paper or an extended abstract by June 1, 1997. Send it via e-mail to callan@cs.umass.edu or send three copies to: Jamie Callan; Computer Science Dept.; Univ. of Massachusetts; 740 North Pleasant Street; LGRC, Room A243; Amherst, MA 01003-4610, USA (ph. +1 413-545-4878; fax +1 413-545-1249). ----------------- Summarization and Visualization for IR: Reducing the Information Overload Organizers: James Allan, University of Massachusetts, Amherst Amit Singhal, AT&T Labs - Research When: Thursday, July 31 (9:00AM - approx 3:00PM) How can IR techniques be used to reduce the cliched "information overload" problem? Researchers have been using IR's statistical methods to provide "best sentence" summaries of documents for decades, but other types of summaries are needed to assimilate the piles of information available on world-wide networks. For example: -Grouping of retrieved texts into related classes. -Lists of main topics in a collection or in a retrieved set. -Summaries of non-textual material: is a thumbnail the only approach? -Displays of how retrieved material relates to other material (the collection, earlier results, etc.) -Query- or user-specific summaries. -Improving coherence and coverage of summaries. -Visuals that help a user understand why documents were retrieved or whether it is likely that the desired document is anywhere to be found. -Evaluation of summarization and visualization techniques. Those issues and any others that are related will be the topics of this workshop. It will be organized into a series of 3 or 4 sessions covering topics of interest to the participants. Each session will include a few short presentations followed by discussion. Those planning to attend should send a 1- or 2-page position paper or an extended abstract via e-mail to allan@cs.umass.edu by June 1, 1997. Indicate with your submission whether you are interested in presenting. Some of the submissions will be selected for presentation. 
All other papers will be used to judge the interest of the participants to decide on the best organization of sessions. ----------------- Crosslingual Information Retrieval Organizers: Jaime Carbonell, Carnegie Mellon University Yiming Yang, Carnegie Mellon University When: Thursday, July 31 (9:00AM - approx 3:00PM) Crosslingual Information Retrieval (aka "translingual" or "multilingual" IR) is a rapidly growing area of IR, driven in part by the ease of information access across national and linguistic boundaries afforded by the internet and the web. The 1996 crosslingual (CIR) SIGIR workshop helped establish this new field, and there has been considerable progress since then in the context of TREC and in a number of new CIR techniques and comparative evaluations. This workshop offers a forum for discussion of developments and emerging issues in CIR. In particular, we expect to address: - New methods for CIR (beyond dictionary-based query translation) - The role of query expansion in CIR - The role of bilingual corpora in CIR - Can MT help in CIR, and if so how? - How should CIR performance be evaluated? - Can we set some common benchmarks and/or corpora? - What message(s) should we carry to TREC wrt CIR? - What are the greatest challenges for CIR? The final format of the workshop will be determined by consensus of the participants, but we envision starting with reports on recent progress (TREC, new results, etc.), moving to discussion of the issues above, and wrapping up with a synthesis on the state of the art in CIR. We would also like to have one or more systems available for experimentation/demonstration, logistics permitting. If interested in participating, please send a 1-2 page position paper, mentioning work and interest(s) in this area. If you also wish to present new results, or brief highlights of earlier CIR activities (e.g. TREC, the AAAI workshop, etc.) please indicate this clearly and we will try to accommodate. 
Also, please indicate if you have experience in and consequently desire to lead the discussion of one of the issues above. If you wish to demonstrate a system please let us know. (Internet access will, unfortunately, not be available.) Submissions in ascii to jgc@cs.cmu.edu with the word "crosslingual" in the subject field by June 1, 1997.