IRLIST Digest December 18, 1990 Volume VII, Number 37 Issue 43 ********************************************************** I. NOTICES A. Meetings announcements/Calls for papers 1. Reversible Grammar in Natural Language Processing University of California, Berkeley, California June 17, 1991 B. Publications announcements 1. LINGUIST@UNIWA.UWA.OZ.AU 2. COMP.MULTIMEDIA IV. PROJECT WORK B. Bibliographies 1. Selected IR-related dissertation abstracts ********************************************************** I. NOTICES I.A.1. Fr: Don Walker Re: Reversible Grammar in Natural Language Processing University of California, Berkeley, California June 17, 1991 A workshop sponsored by the Special Interest Groups on Generation (SIGGEN) and Parsing (SIGPARSE) of the Association for Computational Linguistics and supported by the Defense Advanced Research Projects Agency TOPICS OF INTEREST: The purpose of this workshop is to bring together researchers whose work concerns problems of reversible grammar systems that are designed for, or may find applications in, Natural Language Processing. Papers are invited on significant, original and unpublished research on all aspects of reversible grammars, including, but not limited to: (1) Reversible computation (multi-directional and non-directional computation; algorithms for program inversion and transformation; efficiency issues); (2) Reversible natural language systems (parsers and generators for reversible grammars; reversibility of unification-based grammars; new architectures for reversible natural language processing; knowledge representation issues; reversible machine translation; lexicons for bidirectional systems; reversibility in discourse processing); (3) Reversible grammars in linguistic theory (formal characterization; reversibility within various grammatical frameworks, eg., GB, LFG, GPSG, HPSG, TAG, categorial grammars; reversibility in rule-based and principle-based approaches; reversibility and semantic compositionality). FORMAT OF SUBMISSION: Authors should submit four copies of their papers in hard copy form. Papers should be a minimum of four pages and a maximum of ten single-spaced pages (exclusive of references). The title page should include the title, full names of all authors and their complete addresses including electronic addresses where applicable, and a short (5 line) summary. Submissions that do not conform to this format will not be reviewed. Send submissions to: Tomek Strzalkowski Courant Institute of Mathematical Sciences New York University 715 Broadway, Room 704 New York, NY 10003, USA tomek@cs.nyu.edu (+1-212) 998-3496 SCHEDULE: Papers must be received by 1 March 1991 (NOT 31 March, as in a previous release). Authors will be notified of acceptance by 5 April 1991. A camera-ready copy of the final paper prepared in the two-column format must be received by 10 May 1991. Accepted papers will be included in the proceedings published by the ACL. WORKSHOP INFORMATION: The workshop is held in connection with the 29th Meeting of the ACL (18-21 June). Local arrangements are being handled by Peter Norvig (Division of Computer Science, University of California, 573 Evans Hall, Berkeley, CA 94720, USA, (+1-415) 642-9533, norvig@teak.berkeley.edu). ORGANIZING COMMITTEE: Marc Dymetman, Gertjan van Noord, Patrick Saint-Dizier, Tomek Strzalkowski. ********** I.B.1. Fr: Nancy M. Ide Re: LINGUIST@UNIWA.UWA.OZ.AU ANNOUNCING A NEW LIST LINGUIST@UNIWA.UWA.OZ.AU A new list has been formed, which will serve as a place of discussion for those issues which concern the academic discipline of linguistics and related fields. The list is international in orientation, and hopes to provide a forum for the community of linguists as they exist in different countries. Though the list is moderated, and all submissions are subject to editorial discretion, it has no areal, ideological or theoretical bent, and discussion of any linguistic subfield are welcomed. Membership of the list is open to all. To subscribe to this list, please send a message to LINGUIST-REQUEST@UNIWA.UWA.OZ.AU containing as its first and only line the following: SUBSCRIBE LINGUIST Any other questions may be directed to: LINGUIST-EDITORS@UNIWA.UWA.OZ.AU ********** I.B.2. Fr: Ittai Hershman Re: COMP.MULTIMEDIA Proposal: The creation of a newsgroup called "comp.multimedia" to discuss interactive multimedia technologies and applications: systems which combine text, graphic images, animation, sound, and/or video, with computer technology. The newsgroup would promote discussion of relevant technologies (e.g. DVI, CD-I, CD ROM, videodisc) applications and methodologies. Rationale: At present there is no newsgroup which addresses "multimedia" technology and applications per se. There are three groups which discuss specialized aspects: comp.mail.multi-media, comp.ivideodisc and comp.cog-eng. None, however, is regarded as a focal point for "multimedia" and all suffer from crossposts as a result. "Multimedia" is a group of technologies with a lot of momentum, in the same way that "AI" and "groupware" have been before. As these technologies and their applications mature, traffic may dictate the formation of subgroups (like: comp.multimedia.interactive, comp.multimedia.desktop, comp.multimedia.dvi, etc.). At the moment, however, one forum with one unified name would be best in my opinion. Naming: Ten days ago I floated the idea for "comp.multimedia" on the comp.ivideodisc newsgroup. The reaction was positive, but a number of people voiced dissatisfaction with the term "multimedia". One person typified the naming problem when he stated: "The lexical token `multimedia' is pretty much debased." Alternatives such as "polymedia" and "new-media" were suggested. My response to these concerns is that whether we like it or not the term "multimedia" has stuck to this stuff. I suggest its use in the title simply because it is the term which most people associate with the topic and is therefore most likely to attract those who are interested. I hope we do not spend the next month -- the alotted time for discussion before a vote, arguing about neologisms. Unless this gets bogged down in hopeless disagreement (which I hope does not occur), I will continue the process with a "call for vote" on December 15th. I ask all interested parties to voice an opinion on news.groups in response to this message (preferably positive :-) and to vote accordingly (via e-mail) in response to the "call for vote" on December 15th. Ittai Hershman Associate Director of Academic Computing NYU Stern School of Business ********************************************************** IV. PROJECTS IV.B.1 Fr: Susanne M. Humphrey Re: Selected Ir-Related Dissertation Abstracts The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided. Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission. AN University Microfilms Order Number ADGDX-90151. AU CONNOLLY, MICHELLE M. TI HIGH-LEVEL LANGUAGE CONSTRUCTS FOR RELATIONAL DATABASE DESIGN. IN Queen's University of Belfast (Northern Ireland) Ph.D. 1986, 250 pages. DE Computer Science. AB Available from UMI in association with The British Library. This research project investigates the use of high level language constructs for the development of relational database systems. The use of a high level data structure, such as the Pascal record for representing relation types was considered. This lead to the development of the simple database language RAL (Relational Algebraic Language). RAL provides its user with a data definition facility, plus a set of update and retrieval operations. However, RAL has considerable limitations as it does not provide any branching, iteration or data abstraction facilities. In recent years several sophisticated database programming languages, including Pascal/R, Modula/R, PLAIN, Theseus and RIGEL have been developed which incorporate facilities for relational database definition and manipulation within the framework of a standard well-structured language. An important aspect of these "database programming languages" is that full advantage may be taken of existing language features for type definition, branching, iteration etc. The core of this research involves the design and development of the database programming language RAPP (Relational Algebraic Pascal-Plus). RAPP extends the modular multi-processing language Pascal-Plus with the built-in type relation, together with an appropriate set of relational algebraic operations. The research investigates the extent to which the type definition facilities of this extended language meet the complex data modelling requirements of modern database systems. AN University Microfilms Order Number ADGDX-89823. AU DALY, ROBERT. TI A RELATIONAL DATABASE FOR CARTOGRAPHIC MAP DISPLAYS. IN University of Sussex (United Kingdom) Ph.D. 1988, 294 pages. DE Computer Science. Information Science. AB Available from UMI in association with The British Library. The research presented in this thesis is concerned with the storage of cartographic data on computer database systems. In particular, it describes the design and implementation of a relational database system primarily intended to support the retrieval of map data with the emphasis being placed on high accuracy, speed and flexibility. Data is expected in a vectorized form and output is intended for colour raster displays. The thesis also examines the models which may be used to represent map data and presents some interesting scale-free line and surface structures. Emphasis is placed throughout on providing generalized and practical solutions which are relevant to large databases with extensive coverage. The section on data modelling discusses the contrasting requirements of line-based and area-based problems and the relative merits of regular and irregular representations. The model adopted as a general-purpose database incorporates an irregular vector structure, a scale-free, quadtree-based surface structure and an hierarchical feature description scheme. A grid partitioning is also considered necessary, partly for indexing reasons and partly for supporting enclosure and colour infill operations. Within the database system, called RDBASE, emphasis is placed on powerful indexing structures, high portability, integrated metadata, providing a bulk data storage facility for coordinate strings, and integrating SQL queries with a functional interface. Various indexing methods are described and their suitability for use on map data is discussed. Extensions to the B$\sp+$tree method are presented which enable the handling of repeated values and combinations of attributes, and which also support query optimization costing functions and unique identifier generation. Problems involving relational representations of network structures are used to identify the need for a functional interface to the database. The interface developed incorporates two levels: an efficient low level and a SQL-based high level. The latter incorporates an interesting implementation of the decomposition method for its query optimization. AN University Microfilms Order Number ADG90-23899. AU MACKELLAR, BONNIE KATHLEEN. TI RETRIEVAL BY SIMILARITY IN A KNOWLEDGE BASE OF REUSABLE CODE. IN The University of Connecticut Ph.D. 1989, 238 pages. DE Computer Science. AB In this thesis, WharfRat, a case-based reasoning system, is presented. WharfRat is a general knowledge representation system in which similarity-based retrieval is the primary retrieval mechanism. The system employs a general, graph-based data model in which object types are organized in a specialization network. Domain-specific data representations are built using the constructs of the general data model. The WharfRat application presented in this talk contains a knowledge base of data type implementations. Given a description of an abstract data type, it retrieves the most similar data type implementation in the knowledge base. Similarity between data types is modeled by a fuzzy relation. A set of similarity matching rules has been developed and implemented. Research issues addressed by this project include the development of a knowledge representation that is both case-oriented and general enough to be useful in various domains, the development of procedures that permit reasoning by similarity, and the application of case-based reasoning to the problem of reusable data types. AN University Microfilms Order Number ADGDX-89646. AU SHAH, HANIFA UNISA. TI THE IMPLEMENTATION OF A FUNCTIONAL QUERY LANGUAGE FRONT-END TO A RELATIONAL DATABASE SYSTEM. (VOLUMES I AND II). IN University of Aston in Birmingham (United Kingdom) Ph.D. 1989 501 pages. DE Computer Science. AB Available from UMI in association with The British Library. Database systems have a user interface one of the components of which will normally be a query language which is based on a particular data model. Typically data models provide primitives to define, manipulate and query databases. Often these primitives are designed to form self-contained query languages. This thesis describes a prototype implementation of a system which allows users to specify queries against the database in a query language whose primitives are not those provided by the actual model on which the database system is based, but those provided by a different data model. The implementation chosen is the Functional Query Language Front End (FQLFE). This uses the Daplex functional data model and query language. Using FQLFE, users can specify the underlying database (based on the relational model) in terms of Daplex. Queries against this specified view can then be made in Daplex. FQLFE transforms these queries into the query language (Quel) of the underlying target database system (Ingres). The automation of part of the Daplex function definition phase is also described and its implementation discussed. AN University Microfilms Order Number ADG90-21362. AU SHENOY, SREEKUMAR THRIVIKRAMA. TI SEMANTIC QUERY PROCESSING IN DATABASE SYSTEMS. IN Case Western Reserve University Ph.D. 1990, 131 pages. DE Computer Science. AB In this thesis we describe a scheme to represent, utilize, and maintain semantic knowledge in optimizing user specified queries. The semantics is represented as function-free clauses in predicate logic. Knowledge is formulated as semantic constraints pertaining to the database, and is assumed to be satisfied by all the instances of the database. We use three distinct types of semantic constraints, namely, implication constraints, subset constraints, and aggregate constraints. The proposed scheme uses a graph theoretic approach to identify redundant joins and restrictions present in a given query. An optimization algorithm is presented which eliminates semantically redundant non-profitable specifications from a query while adding redundant profitable specifications to it. The optimization algorithm is guided by a set of heuristics and utilizes dynamic interaction of three entities--schema, semantics, and query--for semantic query transformation. Various sequential stages of the algorithm are described that deal with semantic expansion, relation elimination, restriction elimination etc. The complexity of the algorithm and cost reduction of semantic optimization are analyzed in detail. The implementation architecture of the algorithm and test results on a representative set of data are presented. Details on all the associated user interfaces are described. The test results reveal the potential advantages of a semantic optimizer in conjunction with a conventional one. Many potential inconsistencies possible with a growing set of semantic constraints are identified. Maintenance issues associated with different types of constraints are addressed. Algorithms for semantic maintenance regarding redundancy and contradiction are introduced for subset, implication, and aggregate constraints. AN University Microfilms Order Number ADG90-27047. AU SMITH, MARIA ELENA. TI ASPECTS OF THE P-NORM MODEL OF INFORMATION RETRIEVAL: SYNTACTIC QUERY GENERATION, EFFICIENCY, AND THEORETICAL PROPERTIES. IN Cornell University Ph.D. 1990, 161 pages. DE Computer Science. AB A practical information retrieval system must be easy to use by untrained users, and it must provide prompt responses to a user's search requests. In this thesis, these practical aspects of the p-norm model of information retrieval are explored. In addition, a study of theoretical properties of the p-norm model is presented. A syntactic method for generating p-norm queries from parse trees generated by the PLNLP syntactic analyzer is presented. The effectiveness of the syntactically generated queries is shown to be comparable to the effectiveness of manually constructed queries, and much better than that of statistically generated queries. The efficiency of a p-norm retrieval is significantly improved with a new p-norm retrieval algorithm which evaluates the entire document collection in one recursive traversal of the query tree. This algorithm is compared against the straightforward algorithm, which requires a traversal of the query tree for each document that is evaluated. The new algorithm is shown to be better both asymptotically and experimentally. The infinity-one model is introduced as a means of approximating the p-norm model without requiring exponentiation. Experimental results show that infinity-one retrieval is essentially as effective as p-norm retrieval, but much faster. List pruning methods for further efficiency improvements are also introduced and are shown to reduce retrieval time significantly without affecting the precision of top-ranked documents. The retrieval time of the infinity-one model with list pruning is shown to be comparable to that of pure Boolean retrieval. A theoretical study is also presented in which certain Boolean algebra properties, such as associativity, are shown to be unsatisfiable by any extended Boolean system with weak operators. The p-norm model is shown to satisfy all those properties that can be satisfied. In addition, the p-norm model is evaluated with respect to the Waller-Kraft wish list for extended Boolean systems. AN University Microfilms Order Number ADG90-27323. AU WANG, GUANG-NAY. TI A MICROCOMPUTER-BASED INFORMATION RETRIEVAL SYSTEM WITH RELATIONAL THESAURI TO SUPPORT A MEDICAL EXPERT SYSTEM. IN Illinois Institute of Technology Ph.D. 1989, 124 pages. DE Computer Science. Information Science. Health Sciences, Medicine and Surgery. AB LITREF is an online bibliographic retrieval system. It supports the Stroke Consultant MAIESTRO, an expert system for the diagnosis and management of stroke cases. This expert system is a joint project of the Department of Neurology at Michael Reese Hospital and the Department of Computer Science at Illinois Institute of Technology. LITREF interprets a Boolean command language to provide fast, efficient browsing of abstracts from the stroke literature. User feedback on keywords derived from a relational thesaurus is used to enhance the original query. Response time is 'immediate' when LITREF runs on an IBM PC/AT microcomputer. The user-friendly online help facilitates users access to the system. It has been enthusiastically received by users at its initial trial. The system design uses an inverted file structure with a bitmap strategy to speed access to retrieval objects. The implementation uses MS-QUICK C under DOS 3.1. It runs on any IBM compatible PC with a hard disk. The database contains 597 abstracts in the area of Cerebral Infarction and Cerebral Hemorrhage from 1983 to 1988 and occupies 1.6 Megabytes in storage. Three index files take 42 Kbytes in memory. The cosine function is used to measure the similarity between a query and an abstract. Recall and precision are used in the evaluation of retrieval effectiveness. Two experiments have been performed using LITREF as a testbed. One experiment involves applying alternative suffixing algorithms to both index terms and query keywords. When the results are evaluated by comparing precision values at each recall level we found that the word stem index method is superior to the full work index method. The other experiment is a test of whether a relational thesaurus extracted from a large lexicon with lexical and semantic relations from Webster's Seventh New Collegiate Dictionary can actually improve retrieval effectiveness. The initial test using binary weighting in a batch run obtained a negative result. ********************************************************** IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550. Send subscription requests to: LISTSERV@UCCVMA.BITNET Send submissions to IRLIST to: IR-L@UCCVMA.BITNET Editorial Staff: Clifford Lynch lynch@postgres.berkeley.edu calur@uccmvsa.bitnet Mary Engle engle@cmsa.berkeley.edu meeur@uccmvsa.bitnet Nancy Gusack ncgur@uccmvsa.bitnet The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues. These files are not to be sold or used for commercial purposes. Contact Mary Engle or Nancy Gusack for more information on IRLIST. The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.