IRLIST Digest           August 24, 1992
Volume IX, Number 30    Issue 126

**********************************************************

I. NOTICES
   A. Meeting Announcements/Calls for Papers
      1. New OED Conference
   B. Publications Announcements
      1. Language Industry Monitor followup

II. QUERIES
   B. Requests for Information
      1. Looking for FTP Sites
      2. Request for Information on Multi-User Online Bibliographies
      3. PD/Freely Available Text Retrieval Software

IV. PROJECT WORK
   C. Abstracts
      1. IR-Related Dissertation Abstracts

**********************************************************

I. NOTICES

I.A.1.
Fr: New OED general account
Re: New OED Conference

8th Annual Conference of the UW Centre for the New Oxford English Dictionary and Text Research

Screening Words: User Interfaces for Text

October 18-20, 1992
University of Waterloo
Waterloo, Ontario, Canada

The conference theme represents a universal problem of immediate concern: we have all repeatedly struggled with accessing or maintaining materials stored in large text repositories. Although information retrieval has been well-studied in a library context, and database querying has matured for conventional business applications, much research and development is still required to suit most text and information needs.

The Eighth Annual "OED Conference" will extend our ongoing exploration of text management applications and techniques. This year, we will investigate diverse application activities: retrieving information for scientific disciplines; supporting literary and linguistic needs; bridging between corpora, dictionaries, and other knowledge representations; managing multilingual and translated documents; and managing text in commercial environments. It is expected that attendees will be exposed to problems motivated by specific application areas and to solutions that can address requirements across several areas.

This year's conference is sponsored by:

   Information Technology Research Centre
   and
   Open Text Corporation

with ongoing research support of the University of Waterloo and the Natural Sciences and Engineering Research Council of Canada.

CONFERENCE FEES:

Registration covers all conference activities, one copy of the conference proceedings, a reception, two lunches, refreshment breaks, and dinner on Monday evening. Total fees include GST (Canadian Goods and Services Tax).
GST Registration # R119260685

(Before October 1):
   Academic:      $180.00 + GST = $192.60 (Cdn)
   Non-academic:  $280.00 + GST = $299.60 (Cdn)
   Student:       $ 74.77 + GST = $ 80.00 (Cdn)

(After October 1):
   Academic:      $205.00 + GST = $219.35 (Cdn)
   Non-academic:  $305.00 + GST = $326.35 (Cdn)
   Student:       $ 93.46 + GST = $100.00 (Cdn)

Additional proceedings: $20 + GST = $21.40 (Cdn) each

Please make cheques payable to the UNIVERSITY OF WATERLOO, and send to:

   Liz Bevan
   Centre for the New OED and Text Research
   Davis Centre, Room 1311
   University of Waterloo
   Waterloo, Ontario, Canada N2L 3G1

Registration Form:

Complete the following form and return it with your registration fee, or send the completed electronic form to newoed@watsol.uwaterloo.ca with the payment to follow by mail. The conference information package will be forwarded upon receipt of registration.

Surname:
First Name(s) (to appear on your name tag):
Address:
Business Phone:
Fax #:
Affiliation/Business (short form for name tag):
Full e-mail address:

**********

I.B.1.
Fr: Colin Brace
Re: Language Industry Monitor (follow up)

LANGUAGE INDUSTRY MONITOR [Follow-up]

In February of this year, I posted a notice on this net about LANGUAGE INDUSTRY MONITOR, a bimonthly newsletter dedicated to the world of natural language computing. The response was overwhelming, with many new subscriptions and orders for sets of back issues. Inquiries from this posting are still trickling in.
If any of you did respond but have not yet heard from me or have not received a sample issue, my sincere apologies. Please contact me again, restating your request.

For those who did not see the original announcement, very briefly, LANGUAGE INDUSTRY MONITOR offers a lively roundup of news, background information, and commentary on the world of natural language computing. This includes such technologies as speech processing, handwriting recognition, terminology management, full text indexing and retrieval, document processing, and computer-aided translation (including MT).

LANGUAGE INDUSTRY MONITOR is currently the only publication of its kind. It is unique because it addresses these related technologies as a whole, placing them in relation to each other, and viewing them in the context of broader technological, social, and political issues.

To clarify a couple of points which weren't clear from the original notice and caused some confusion:

* LANGUAGE INDUSTRY MONITOR is not (yet) distributed in digital form; it's printed on paper.

* LANGUAGE INDUSTRY MONITOR does cost money (US$ 95 airmail). Published independently and advertisement-free, it's "subscriber-driven."

If you would like to receive a sample copy of the next issue of LANGUAGE INDUSTRY MONITOR, send me your name, department, company, or institute, and full mailing address.

> > > > > > IMPORTANT < < < < < <

If you are involved in a natural language processing project and have a product announcement or interesting research results to share, or, for example, you have evaluated or extensively used any of the NLP-based products currently available, I would very much like to hear from you. Send announcements, reports, publications, voluntary submissions, or other relevant materials to my attention at the following address. Or contact me directly to make an appointment for an interview.

Colin Brace
Editor
L A N G U A G E   I N D U S T R Y   M O N I T O R
"The World of Natural Language Computing"
ISSN 0925-3327
Eerste Helmersstraat 183
1054 DT Amsterdam, The Netherlands
Tel: +31 20 685-0462
Fax: +31 20 685-4300
Internet: colinb@paramount.nikhefk.nikhef.nl
CompuServe: 70023,1164

**********************************************************

II. QUERIES

II.B.1.
Fr: Ennio Lorenzi
Re: Looking for Ftp Sites

I would appreciate the help of anyone who could supply info on ftp sites with particular reference to stemmers.

Ciao da,
ENNIO

e-mail: s925891@numbat.cs.rmit.OZ.AU
        s925891@yallara.cs.rmit.OZ.AU
        elorenzi@arcadia.mc.phillip.edu.au
s-mail: Ennio Lorenzi
        80 The Esplanade, Clifton Hill
        Victoria 3068 Australia

**********

II.B.2.
Fr: Gordon Joly
Re: Request for Info on Multi-user On-line Bibliographies.

I'm looking for software to create on-line bibliographies suitable for multiple users to access and contribute to. I know that there are database programs which could be used to create bibliographies, but I'd like to find something that has already been created specifically for that purpose and that will support multiple users. I'd also be interested in experiences people may have had with such software.

Software for a Macintosh LAN environment is preferred, but something that would work on a UNIX network would also be of interest.

Please reply via email to sasha@uswest.com

Gordon Joly
Computer Science
University College London
Gower St.
London WC1E 6BT
phone: +44 71 387 7050 x3703
fax: +44 71 387 1397
Internet: g.joly@cs.ucl.ac.uk
UUCP: !{uunet,uknet}!ucl-cs!g.joly

**********

II.B.3.
Fr: N. Gusack, IR-L Moderator
Re: PD/Freely Available Text Retrieval Software

I erased the submission! Whoever sent the request for information, please send it again. Thanks.

**********************************************************

IV. PROJECT WORK

IV.C.1.
Fr: Susanne M.
Humphrey
Re: Selected IR-Related Dissertation Abstracts

The following are citations selected by title and abstract as being related to Information Retrieval (IR), resulting from a computer search, using BRS Information Technologies, of the Dissertation Abstracts Online database produced by University Microfilms International (UMI). Included are UMI order number, title, author, degree, year, institution; number of pages, one or more Dissertation Abstracts International (DAI) subject descriptors chosen by the author, and abstract. Unless otherwise specified, paper or microform copies of dissertations may be ordered from University Microfilms International, Dissertation Copies, Post Office Box 1764, Ann Arbor, MI 48106; telephone for U.S. (except Michigan, Hawaii, Alaska): 1-800-521-3042, for Canada: 1-800-268-6090. Price lists and other ordering and shipping information are in the introduction to the published DAI. An alternate source for copies is sometimes provided. Dissertation titles and abstracts contained here are published with permission of University Microfilms International, publishers of Dissertation Abstracts International (copyright by University Microfilms International), and may not be reproduced without their prior permission.

AN University Microfilms Order Number ADG91-33185.
AU WRIGHT, NANCY DANIEL.
TI A CITATION CONTEXT ANALYSIS OF RETRACTED SCIENTIFIC ARTICLES.
IN University of Maryland College Park Ph.D. 1991, 493 pages.
SO DAI V52(06), SecA, pp1930.
DE Library Science. Information Science.
AB When scientific articles are found to contain fabricated or plagiarized data, or to be invalidated by pervasive error, they are customarily withdrawn from the literature by published retractions. The effect of retraction was studied by citation analysis of 53 retracted biomedical articles. Two research questions were investigated: Does publication of a retraction depress the frequency of subsequent citation?
and, Are post-retraction citations negative, or are retracted papers cited in order to support or substantiate the assumptions or findings of the citing article? The first question was studied by comparing the number of citations that each article received during the two years before retraction to the number of citations received during the two years following retraction. A statistically significant difference was found between the number of pre-retraction citations (495) and the number of post-retraction citations (303). The difference remained statistically significant even after estimating the decline in citation frequency attributable to obsolescence. The second question was studied by citation context analysis of the 303 residual articles that had cited the retracted publications. The portions of the citing articles that referenced the retracted articles were classified by a subject specialist according to the typology of citation contexts developed by Chubin and Moitra. Citation contexts were found to be affirmative to a statistically significant extent. With ninety percent of the citations being affirmative, the retracted articles were cited as though they were still valid. The citing articles, whose authors were evidently unaware of the published retractions, were written in standard scientific languages; generally originated in the United States, Western Europe, or Scandinavia; and were published in relatively prestigious journals. A content analysis of retraction statements indicated that only 68 percent specified the reason for retraction, 45 percent were published prominently in the journal, and 40 percent identified the individual responsible for the error or misconduct. The continued, affirmative use of retracted articles suggests that some retractions are going unheeded, and that invalid information may be perpetuated through citation in subsequent articles.

AN This item is not available from University Microfilms International ADG05-70439.
AU BRODSKY, LLOYD.
TI A KNOWLEDGE-BASED PREPROCESSOR FOR APPROXIMATE JOINS IN IMPROPERLY DESIGNED TRANSACTION DATABASES.
IN Massachusetts Institute of Technology Ph.D. 1990.
SO DAI V52(07), SecA, pp2621.
DE Business Administration, Management.
AB Retrieving data from multiple databases requires keys: identically defined attributes which enable connecting the databases. Databases developed independently of each other do not necessarily contain the needed keys. The contents of these multiple databases frequently reflect different aspects of a common entity. For example, the contents of a hospital bill database and a physician bill database both are related to particular episodes of illness of particular patients. In the absence of a key, knowledge of the implicit connection may be used to form an approximate compound key from existing attributes. For example, a surgical operation must be done during a corresponding hospital stay. A series of such comparisons taken together approximate the effect of an exact key. I propose that such approximate keys can be constructed by finding pairs of comparable attributes and comparing them. Such pairs can be found by dividing the existing databases into sets of application-defined universal attribute types and taking a cross-product of the database dictionary information. Pairs of attributes of the same attribute type but in different databases can potentially be compared. I define tests of consistency when the attributes to be compared are not directly comparable. The relevant comparisons are defined by end-users defining the attribute types which are relevant to identifying the missing entity. To test this theory, I built software which solicits data analysts for universal attribute type membership for attributes and which solicits end users for relevant attribute type.
The system then:
(1) produces comparison pairs by cross-tabbing the data dictionary (done once per database pair);
(2) selects relevant comparison pairs (done once per query class);
(3) infers how to compare the attributes in each of the comparison pairs selected and instantiates an implementing SQL clause (done once per relevant comparison pair);
(4) assembles the subtemplates into a template reusable for a class of similar queries and updates a menu of available query classes;
(5) transforms a query class template into an executable SQL query by soliciting needed attribute values.
Three test queries are generated using the software I wrote and the results are analyzed. (Copies available from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690). (Abstract shortened with permission of school.).

AN University Microfilms Order Number ADG91-36131.
AU AL-KHARASHI, IBRAHIM A.
TI MICRO-AIRS: A MICROCOMPUTER-BASED ARABIC INFORMATION RETRIEVAL SYSTEM COMPARING WORDS, STEMS, AND ROOTS AS INDEX TERMS.
IN Illinois Institute of Technology Ph.D. 1991, 90 pages.
SO DAI V52(07), SecB, pp3703.
DE Computer Science. Information Science.
AB Experimentation with retrieval systems in Arabic language environments has been very limited. Arabization of available information retrieval systems has dealt mostly with internal representation of the Arabic data and translation of menus and system messages to Arabic. The problems of working with the Arabic language have not been confronted directly. Stemming algorithms have been widely used to enhance the retrieval behavior of information retrieval systems. In English based systems, stemming algorithms deal with the removal of suffixes to reduce the storage needed for the keyword list and to increase the recall factor by conflating word variants. In the Arabic language, both prefixes and suffixes are added to roots and stems to form related words.
The number of affixes used in the Arabic language exceeds that used in English. Surface affix removal processes produce word stems while deep affix removal processes produce word roots. This research studies the effect of using words, stems, and roots of Arabic words as index terms on the effectiveness of the retrieval of Arabic bibliographic records. To run the experiment for these three different retrieval methods we used 355 Arabic bibliographic records covering computer and information science, and 29 queries. The test was conducted on an IBM/AT compatible microcomputer using the Microcomputer-based Arabic Information Retrieval System, Micro-AIRS. The effectiveness of the system using word, stem, and root retrieval methods is presented using the recall and precision measures along with two nonparametric statistical tests. The system evaluation results show the superiority of the root retrieval method over the word retrieval method, and over the stem retrieval method at high recall levels. They also show the superiority of the stem retrieval method over the word retrieval method at all recall levels. The

**********************************************************

IRLIST Digest is distributed from the University of California, Division of Library Automation, 300 Lakeside Drive, Oakland, CA. 94612-3550.

Send subscription requests to: LISTSERV@UCCVMA.BITNET
Send submissions to IRLIST to: IR-L@UCCVMA.BITNET

Editorial Staff:
   Clifford Lynch   lynch@uccmvsa.ucop.edu or calur@uccmvsa.bitnet
   Nancy Gusack     ncgur@uccmvsa.bitnet
   Mary Engle       meeur@uccmvsa.bitnet

The IRLIST Archives will be set up for anonymous FTP, and the address will be announced in future issues.

To access back issues presently, send the message INDEX IR-L to LISTSERV@UCCVMA.BITNET. To get a specific issue listed in the Index, send the message GET IR-L LOGYYMM, where YY is the year and MM is the numeric month in which the issue was mailed, to LISTSERV@UCCVMA (Bitnet) or LISTSERV@UCCVMA.UCOP.EDU.

These files are not to be sold or used for commercial purposes. Contact Nancy Gusack or Mary Engle for more information on IRLIST. The opinions expressed in IRLIST do not represent those of the editors or the University of California. Authors assume full responsibility for the contents of their submissions to IRLIST.
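
As an illustration of the back-issue request format described above (GET IR-L LOGYYMM), here is a minimal Python sketch that builds the message text for a given mailing date. The function names get_issue_command and index_command are hypothetical and are not part of LISTSERV; the sketch only formats the command string and does not send any mail.

```python
from datetime import date


def index_command() -> str:
    """Return the message that asks LISTSERV for the list of back issues."""
    return "INDEX IR-L"


def get_issue_command(mailed: date) -> str:
    """Return 'GET IR-L LOGYYMM' for the month in which an issue was mailed.

    YY is the two-digit year and MM the two-digit numeric month,
    as described in the digest footer. (Hypothetical helper.)
    """
    yy = mailed.year % 100   # two-digit year, e.g. 1992 -> 92
    mm = mailed.month        # numeric month, zero-padded when formatted
    return f"GET IR-L LOG{yy:02d}{mm:02d}"


if __name__ == "__main__":
    # This issue was mailed on August 24, 1992.
    print(get_issue_command(date(1992, 8, 24)))  # prints "GET IR-L LOG9208"
```

The resulting line would be sent as the body of a message to one of the LISTSERV addresses listed above.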