**********************************************************************
SIG-IRList Digest (ISSN 1064-6965)
June, 2003
**********************************************************************

I. NOTICES

A. MEETINGS
1. Young researchers scholarships for attending SEPLN '03 - The 19th conference of the Spanish Society For Natural Language Processing.
2. KCAP-03 Workshop - Capturing Knowledge From Domain Experts: Progress And Prospects

B. SOFTWARE
1. Canoo's Morphology Software for Italian

II. JOBS
1. Teragram - Software Developer and Software Engineer
2. CMU - Postdoctoral Research

**********************************************************************

I. MEETINGS

I.A.1.
From: OESI Informa

YOUNG RESEARCHERS SCHOLARSHIPS FOR ATTENDING THE 19TH CONFERENCE OF THE SPANISH SOCIETY FOR NATURAL LANGUAGE PROCESSING (SEPLN'03)

The Spanish Society for Natural Language Processing, in order to promote research conducted in the field of Natural Language Processing at the pre-doctoral level, will be offering 5 scholarships for attending the 19th SEPLN Conference (http://oesi.cervantes.es/sepln).

REQUIREMENTS

Applicants must be pre-PhD young researchers who meet the following requirements:
- they are developing their PhD thesis
- they have not been granted another scholarship for the same purpose
- they are SEPLN members[1]

COVERED COSTS

Scholarships will cover the following costs:
- transportation to the conference venue
- accommodation in a residence near the conference venue
- registration

The maximum amount awarded is 400 EUR.

APPLICATION

Applicants must send their CVs to the following e-mail address: ayudasXIXsepln@dlsi.ua.es, accompanied by a certificate signed by their thesis director stating that the researcher is currently developing a PhD thesis, together with a statement in which the applicant declares that they have not been granted any other scholarship for the purpose of attending this conference.

The deadline for submitting applications is June 30, 2003.

[1] Applicants who are not yet SEPLN members must register as members.

**********************************************************************

I.A.2.
From: D Sleeman

KCAP-03 WORKSHOP
CAPTURING KNOWLEDGE FROM DOMAIN EXPERTS: PROGRESS AND PROSPECTS

To be linked with the Workshop on "Distributed & Collaborative Knowledge Capture" (Please also see their announcement)

A brief technical description of the workshop, specifying the workshop goals and the technical issues that will be its focus:

In the early days of Expert Systems, knowledge engineers were given a very prominent role in interviewing domain experts and then formalising and implementing a knowledge base which captured what they thought an expert had said and/or how they had solved selected problems. This approach is both very expensive and open to many potential communication problems between the domain expert and the knowledge engineer. Role-limiting approaches took a substantial step forward, by viewing domain-level knowledge acquired from an expert by a knowledge acquisition tool as playing certain roles within a domain-independent problem solving method (PSM).

Today's experts are avid computer users who can envision the potential of knowledge bases in their fields of expertise. This is clearly the case in many areas of science, where experts are eager and committed to undertake long-term knowledge capture and dissemination to improve their fields and benefit related ones. Many are already developing what can be considered skeletal ontologies and are already using them to support relatively simple tasks. There is clear interest in moving towards more sophisticated knowledge bases that support complex problem solving activity.

One of the functions of this workshop is to take an in-depth look at the current state of the art and, in particular, to see how the WWW has changed our thinking about these issues.
The issue of whether it is necessary to superimpose a knowledge model on acquired information before it can be used to solve meaningful & novel tasks is one of the points of contention between this workshop & the one with which it is linked, namely, the "Distributed & Collaborative Knowledge Capture" workshop.

Additionally, some groups of domain experts, who are now more computationally sophisticated, are asking for tools which will enable them to collaboratively maintain their own knowledge bases (KBs). Some of the problems of maintaining knowledge bases overlap with acquisition; others are quite distinct.

Clearly, this is a very inter-disciplinary activity and we very much hope that the contributions to the workshop will reflect this.

These are some of the topics which we hope papers will address:
- Systems which have been used by domain experts to develop KBs
- Systems which have been used by domain experts to maintain KBs
- Techniques to help domain experts visualise and debug their KBs
- Detailed requirements from domain experts for the tools they would like to use when developing & maintaining KBs/ontologies
- Detailed case histories of the development of particular topic-specific KBs, etc.

Workshop format:
- There will be an opening invited talk/position paper.
- Where possible, talks should include appropriate demos.
- If there is a need, we will also organize a longer demo/poster session.
- We plan to have at least one panel; this will be held jointly with the linked workshop as the grand finale.

Those attending this workshop are encouraged to also attend the linked workshop, "Distributed & Collaborative Knowledge Capture", as it covers a related topic & we plan to hold at least a joint panel as a grand finale for the two workshops.

Co-chairs:
Derek Sleeman, Aberdeen, UK  dsleeman@csd.abdn.ac.uk
Yolanda Gil, ISI, USA  gil@isi.edu

Members of the programme committee:
Pete Clark (Boeing, USA)
Martin Dzbor (OU, UK)
John Gennari (Washington, USA)
Midori Harris (EBI, UK)
Mark Musen (Stanford, USA) (tbc)
Kieron O'Hara (Southampton, UK)
Alan Rector (Manchester, UK)
Guus Schreiber (Amsterdam, Holland)

Submissions

We invite short papers, limited to 8 pages, which describe ongoing work or new ideas within the scope of the workshop. Papers may also be in the form of a position statement, indicating a writer's particular opinion on a subject related to the workshop.

Submission procedure:
Please email submissions, in PDF format only, to dsleeman@csd.abdn.ac.uk before 20th July (2003).

Submission format:
Please use this Word template. This template is based on the official ACM templates for proceedings. In accordance with the requirements of the ACM digital library, please include categories and subject descriptors that best describe your submission. The hierarchy of descriptors can be found here. You may include optional keywords. Note that reviewer assignments will be based on the contents of the abstract, as well as these descriptors and keywords.

Accepted papers will be published as part of the KCAP 2003 workshop proceedings.

Timetable:
May 19, 2003: Publication of Call for Participation
July 19, 2003: Papers to be submitted to Chair of Workshop Committee, Derek Sleeman
August 19, 2003: Feedback to Authors on submitted papers
September 1, 2003: Publication of K-CAP 2003 workshop program
September 12, 2003: Revised papers to be submitted to Workshop Chair, Derek Sleeman
October 25-26, 2003: K-CAP 2003 workshops

Contributions & general queries should be sent to:
Derek Sleeman,
Department of Computing Science,
The University,
ABERDEEN AB24 3FX
Phone: +44 (0)1224 272296/88
FAX: +44 (0)1224 273422
Email: dsleeman@csd.abdn.ac.uk

**********************************************************************

I. SOFTWARE

I.B.1.
From: Sandra Wendland

Canoo's Morphology Software now available for Italian

Canoo has released the first WMTrans products for Italian. The language technology company recently extended its product range to cover further languages. Canoo now offers morphology software for German, English and Italian.

Italian versions of the following products are now available at the product site, http://www.canoo.com/wmtrans:
- WMTrans Lemmatizer
- WMTrans Inflection Analyzer
- WMTrans Inflection Analyzer / Generator

Like their German and English counterparts, the Word Manager Transducers (WMTrans) for Italian provide functionality to analyze and generate inflected Italian word forms.

WMTrans is based on the Canoo morphological dictionaries, containing:
- more than 250'000 lexemes, generating 3 million fully categorized word forms for German,
- 50'000 lexemes, generating 115'000 word forms for English,
- and 50'000 lexemes, generating 456'000 word forms for Italian.

The dictionaries include information on word formation dependencies, all types of morphological irregularities and spelling variants.

An added benefit for Italian: WMTrans products process verb and pronoun contractions such as "dimmi", "pensandoci", "vattene", and "credermelo". This is a powerful advantage and increases the number of recognized Italian word forms to 2'425'000.

The smart text processing software is used in retrieval and language processing applications. Typical use cases include word stemming, intelligent search, text indexing, text mining, language learning, hyperlink generation, spell checking, grammar checking, and machine translation.

The WMTrans Lemmatizer determines the base form of a word and its category. Consider the word form "vattene": the Lemmatizer analyses the word, retrieves its base form, andare, and determines the word category, V (verb), plus details about any contractions.

  query  -> vattene
  result -> andare (Cat V)(Contraction ti/Pron+ne/Pron+V)

The WMTrans Inflection Analyzer processes any word form and delivers a rich set of information on its inflection. For example, for the word "spiegatemela", the Inflection Analyzer returns the base form "spiegare" and the word category V (verb), plus further morphosyntactic details such as number or verb tense:

  query  -> spiegatemela
  result -> spiegare (Cat V)(Aux avere)(Mod Imp)(Pers 2nd)(Num PL)(Contraction mi/Pron+la/Pron+V)

WMTrans Inflection Analyzer / Generator offers two function calls. The Analyzer returns morphosyntactic information for an input word, such as its base form, word category, gender, case, tense, and auxiliary verb. The Generator delivers a list of all word forms related to a base form. Each word form is followed by a list of the morphosyntactic features that apply to it. The following Inflection Analyzer / Generator output for the base form "andare" shows only verb forms in the first person indicative, as set by the filter:

  query  -> andare
  Filter:   (Mod Ind)(Pers 1st)
  result -> andai (Cat V)(Aux essere)(Mod Ind)(Temp Pass-Rem)(Pers 1st)(Num SG)(ID 0-1)
            andammo (Cat V)(Aux essere)(Mod Ind)(Temp Pass-Rem)(Pers 1st)(Num PL)(ID 0-1)
            andavo (Cat V)(Aux essere)(Mod Ind)(Temp Impf)(Pers 1st)(Num SG)(ID 0-1)
            andavamo (Cat V)(Aux essere)(Mod Ind)(Temp Impf)(Pers 1st)(Num PL)(ID 0-1)
            vado (Cat V)(Aux essere)(Mod Ind)(Temp Pres)(Pers 1st)(Num SG)(ID 0-1)
            andiamo (Cat V)(Aux essere)(Mod Ind)(Temp Pres)(Pers 1st)(Num PL)(ID 0-1)
            andrò (Cat V)(Aux essere)(Mod Ind)(Temp Fut)(Pers 1st)(Num SG)(ID 0-1)
            andremo (Cat V)(Aux essere)(Mod Ind)(Temp Fut)(Pers 1st)(Num PL)(ID 0-1)

The WMTrans products for Italian are available in Java and run on any platform with Java Runtime Environment (JRE) 1.3 or higher. The WMTrans product range allows developers to integrate functions such as word stemming, spell checking, and paradigm generation into their applications.

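The result notation printed in this announcement is regular enough to post-process with a few lines of code. The following is only a minimal sketch in Python, assuming result lines are available as plain strings in exactly the form shown in the examples; the actual WMTrans Java API is not described here, and `parse_result` and `matches` are hypothetical helpers, not part of the product.

```python
import re

# Hypothetical helpers for illustration only; not part of WMTrans.
# Assumed line format (taken from the examples in this announcement):
#   <word form> (Name Value)(Name Value)...
FEATURE = re.compile(r"\(([^\s()]+)\s+([^()]+)\)")


def parse_result(line):
    """Split a result line into the leading word and a feature dict."""
    head, _, rest = line.strip().partition("(")
    word = head.strip()
    # Restore the parenthesis consumed by partition before matching.
    features = {name: value.strip() for name, value in FEATURE.findall("(" + rest)}
    return word, features


def matches(features, **wanted):
    """Return True if every requested feature has the requested value."""
    return all(features.get(name) == value for name, value in wanted.items())


if __name__ == "__main__":
    line = "vado (Cat V)(Aux essere)(Mod Ind)(Temp Pres) (Pers 1st)(Num SG)(ID 0-1)"
    word, feats = parse_result(line)
    # Keep only present-tense singular forms, mimicking a filter on the output.
    if matches(feats, Temp="Pres", Num="SG"):
        print(word, feats)
```

A dictionary keyed by feature name keeps the sketch simple, but it assumes each feature name occurs at most once per word form, which holds for the examples shown and may not hold in general.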
For more information, see http://www.canoo.com/wmtrans/ or contact:

Elisabeth Maier
Canoo Engineering AG
Kirschgartenstr. 7
CH-4051 Basel
Tel.: +41 61 228 94 44
mailto:wmtrans-info@canoo.com

**********************************************************************

II. JOBS

II.1.
From: Yves Schabes

Teragram has the following two openings in the area of Linguistic Information Retrieval:
- Software Developer
- Software Engineer

Teragram Corporation (http://www.teragram.com), a profitable linguistic information retrieval company located in Boston, Massachusetts, is seeking to hire a software engineer and a software developer to be part of a highly technical software team. The successful candidates must be able to play an important role in the development and maintenance of large-scale linguistic information retrieval technologies. They will be working within a multidisciplinary team of engineers and linguists.

Teragram Corporation develops state-of-the-art text technologies and applications providing compact and scalable solutions to numerous problems in natural language, information storage, information extraction, full text indexing, text compression, and text correction. Customers include major Internet portals, large media companies, handheld device manufacturers, and Fortune 5000 companies.

Job Requirements:

The successful candidate will be proficient in developing efficient large C programs under UNIX and have knowledge of information retrieval technologies. B.S. or M.S. in Computer Science from a major university. Experience in computational linguistics, information retrieval, text categorization, text clustering, or project management is a plus.

Please reply by email only to jobs@teragram.com

**********************************************************************

II.2.
From: Yiming Yang

Dear Friends,

We have an opening for a postdoctoral position in the RADAR (after Radar O'Reilly) project at Carnegie Mellon University, sponsored by a DoD research grant entitled "Enduring, Personalized, Cognitive Assistant" (EPCA). This is a two-year position, starting immediately, under the supervision of Professor Yiming Yang in the Language Technologies Institute and the Computer Science Department at Carnegie Mellon University.

The research will focus on machine learning approaches to intelligent email handling. The technical approaches include hierarchical categorization (on personalized folders), adaptive filtering, and message prioritization based on the importance of the topic, the authority of the sender, the urgency of deadlines, the cost of generating an answer, and so forth. We will not focus on spam filtering.

We are looking for a recent PhD with a good research background in text categorization, a good publication record, and strong programming or system building skills. Interested applicants should email a resume and/or any questions to Professor Yiming Yang (yiming@cs.cmu.edu).

**********************************************************************

SIG-IRList Digest is distributed from the University of Sheffield and edited by Stephen Levin (s.levin@sheffield.ac.uk) and Mark Sanderson (m.sanderson@sheffield.ac.uk).

To access previous issues or for information on subscribing/unsubscribing and submitting articles, visit: http://www.sigir.org/sigirlist/

These files are not to be sold or used for commercial purposes. Contact Stephen Levin for more information on SIG-IRList.

THE OPINIONS EXPRESSED WITHIN THIS DOCUMENT DO NOT REPRESENT THOSE OF THE EDITORS OR THE UNIVERSITY OF SHEFFIELD. AUTHORS ASSUME FULL RESPONSIBILITY FOR THEIR MATERIAL.