Eurolan 2003
THE SEMANTIC WEB AND LANGUAGE TECHNOLOGY
Its Potential and Practicalities
July 28 - August 8, Bucharest - Romania

Invited Lectures

The webcast was sponsored by Complexity Digest
Contributors: Gottfried J. Mayer, Matus Marko.

Thierry Declerck & Adrian Raschip

Thierry Declerck is a Senior Consultant at the Language Technology Laboratory, DFKI. He is also a member of the Computational Linguistics Department of Saarland University, Germany. Adrian Raschip is a software engineer at the Language Technology Laboratory, DFKI.

The Multilingual Semantic Web

The tutorial will show how to integrate linguistic and semantic annotations generated by multilingual language technology tools using knowledge encoded in different domain ontologies. The result of this integration of annotations, the knowledge markup of textual documents, also supports the efforts in the WWW community aiming at upgrading the current web into the semantic web.
Those efforts are the focus of the EU research project "Esperonto" (see www.esperonto.net), which will serve as the basis for the tutorial.

First, we will briefly present the main (multilingual) linguistic analysis steps that are involved in this kind of work: tokenisation, morphological analysis, part-of-speech tagging, chunking, reference resolution and dependency structure analysis. Then we give an overview of semantic tagging, which, combined with the distinct levels of linguistic annotation, gives the basis for knowledge markup. In this part of the tutorial we will discuss the issue of standard XML-based representation formalisms for ensuring the optimal integration of NLP-based annotations with knowledge encoded in ontologies.

Documents annotated in this way allow direct access from (multilingual) texts to ontologies, and the tutorial will, in a second part, present ways of organising and visualising content-based navigation through a collection of documents processed in the way described in the first session. The focus here will be the automatic content-based hyperlinking of web documents.

And since the knowledge markup of documents can probably also play a role in ontology development, i.e. the dynamic adaptation of ontologies to evolving applications and domains, we will present in the third part of the tutorial current work (mainly done at the Polytechnic University of Madrid) on ontology learning from linguistically annotated multilingual documents.

In a special session, we will discuss a standardised strategy for encoding language resources with metadata conformant to the Semantic Web, thus making them available to Semantic Web annotation services. This session will introduce the notion of Web Services in general and, for the part specific to language resources, will be based on first results of the EU project INTERA.

Requirements from the students: basic knowledge of NLP (including semantics), HTML and XML.
Programming languages like Perl, Java and JavaScript are needed for the working sessions.

In the working sessions, teams of students will propose an implementation of the strategy described above. The first session will cover the use of NLP tools. The second session will cover the access to ontologies from texts and the knowledge-based annotation of documents properly speaking. A third working session will be dedicated to the encoding of language resources with metadata conformant to the semantic web. Basic knowledge of XML, RDF and SOAP is a plus.

* Language Technology and the Semantic Web, Thierry Declerck, 2003/7/31, Audio (mp3)

James Hendler

Jim Hendler is a Professor at the University of Maryland and the Director of Semantic Web and Agent Technology at the Maryland Information and Network Dynamics Laboratory. He has joint appointments in the Department of Computer Science, the Institute for Advanced Computer Studies and the Institute for Systems Research, and he is also an affiliate of the Electrical Engineering Department. Jim was the recipient of a 1995 Fulbright Foundation Fellowship, is a member of the US Air Force Science Advisory Board, and is a Fellow of the American Association for Artificial Intelligence. He is also the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA), and is a prominent player in the World Wide Web Consortium's Semantic Web Activity.

The Semantic Web: What it is and how it works

This day will be spent learning about the Semantic Web, under the guidance of one of its originators and leading proponents. The session will start with a presentation of the Semantic Web vision, and a discussion of how it differs from some other current web technologies. The primary focus of the day will be the new Web Ontology Language OWL, currently being developed under the aegis of the World Wide Web Consortium.
OWL provides a number of useful features for representing information on the Web, and for linking that information to web resources of multiple kinds. OWL is based on XML syntax, the RDF model, and the RDFS vocabulary language, so these languages will also be presented. Exercises will involve learning about OWL and how to use it, and using some of the RDF and OWL toolkits. Students will also learn about a number of online resources for using the Web Ontology Language, including a library of ontologies, a listing of over a hundred useful tools, and many others.

* The Semantic Web, James Hendler, 2003/8/6, Audio (mp3)
* A Quick Introduction to OWL Web Ontology Language, James Hendler, 2003/8/6, Audio (mp3)
* OWL Web Ontology Language, James Hendler, 2003/8/6, Audio (mp3)

Jerry Hobbs

Jerry Hobbs is a member of the Natural Language group at the Information Sciences Institute, University of Southern California, School of Engineering. Hobbs came to ISI in September 2002 from the Artificial Intelligence Center at SRI International in Menlo Park, bringing with him several projects. One, funded by the Defense Advanced Research Projects Agency, is part of the DARPA Agent Markup Language program, known by the acronym DAML, which is developing what is called the "semantic web." On 24 January this year, Uppsala University, the venerable Swedish institution, bestowed the title of Doctor of Philosophy honoris causa on Hobbs in recognition of his three decades of work in artificial intelligence and computational linguistics. "I have ... climbed the Matterhorn, drove a Land Rover from London to Capetown, got attacked by a thousand people in Egypt, got stuck in quicksand in the interior of Iceland, flew in a Russian cargo plane to Timbuktu, followed orangutans around the Borneo rain forest, narrowly avoided being kidnapped in Yemen, etc.," he wrote recently.
Ontologies for the Semantic Web

Broadly shared ontologies are important for the success of the Semantic Web. It won't help to have only one style of data structure if the concepts expressed in these data structures do not align. In my lectures I will describe several efforts to develop ontologies for very basic and very important concepts.

Services and Events: OWL-S (formerly DAML-S) is an ontology that has been developed for representing Web services. A profile provides a coarse-grained description that allows Web agents to find and compose several different Web services. A process model allows one to decompose a Web service and use only parts of its intended functionality. For example, one can use Amazon to get references. In another effort, an Event Representation Language (ERL) is being developed for describing primitive and complex actions and events in the annotation of video data. This involves a process model that is similar to that in OWL-S.

Time: DAML-Time is an ontology of time that has been developed in the past year. It covers the topological properties of time, such as the "before" relation and the relations in Allen's interval calculus; measures of duration; clock and calendar concepts; temporal granularity; and temporal aggregates. A treatment of deictic time is under development.

Space: A similar effort on an ontology of spatial properties and relations is just beginning. It is intended to cover the areas that correspond to the areas covered by DAML-Time, with additions due to the multidimensionality of space. Specifically, it will cover topological properties as represented in the region connection calculus RCC-8; dimensionality and frames of reference; shape and orientation; measures of length, area, and volume; longitude, latitude and altitude; and geographical and political regions and subregions.
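As an illustration of the topological relations mentioned under Time, the following is a minimal Python sketch of the thirteen relations of Allen's interval calculus. It is not taken from DAML-Time: the names Interval and allen_relation are hypothetical, and representing intervals by numeric start and end points is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A proper time interval with numeric endpoints (an assumption of this sketch)."""
    start: float
    end: float

    def __post_init__(self):
        # Allen's calculus is defined over proper intervals, not instants.
        if not self.start < self.end:
            raise ValueError("an interval needs start < end")


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the name of the single Allen relation that holds from a to b."""
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two intervals share more than a single point.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only partial overlap remains; the earlier start decides the direction.
    return "overlaps" if a.start < b.start else "overlapped-by"
```

For example, allen_relation(Interval(1, 3), Interval(3, 5)) returns "meets". Exactly one of the thirteen relations holds for any pair of proper intervals, which is why the function can return a single name.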
Information Structure and Commonsense Psychology: Another ontology-building effort we are involved in concerns, first of all, the structure of information as exhibited in symbolic systems of various sorts, including language, diagrams, documents, Web pages, and face-to-face conversation. Some issues are the meanings of atomic elements, how elements compose into complex meanings, and coreference relations among elements. Annotation of these things in Web pages, for example, could lead to more accurate searches for images and diagrams. An ontology of information structure should be grounded in an ontology of commonsense psychology, and this is something we are also developing. It is intended to cover such concepts as memory, belief, envisioning, planning, goals, similarity judgments, and so on.

Natural language is the best guide for what concepts need to be covered in these basic ontologies, and what inferential relations there are between the concepts.

The only prerequisite is that your eyes do not glaze over when you see formulas in first-order predicate calculus.

* Ontologies for the Semantic Web: Services and Events, Jerry R. Hobbs, 2003/8/8, Audio (mp3)
* Ontologies for the Semantic Web: Time and Space, Jerry R. Hobbs, 2003/8/8, Audio (mp3)
* Ontologies of Information Structure and Commonsense Psychology, Jerry R. Hobbs, 2003/8/8, Audio (mp3)

Nancy Ide

Nancy Ide is a Professor at Vassar College, USA, and Chair of the Department of Computer Science. She is Editor-in-Chief for Computers and the Humanities, book series co-editor for Text, Speech, and Language Technology (Kluwer Academic Publishers), founder of the Text Encoding Initiative (TEI), and developer of the Corpus Encoding Standard (XCES). She is currently working in the International Organization for Standardization (ISO) committee to develop standards for language resources and annotation.
She has done research in various areas of computational linguistics, including word sense disambiguation and discourse analysis. Nancy has co-directed the last three editions of the Eurolan Summer School.

The Resource Description Framework (RDF)

The Resource Description Framework (RDF) is an XML-based format that enables adding a layer of semantics to XML-encoded data. RDF allows for the creation of classes and subclasses of XML-encoded objects with specified properties, which in turn serve as named links (relations) between objects. This semantic layer is further enhanced by linkage to ontological information specified via the Web Ontology Language (OWL). Together, RDF and OWL provide a powerful means to represent linguistic information in a standardized format that can be referenced by annotations to language resources, as well as a means to specify relations between the data and XML-instantiated annotations.

The first part of this full-day EUROLAN session will consist of an introduction to RDF and a tutorial overview. In addition, the application of RDF to the representation of language resources and their annotations will be discussed. The afternoon session will comprise a hands-on exercise in encoding data using RDF, as well as manipulating and accessing the data using RDF-aware tools.

* Resource Definition Format, A Tutorial, Nancy Ide, 2003/7/29
* RDF Examples, Nancy Ide, 2003/7/29

Alessandro Lenci

Alessandro Lenci is a researcher at the Department of Linguistics, Università di Pisa. He received a degree in Philosophy from the University of Pisa in 1991 and a PhD in General Linguistics from the Scuola Normale Superiore (Pisa) in 1999. In 1996/97 he was Professor of Italian Linguistics at the University of Helsinki. He has published various contributions on lexical semantics, philosophy of language, cognitive science and Natural Language Processing.
His research in computational linguistics mainly concerns computational semantics, lexical acquisition and representation, linguistic ontologies, ontology learning, and dependency-based parsing. He has been involved in the EU projects LE-SIMPLE, ELSE, MUSI and in the ISLE Computational Lexicon Working Group.

Computational Lexicons and the Semantic Web

In order to make the Semantic Web a reality, it is necessary to tackle the twofold challenge of content availability and multilinguality. This in turn implies improving the way information in natural language documents is identified, extracted and explicitly represented, so that it becomes accessible to software agents. A natural convergence thus exists between the long-term goals of the Semantic Web and some of the core activities in the field of Human Language Technology (HLT). In HLT, the task of providing the basic semantic description of words is entrusted to computational lexicons, which aim at making word content machine-understandable. That is to say, they provide an explicit representation of word meaning, so that it can be directly accessed and used by computational agents. Existing lexical resources represent the ideal starting point to obtain the required mapping of language onto knowledge (and vice versa).

The goal of this tutorial is to show and explore the virtuous circle existing between computational lexicon technology and the semantic web. The latter actually provides the necessary means to turn existing lexical resources into effective technology for semantic web processing. The course is organized in two parts, each tackling one of the two directions of this tight connection:

1. we will explore the main features of existing computational lexicons, in view of their relevance for the Semantic Web.
In particular we will focus on the relationship between ontologies and lexical resources, arguing for their convergence towards common repositories of concepts together with their manifold linguistic and conceptual relations;

2. we will also show how semantic web methodologies (use of knowledge markup/metadata) and standards (RDFS/OWL) can be used for web-based, standardized lexical resources, allowing for a distributed and widespread use of these resources in HLT-based semantic web applications.

In the afternoon sessions, some practical case studies will be proposed to the students, with the purpose of analyzing the content and structure of the major types of lexical resources and experimenting with the application of Semantic Web methodologies to the formalization and interchange of lexical data.

Prerequisites:
1. basic knowledge of NLP and lexical semantics;
2. knowledge of XML;
3. basic knowledge of RDF (preferred)

* Computational Lexicons and the Semantic Web, Alessandro Lenci, 2003/7/30, Audio (mp3)
* Computational Lexicons and the Semantic Web, Training Session, Alessandro Lenci, 2003/7/30

Robert Meersman, Jan De Bo & Peter Spyns

Robert Meersman is a Professor at the Vrije Universiteit Brussel, Belgium. He held chairs and founded the InfoLabs at the University of Limburg (Belgium, 1983-86) and at the University of Tilburg (The Netherlands, 1986-95). He is a member and Past Chairman (1983-1992) of the IFIP WG2.6 on Database, Past Chairman of the IFIP TC 12 (Artificial Intelligence, 1987-92), and Co-Founder and current President of the International Foundation for Cooperative Information Systems (IFCIS, since 1994) and of the Distributed Objects Applications Institute (DOA, since 2000). In 1995 he founded the Systems Technology and Applications Research Laboratory (STAR Lab) at VUB, of which he is the director.
His current scientific interests include ontologies, database semantics, domain and database modeling, interoperability and use of databases in applications such as enterprise knowledge management and the Semantic Web.

Jan De Bo is a researcher at STAR Lab, VUB. He is currently working on the alignment and merging of ontologies.

Peter Spyns is a senior researcher at the Sciences Faculty of the Free University of Brussels (Dutch-speaking section).

Ontology design and development

Ontologies are increasingly essential for computer science applications, and their use is of course not limited to the Semantic Web; organizations are discovering them as useful machine-processable semantics for many application areas, such as digital libraries, natural language processing, e-commerce, e-learning, knowledge management, integration of information systems, etc. An ontology, generally, is an agreed understanding (i.e. semantics) of a certain domain, represented formally as a computer resource. By sharing an ontology, autonomous and distributed applications can meaningfully interoperate although they were not explicitly designed to do so.

Nevertheless, experience shows that unscalable solutions emerging from academic research may fail at the industrial level. Compare, for example, the meager success of deductive database management systems with that of relational database management systems, even though the former are arguably more powerful, elegant, etc. Scalability is not the only important factor; other ontology engineering principles matter as well, such as the reusability of ontologies among different kinds of applications and domains, the simplification and acceleration of the ontology building process, and easy and efficient tool support.

In this course, we will present an ontology engineering approach, called DOGMA, partly inspired by the success of the relational database principles and practice.
This course is divided into four sessions. The first session covers what an ontology is, ontology engineering and design principles, application scenarios, and future trends. In the second session, we will present the DOGMA methodology for developing ontologies, and we will explore how conceptual modeling methods can be (re)used in the development of ontologies. Participants will learn how to build ontologies using a derived form of ORM conceptual modeling. Finally, in two hands-on sessions, participants will practice building actual demo ontologies. In the first part we introduce some of the DOGMA tools (DogmaModeler ontology engineering tool, and DogmaServer ontology server) and define a case study project. Solutions are discussed in the second part with different design options.

* Ontology Engineering/ Guided Mediation for Agents (DOGMA) framework, Robert A. Meersman, 2003/8/1, Audio (mp3)
* The DOGMA Modeller manual

Srini Narayanan

After completing a Ph.D. in Computer Science at the University of California, Berkeley, Srini Narayanan spent a brief period at the Artificial Intelligence Lab at SRI International. He is currently with the International Computer Science Institute. His research interests are: probabilistic models of language interpretation, graphical models of stochastic grammars, semantics of linguistic aspect, on-line metaphor interpretation, embodied rationality, and the semantic web.

Frame Semantics for the Web

As the semantic web grows and applications mature and scale up, there is an increasing need to represent lexical and sense distinctions in a structured, machine-interpretable manner. In this course, we will survey the range of lexical semantic resources available, with specific focus on the issues and requirements involved in adapting these resources to applications on the semantic web.
These desiderata will be matched up with the capabilities and limitations of the currently available markup languages and tools on the semantic web, including RDF, DAML+OIL, DAML-S and OWL.

Both the tutorial and the practical session will pay special attention to FrameNet (http://www.icsi.berkeley.edu/~framenet), a rich, wide-coverage lexical semantic resource that documents the range of semantic and syntactic combinatory possibilities (valences) of each word by linking word senses to underlying conceptual structures or frames. The frames are then used to manually annotate example sentences, and to automatically summarize the resulting annotations.

The tutorial session will provide the students with the DAML+OIL and DAML-S translation of the FrameNet database. Students will gain hands-on experience in translating FrameNet frames and text annotations into the DAML framework. Participants will be expected to come up with techniques to use the DAML version of the frame database for applications (such as Semantic Extraction, Machine Translation or Question Answering), and to evaluate if and how the open questions outlined in the course are addressed in their approach.

* FrameNet Meets the Semantic Web, Srini Narayanan, 2003/8/6, Audio (mp3)
* Semantic Web Services, Srini Narayanan, 2003/8/6, Audio (mp3)

Sergei Nirenburg & Marjorie McShane

Sergei Nirenburg has done impressive work in Natural Language Processing, Artificial Intelligence, Knowledge-Based Systems, Machine Translation, Ontological Semantics, and Computational Linguistics. A former senior research computer scientist at the School of Computer Science, Carnegie Mellon University, and a former professor and researcher at the Department of Computer Science, New Mexico State University, he is currently a professor at the University of Maryland, Baltimore County (UMBC). Marjorie McShane is a Research Assistant Professor at the Department of Computer Science and Electrical Engineering, UMBC.
* The meaning of language expressions, Sergei Nirenburg, 2003/8/4, Audio (mp3)
* Ontological Semantics, Sergei Nirenburg, Audio (mp3)

Automatic Semantic Tagging

Hans Uszkoreit

Hans Uszkoreit is a professor of computational linguistics at Saarland University, Saarbrücken, Germany, and the Scientific Director at the German Research Center for Artificial Intelligence (DFKI). He is also involved in two young language technology enterprises as co-founder and advisor.

* Semantic Annotation and Hyperlinking for Associative Digital Memories, Hans Uszkoreit, 2003/8/2, Audio (mp3)

Wolfgang Wahlster

Dr. Wolfgang Wahlster is the Director and CEO of the German Research Center for Artificial Intelligence (DFKI GmbH) and a Professor of Computer Science at the Universität des Saarlandes, Saarbrücken. He was the Scientific Director of the VERBMOBIL consortium on spontaneous speech translation (1993-2000) and currently serves as the Scientific Director of the SmartKom consortium on multimodal dialog systems (1999-2003). He served as the Chair of ECCAI, the European Coordinating Committee for Artificial Intelligence, from 1996 to 2000. In 2000, he was the President of the Association for Computational Linguistics (ACL) and the first AI researcher to receive the Beckurts Award, one of Germany's most prestigious awards for scientific and technological innovations. In 2002, he was the second German computer scientist elected Full Member of the German Academy of Sciences and Literature, Mainz.

* Semantic Web Technologies for Multimodal User Interfaces, Wolfgang Wahlster, 2003/7/29, Audio (mp3)

Yorick Wilks, Christopher Brewster & Alexiei Dingli

Yorick Wilks is the head of the Natural Language Processing Research Group in the Department of Computer Science at the University of Sheffield. Current institutional interests include the International Committee on Computational Linguistics (ICCL), the Institute for Language, Speech and Hearing (ILASH), and Computational Linguistics UK (CLUK).
His recent interests include computational pragmatics, including the ViewGen belief manipulation system and the CONVERSE dialogue system; computational lexicon research, including the notion of lexical tuning and novel methods for accessing and distributing resources within an architecture like GATE; and information extraction within an evaluated framework, so as to compare methods such as pattern matching and full parsing. He is often seen performing with the Sheffield University Drama Society.

Christopher Brewster and Alexiei Dingli are Research Assistants for the EPSRC-funded Advanced Knowledge Technologies project in the Department of Computer Science, University of Sheffield, UK.

Lectures on Ontology

The lectures will cover the following themes:
1. What is an ontology really and what is all the fuss about?
2. What is the empirical role of ontology-like items in practical NLP and Knowledge Management tasks?
3. Can ontologies of the sort used for NLP be constructed automatically?
4. Are ontological questions dependent on individual models of the world?
5. To what extent is there a real underlying "ontological question" for computational, knowledge-based systems?
* Ontologies and Reality: Practical and Philosophical Issues, Christopher Brewster, 2003/8/6, Audio (mp3)
* Automating Ontology Building: Ontologies for the Semantic Web and Knowledge Management, Christopher Brewster, 2003/8/6, Audio (mp3)
* Knowledge Maintenance and the Frame Problem, Christopher Brewster, 2003/8/6
* Populating Ontologies for the Semantic Web, Alexiei Dingli, 2003/8/6
* Ontotherapy: or how to stop worrying about what there is, Yorick Wilks, 2003/8/6