A next step towards a qualitatively higher information service of a VMM is the ability to extract relevant information from the memory and present it in condensed form. As far as electronically readable text messages are concerned, one may use information extraction methods and text summarisation methods; see Spärck-Jones et al. (1993) and Hovy & Radev (1998). Among other things, extraction capabilities also allow messages to be prioritised on a content basis in domains with well-defined semantics, and are in consequence completely task-dependent.
Demonstrations and prototypes of text summarisation systems can be found on the World-Wide Web. For example, during a trial phase in 1998, the British Telecommunications Laboratories (1998) offered a summarisation service with an adjustable compression rate. Such text summarisers are often based on partial analysis of surface syntactic features of the text, dictionary lookups, and keyword or thematic phrase identification methods. The Extractor text summarisation software by the Interactive Information Group (1998) handles both English and French text. It can automatically detect the language of the document and process it accordingly. While such tools allow for summarisation at relatively low processing cost, they are usually targeted at written text rather than at transcripts of recorded speech communication (or textual chat), which always include speech recognition errors (or sentence fragments and typos). Also, the compression rate achieved by these systems says little about the quality of the generated summary. What matters is not the number of selected words but the overall meaning of the selection.
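The keyword-based approach with an adjustable compression rate can be illustrated by a minimal sketch. It is not the method of any of the systems cited above: the stop-word list, the scoring by average keyword frequency, and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption of this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
              "on", "at", "for", "it", "be", "will"}


def split_sentences(text):
    """Split text after sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased words of a sentence, without stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def summarise(text, compression_rate=0.5):
    """Return about compression_rate of the sentences, in original order.

    A sentence is scored by the average document frequency of its
    content words, so sentences that share keywords with the rest of
    the text are preferred.
    """
    sentences = split_sentences(text)
    if not sentences:
        return []
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, round(len(sentences) * compression_rate))
    # Rank by descending score; ties are broken by position in the text.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:keep])]
```

The sketch also shows why the compression rate alone is a weak quality measure: the parameter only fixes how many sentences are kept, while the choice of sentences depends entirely on the scoring heuristic.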
IV.4 INTERPRETATION AND REASONING
Approaches towards a deeper understanding of the intention behind activities and the meaning of messages usually rely on knowledge-based processing, sometimes mixed with statistical approaches. The knowledge base contains semantic facts, such as a taxonomy of domain concepts and assertions that are assumed to hold in a certain well-circumscribed domain. In such a case it is possible to parse a message into conceptual structures. Rather than classifying message content only by the occurrence of keywords, messages are now classified with respect to specified semantic frames, which include concepts and the semantic relations between the words expressing those concepts. In contrast to keyword processing, the user may query messages by specifying a semantic concept. For example, suppose a user wants to recall all messages in which they were talking about accommodations in London. Though not explicitly specified in the query, the user will get the messages about hotels, private rooms, and youth hostels, as they are all semantically subsumed by the concept “accommodations”.
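The subsumption query described above can be sketched as follows. The taxonomy, the message annotations, and all names are hypothetical; in particular, the sketch assumes that a parser has already mapped each message to a concept and a location.

```python
# Hypothetical is-a taxonomy: child concept -> parent concept.
PARENT = {
    "hotel": "accommodation",
    "private room": "accommodation",
    "youth hostel": "accommodation",
    "accommodation": "travel service",
    "train": "transport",
    "transport": "travel service",
}


def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)  # None once the root has been passed
    return False


# Memory entries annotated with a simple semantic frame (assumed to be
# produced by an upstream parser, which is not part of this sketch).
MESSAGES = [
    {"id": 1, "text": "I booked a hotel near Paddington.",
     "concept": "hotel", "location": "London"},
    {"id": 2, "text": "The youth hostel in Camden is full.",
     "concept": "youth hostel", "location": "London"},
    {"id": 3, "text": "There is a private room free in Paris.",
     "concept": "private room", "location": "Paris"},
    {"id": 4, "text": "The train to London leaves at nine.",
     "concept": "train", "location": "London"},
]


def query(messages, concept, location):
    """Ids of messages whose concept is subsumed by `concept` at `location`."""
    return [m["id"] for m in messages
            if subsumes(concept, m["concept"]) and m["location"] == location]
```

A query for the concept "accommodation" in London returns messages 1 and 2, although neither message contains the word itself; a plain keyword search would miss both.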
Taking this further, one may think of a sophisticated VMM that is able to draw inferences from the explicitly available memory contents. For example, using methods for the recognition of so-called speech acts and dialogue acts, such a VMM could infer higher-level goals and plans of the message producer. Many different taxonomies or ontologies of speech acts have been defined in a number of NLP projects; see, e.g., Traum & Allen (1992), Allen et al. (1995), and Alexandersson et al. (1998). Speech act recognition could be useful to provide pragmatic information about the function of utterances and is a key process in automated dialogue understanding, since it makes explicit the speaker’s intentions and plans and is not based on a general semantic representation; see Allen (1983). Speech acts were extended to communicative actions that integrate the conversational context, as in Bunt (1989). But context is a broad notion, and the possible intentions of the speaker may be infinite. For this reason such systems are used only in very restricted application domains. For instance, in train timetable information services such as the Mask (Multimodal-Multimedia Automated Service Kiosk) project, see, e.g., Minker, Bennacef, & Gauvain (1996), the semantic frames contain information on train arrival/departure times and categories. In appointment scheduling dialogues such as those used in the Verbmobil project, the content of the Dialogue Memory is built from the ongoing interaction. In both projects an application-dependent semantic representation allows the system to identify the user’s dialogue acts and to control and predict subsequent exchanges; see Alexandersson et al. (1998). Also, the speech acts are directly related to specific domain concepts (for example, time schedule / price confirmation in Mask or date / location / duration negotiation in Verbmobil).
Reliable solutions for dialogue understanding are therefore limited to task-oriented communication and are not easily portable from one application to another. In this respect, Winograd (1988) has still not been proven wrong in arguing that it is impossible to automatically recognise speech acts in complex human-human conversations. To circumvent the difficulties of automatic speech act recognition, he proposed a system (The Co-Ordinator) in which the user has to indicate explicitly what sort of speech act he/she intends to convey. The user does this by attaching an appropriate speech act label to each of his/her utterances. The system constructs some “communicative dances” for the sequences of possible speech acts and allows the user to get information about the state of the conversation by exploring the links between the speech acts. This perspective (known as the “Language/Action Perspective”) has been developed in some research projects and is being applied in industrial systems; see, e.g., Agostini, De Michelis, Patriarca, & Tinini (1994) and De Michelis & Grasso (1994). On the other hand, the Language/Action Perspective has triggered some surprisingly negative reactions from the social sciences community. We refer to Suchman (1987) and Suchman (1993), as well as to Winograd (1994) for a response to these criticisms. Requiring the user to label his/her messages explicitly and systematically may indeed appear quite constraining. But the user may appreciate choosing an item from a short closed list of actions (ask, reply, confirm, etc.) rather than typing a complete text. Some other actions may be inferred by the system when it detects unambiguous thematic phrases (I decide, I intend to do, as a conclusion), or through the agenda topics. Identifying the type of a speech act is also a matter of interface design.
For example, consider a chat tool in which, instead of a single “send” button to be pressed after the utterance has been typed in, the user is provided with several buttons that represent particular speech acts, such as “ask”, “reply”, “inform”, “confirm”, “reject”. Sending the utterance off by pressing the “ask” button classifies the utterance as a request and yields a corresponding memory entry.
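A minimal sketch of such a labelled chat memory is given below. The class and method names are hypothetical; only the five button labels are taken from the example above, and no user interface is modelled.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SpeechAct(Enum):
    """The speech acts offered as buttons in the example chat tool."""
    ASK = "ask"
    REPLY = "reply"
    INFORM = "inform"
    CONFIRM = "confirm"
    REJECT = "reject"


@dataclass
class MemoryEntry:
    speaker: str
    text: str
    act: SpeechAct


@dataclass
class ChatMemory:
    entries: List[MemoryEntry] = field(default_factory=list)

    def send(self, speaker, text, button):
        """Store an utterance, labelled with the button that sent it.

        `button` is the label of the pressed button; an unknown label
        raises ValueError.
        """
        entry = MemoryEntry(speaker, text, SpeechAct(button))
        self.entries.append(entry)
        return entry

    def by_act(self, act):
        """All stored entries that carry the given speech act."""
        return [e for e in self.entries if e.act is act]


memory = ChatMemory()
memory.send("anna", "When do we meet?", "ask")
memory.send("ben", "At ten.", "reply")
open_questions = memory.by_act(SpeechAct.ASK)
```

Because the label is fixed at sending time, later retrieval by speech act needs no recognition step; the cost is shifted to the user, who must pick the right button for every utterance.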