IV. FEEDING THE VIRTUAL MEETING MEMORY WITH STRUCTURED ENTRIES
In what follows, we describe in more detail how memory contents emerge from capturing data, such as the text and audio streams exchanged among the participants in the Magic Lounge. We suggest that memory contents comprise elements of different levels of complexity, ranging from recorded events up to context-sensitive interpretations of messages. Figure 2 illustrates this view as a pyramid. The figure also indicates a correlation between the degree of abstraction and the technical difficulty of implementing a corresponding processing method. For each level of complexity, we will: (a) sketch what technology is required to equip virtual meeting places with the corresponding memory functions; (b) give an idea of what is achievable with current technology; and (c) indicate what remains subject to further research.
Figure 2: Correlation between the levels of complexity of memory entries (recording and storage; grouping; filtering and sorting; knowledge extraction and summarisation; interpretation and reasoning) and the increasing technical difficulty of creating such entries automatically from recorded raw data.
IV.1 RECORDING AND STORAGE
To a large extent, all memory entries, however complex or abstract they eventually become, will be based on recorded media streams and on captured events coming from one of the communication tools. Here, the term event is used in a broad sense to cover:
- all interactions of a user with the interface she/he uses to enter the virtual meeting place,
- all interactions of the virtual meeting place with external components such as information sources.
By interaction we mean all types of (physical as well as linguistic) actions. In its most basic version, a VMM will just contain the events that have been detected during a meeting or a series of meetings. Each chunk of information that passes through a virtual meeting place can be recorded with current technology. The availability of various compression algorithms for audio and video data, e.g., see Rao & Hwang (1996), as well as of multimedia database technology even makes it possible to cope with the mass of data that emerges when multi-party speech and video conversations are recorded, e.g., see Fong & Siu (1997), and Alpers, Blanken, & Houtsma (1997). Therefore, developing a virtual meeting place with sufficient memory capacity in terms of storage and access speed is a question of the available budget rather than a technological challenge. Figure 3 shows two different access interfaces to such an "event recorder" memory for the Magic Lounge. On a PC, the event records are displayed in a table. Because of limited screen real estate, a more compact list presentation appears on the phone display. Using scroll buttons, the user can navigate through the list of recorded events. Furthermore, the individual records are labelled with numbers, so that the user can retrieve more details about a record by pressing the corresponding number key on the phone.
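To make the "event recorder" level concrete, here is a minimal Python sketch. It assumes a hypothetical `Event` record (timestamp, source tool, sender, addressees, media type, payload) and a hypothetical `EventRecorder` class, and it is not the Magic Lounge implementation. Events are kept in an append-only list, each is identified by its 1-based sequence number, and `page()` returns numbered short labels for one screen, mimicking the compact, number-keyed list on the phone display.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Event:
    """One captured event (hypothetical schema, for illustration only)."""
    timestamp: float            # seconds since the start of the meeting
    source: str                 # e.g. "chat", "audio", "ui", "external"
    sender: str
    addressees: Tuple[str, ...]
    media: str                  # e.g. "text", "audio"
    payload: str                # text content or a reference to a stored media stream

class EventRecorder:
    """Append-only store of events: the most basic form of a VMM."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, event: Event) -> int:
        """Store an event and return its 1-based sequence number."""
        self._events.append(event)
        return len(self._events)

    def get(self, number: int) -> Event:
        """Retrieve full details of an event by its 1-based sequence number."""
        return self._events[number - 1]

    def page(self, page_index: int, page_size: int = 3) -> List[Tuple[int, str]]:
        """Return (number, short label) pairs for one screen of a small display."""
        start = page_index * page_size
        chunk = self._events[start:start + page_size]
        # Labels are truncated to 12 payload characters to fit a tiny LCD.
        return [(start + i + 1, f"{e.sender}: {e.payload[:12]}")
                for i, e in enumerate(chunk)]
```

A real VMM would persist the log in a multimedia database and keep only references to the compressed audio and video streams in `payload`; the in-memory list merely stands in for that storage layer.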
Figure 3: Access to the events recorded in the Magic Lounge VMM via a window-based interface on a PC or via a mobile phone with a tiny LCD display.
IV.2 GROUPING, INDEXING, AND LINKING
The larger the amount of recorded data, the greater the need to structure the data in a way that allows efficient searching and retrieval of relevant memory contents. Here, the attribute “relevant” reflects the fact that a query to the VMM is always context dependent: the VMM has to satisfy the system's or user's need for information in the current context. A first step beyond the mere collection of data is to group (or classify) the recorded events according to certain criteria. Such criteria are, for example, the sender and addressee(s) of a message, the type of media and the data format in which a message is encoded, the date of occurrence, the length, and so forth. Grouping events according to such content-independent criteria may require only a look-up, because this information is usually stored explicitly in the entries of a VMM. Other classification criteria, such as the topic of a message or the communicative intention behind it, require much deeper processing of the message content. We return to this issue in the following subsections. An interesting aspect of classifying events is that a classification often imposes a hierarchical structure. Classifications can also serve to prioritise messages according to a user’s specific preferences. This is an important feature, since different users will rate the importance of a given message differently.
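Grouping by content-independent criteria is essentially a look-up on fields already present in each entry. The following minimal sketch assumes a hypothetical `Entry` record; `group_hierarchically` nests groups (e.g. by date, then by media type) to show how a classification can impose a hierarchy, and `prioritise` ranks entries with an illustrative per-user score whose weights are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, List, Set, Tuple

@dataclass(frozen=True)
class Entry:
    """A recorded message with content-independent attributes (hypothetical schema)."""
    sender: str
    addressees: Tuple[str, ...]
    media: str      # e.g. "text", "audio", "video"
    date: str       # ISO date such as "1999-03-01"
    length: int     # characters for text, seconds for audio (assumed units)

def group_by(entries: List[Entry], key: Callable[[Entry], Hashable]) -> Dict[Hashable, List[Entry]]:
    """Group entries by a field that is already stored in each entry (a pure look-up)."""
    groups: Dict[Hashable, List[Entry]] = defaultdict(list)
    for entry in entries:
        groups[key(entry)].append(entry)
    return dict(groups)

def group_hierarchically(entries: List[Entry], keys: List[Callable[[Entry], Hashable]]) -> Any:
    """Apply several criteria in turn, yielding a nested (hierarchical) classification."""
    if not keys:
        return entries
    first, rest = keys[0], keys[1:]
    return {value: group_hierarchically(members, rest)
            for value, members in group_by(entries, first).items()}

def prioritise(entries: List[Entry], preferred_senders: Set[str], preferred_media: Set[str]) -> List[Entry]:
    """Rank entries by an illustrative user-specific score (weights are arbitrary)."""
    def score(entry: Entry) -> int:
        return ((2 if entry.sender in preferred_senders else 0)
                + (1 if entry.media in preferred_media else 0))
    # sorted() is stable, so entries with equal scores keep their recording order.
    return sorted(entries, key=score, reverse=True)
```

Content-dependent criteria such as topic or communicative intention cannot be handled this way, because they are not stored as explicit fields; they would need a classifier in place of the simple `key` function.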
Indexing is a first step towards retrieving relevant messages by their content. As far as the message content is represented as text, full-text searching can be applied to the indexed messages, similar to the style of search provided by World-Wide Web search engines. Indexing can be based on statistical methods, such as the relative-frequency approach to measuring the significance of words, word groups, and sentences, e.g., see Spärck-Jones (1997). For text, several indexing tools are available for free download over the Internet, such as the SWISH system by Hughes (1995). To get some indication of what a spoken message is about, one may rely on methods that have been developed in the area of audio retrieval, such as fixed-vocabulary word spotting as described in Foote (1999). There are also approaches for indexing visual media such as graphics and video, e.g., see Ahanger & Little (1996), and Maybury (1997). A basic approach to this task relies on extracting a few discriminating features from each media object. However, coping with background noise and spontaneous speech disfluencies, and dealing with multiple topics or speakers, still exceed the capabilities of most available systems. This is an area of ongoing research and one of the main objectives addressed within several European projects, e.g., see de Jong (1998), and within the ongoing ARPA evaluation campaigns on TDT (Topic Detection and Tracking), e.g., see the Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop (1998).
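The relative-frequency idea can be illustrated with tf-idf weighting, one common statistical scheme that multiplies a term's relative frequency within a message by its inverse document frequency across all messages. The sketch below is a minimal in-memory inverted index under simplifying assumptions (naive lowercase tokenization, no stemming or stop-word list, unique message IDs); the class name `TfIdfIndex` is hypothetical, and the code does not reflect how SWISH works internally.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    """Naive tokenizer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

class TfIdfIndex:
    """Inverted index scoring messages by summed tf-idf of the query terms."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[int, int]] = defaultdict(dict)  # term -> {doc_id: count}
        self.doc_lengths: Dict[int, int] = {}                          # doc_id -> token count

    def add(self, doc_id: int, text: str) -> None:
        """Index one message; doc_id is assumed to be unique."""
        tokens = tokenize(text)
        self.doc_lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            self.postings[term][doc_id] = count

    def idf(self, term: str) -> float:
        """log(N / df): rare terms weigh more, terms in every message weigh zero."""
        df = len(self.postings.get(term, {}))
        return math.log(len(self.doc_lengths) / df) if df else 0.0

    def search(self, query: str) -> List[Tuple[int, float]]:
        """Return (doc_id, score) pairs with positive score, best first."""
        scores: Dict[int, float] = defaultdict(float)
        for term in tokenize(query):
            weight = self.idf(term)
            for doc_id, count in self.postings.get(term, {}).items():
                tf = count / self.doc_lengths[doc_id]  # relative frequency in the message
                scores[doc_id] += tf * weight
        ranked = [(d, s) for d, s in scores.items() if s > 0]
        return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Terms that occur in every message, such as "the", get an inverse document frequency of zero and therefore do not contribute to the ranking, which is the intended effect of the weighting.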
The goal of event linking is to provide associative access to the contents of the VMM. Links are internal pointers from event structures to other entries of the VMM. Interweaving the contents of the VMM in this way can form the basis of a hypertext-style user interface to the VMM. A difficult task, however, is to decide which information units should be linked with one another. The so-called “activity-based information retrieval” approach suggests grouping memory entries that emerge from a particular situation into episodic units, see Lamming & Newman (1992) and Lamming & Flynn (1994). A basic assumption behind this approach is that activities are correlated by the spatial and temporal context in which they occur. For instance, the memory that someone said “I found a cheap hotel close to the station” becomes much more specific when it is also remembered that the speaker was consulting a tourist map of a certain town or neighbourhood at the time the utterance was made.
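One simple way to approximate episodic units is to segment the event stream by time and context and then link all events within the same episode. The sketch below assumes that every event carries a timestamp and a hypothetical `context` label (for instance, the map region being viewed); the two-minute gap threshold is an arbitrary illustrative choice, and this heuristic is our own illustration, not the method of Lamming and colleagues.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Set

@dataclass(frozen=True)
class TimedEvent:
    event_id: int
    timestamp: float   # seconds
    context: str       # hypothetical spatial/situational label, e.g. "map:station"
    description: str

def segment_into_episodes(events: List[TimedEvent], max_gap: float = 120.0) -> List[List[TimedEvent]]:
    """Start a new episode when the time gap exceeds max_gap or the context changes."""
    episodes: List[List[TimedEvent]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if episodes:
            last = episodes[-1][-1]
            if event.timestamp - last.timestamp <= max_gap and event.context == last.context:
                episodes[-1].append(event)
                continue
        episodes.append([event])
    return episodes

def link_within_episodes(episodes: List[List[TimedEvent]]) -> Dict[int, Set[int]]:
    """Create bidirectional links between all events that share an episode."""
    links: Dict[int, Set[int]] = {}
    for episode in episodes:
        for event in episode:
            links.setdefault(event.event_id, set())
        for a, b in combinations(episode, 2):
            links[a.event_id].add(b.event_id)
            links[b.event_id].add(a.event_id)
    return links
```

For the hotel example, an utterance made while the station-area map was open would share that map's context label and fall into the same episode as the map event, so the two become linked.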