Abstract: - A model for virtual entity representation in mixed reality is proposed. Virtual entities are actors in this model, which enable real, virtual, natural and artificial objects to interact according to a common ontology. The first part of the paper deals with architectural issues, including a three-layer stack for semantic, middleware and physical interaction protocols. It also discusses some management mechanisms that allow entities to be created and to cooperate semantically. The second part discusses a FIPA agent implementation of virtual entities, which has been carried out on the JADE platform. Finally, as a case study on interactions between tourists and artifacts in a heritage site, the paper discusses a simple example from a HAREM application design.
Key-words - ubiquitous computing, augmented reality, mixed reality, ontology, multi-agent systems
Augmented Reality (AR) can be considered the key paradigm of ubiquitous computing. Its effectiveness can help the transition from today's vision of an internet information society to one built on pervasive computing.
AR introduces a new lifestyle, which requires awareness of the surrounding environment as something richer than the old natural reality. Even if AR should not stand as something artificial, it is difficult to ignore that today's reality is a relation space in which interactions may be artificial. We can guess that in a few years many of the things we can see or touch will have some artificial logic attached.
An AR system has to manage two basic classes of resources: physical and logical. Physical resources can be natural beings or things, as well as digital devices working in a real landscape; logical resources are software objects within a hidden network [WONT 2002] and interact according to some distributed paradigm.
Physical and virtual resources can be part of autonomous entities which live in augmented reality and interact, cooperate, join groups, and found societies. Groups, societies, and any other aggregation instances become autonomous entities as well, which in turn can interact with other entities.
According to a traditional vision of reality, AR entities could therefore be natural, artificial, real, virtual, or mixed. However, when two entities interact in AR, they need to use some common ontology and a peer-to-peer protocol. This is really hard to arrange when entities are heterogeneous. Consider a visitor who is looking at a statue. The visitor is a complex entity which uses its eyes to look at a physical statue. This way, the actual interaction is performed on a homogeneous physical layer. What the visitor perceives of the statue is the result of some process running in a higher-layer organ of the visitor entity. In practice, visitor and statue cannot use a peer-to-peer protocol to interact with each other directly. In our case the visitor uses some internal vertical protocol to get a service from its eyes.
We want to use this approach to design virtual entities in mixed reality (MR) systems, thus letting real physical beings or things be parts of virtual entities.
When dealing with AR software systems, entities are software frames [BAKER 1998] to be implemented according to some functional stack in which software objects are always at least one layer higher than physical objects.
2 Related works
Mixed reality has been investigated in recent years with the main goal of achieving interaction rules among the real and artificial objects involved in some computer application. Some contributions deal with the hypermedia paradigm. [ROMERO 2003] proposes an object-oriented framework built on a hypertext Reference Model [HALASZ 1994] and a hypermedia data model in which information data are represented by atomic components. The aim of the hypermedia model there is to integrate the physical world with virtual documents and worlds. In [Grønbæk 2003] a tagging system is proposed based on three main categories: object, collectional and tool. This allows the authors to discuss empirical studies on collectional artifacts to be used in a specific work setting of landscape architects. Mixed reality is often investigated in museum applications. In [HALL 2002] the SHAPE consortium in Sweden is presented, mostly dealing with disappearing hardware and augmented reality topics. The authors discuss how a virtual archaeologist can explore a museum along with virtual history outdoors and hybrid physical-digital artifacts.
Most mixed reality systems are based on some location awareness model, as in [Duri 2001] and in the Cyberguide project [LONG 1996]. Among others, the Archeoguide project [VLAHAKIS 2001] deserves mention as an augmented reality application aimed at a VRML reconstruction of the archaeological sites of Olympia in Greece.
There is also a wide literature on ubiquitous computing, from the first vision of Mark Weiser [WEISER 91] to the proceedings of the recent Ubicomp2003 conference, in which some attractive advanced augmented reality applications were presented.
3 The HAREM Model
Unlike the projects mentioned above, the HAREM model approaches the mixed reality problem from an ontological point of view. All objects in augmented reality are parts of virtual entities whose semantic abstractions wrap physical and virtual resources. An object in AR is not only what can be read of it in a vocabulary; an object can become what other entities need to perceive of it, including semantic contents and actions not naturally belonging to that object.
The HAREM model aims to be a reference structure for any entity representation and interaction in mixed reality. It is based on a three-layer architecture, with each layer hosting a different entity projection:
- a semantic projection for semantic interaction, knowledge maintenance and knowledge management,
- a middleware projection allowing entities to be implemented according to some development platform,
- a physical projection for physical interaction and physical resource management.
Semantic and middleware projections are the non-visible part of an AR entity. This includes a knowledge base and a collection of methods, along with an overall execution logic which implements knowledge processing and method activation.
The physical projection accounts for physical resource management in the visible part of AR. It acts through a sequence of exposition-perception cycles and allows an entity to interact with the physical projections of other entities by means of multimedia devices.
Each projection layer was conceived to work in a multithread execution environment, thus letting entity multi-projections interact with many other entities at a time.
3.1 Semantic projection
The semantic projection is split into a set of sub-layers, each defining a behavior semantics of an entity (Fig. 1). Each sub-layer is implemented according to a common ontology to be mapped on a given middleware structure, thus enabling an entity to use a peer-to-peer protocol for semantic interaction. The main semantic projection sub-layers are:
Maintenance: entity creation, suppression and update;
Consistency: entity features and reason for being. At no time can an entity accomplish a task or a cooperation request if it does not comply with the knowledge rules in its consistency layer. As an implementation note, the consistency sub-layer can include the setup parameters of an entity according to a specific entity class;
Vocation: logic selection and coordination;
Role: permanent knowledge for mission accomplishment;
Task: transient knowledge for mission accomplishment;
Ability: access rules to a knowledge base on request from other sub-layers or other entities calling for cooperation. At present the ability sub-layer also includes strategies for quality of service and performance improvement;
Survival: knowledge, rules and methods dealing with security and fault tolerance issues;
Instinct: default reaction behavior of an entity, to be called by the sub-layer selection logic.
After its creation, an entity first starts its sub-layers according to a top-down schedule. Subsequent sub-layer selection depends on the types of incoming messages. In some cases a sub-layer can call another sub-layer directly; in other cases the vocation sub-layer provides the correct schedule according to the mission to be pursued.
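The activation policy described above can be illustrated with a minimal Java sketch. It assumes that the top-down order coincides with the order in which the sub-layers are listed in this section, and the message type strings ("create", "threat", and so on) are hypothetical labels introduced only for illustration; the actual selection logic is not specified at this level of detail.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names: the sub-layers come from the text, the code does not.
enum SubLayer {
    MAINTENANCE, CONSISTENCY, VOCATION, ROLE, TASK, ABILITY, SURVIVAL, INSTINCT
}

final class SubLayerSchedule {
    private SubLayerSchedule() {}

    /** First activation after creation: every sub-layer, in top-down order. */
    static List<SubLayer> startupOrder() {
        return Arrays.asList(SubLayer.values());
    }

    /**
     * Later activations: choose a sub-layer from the incoming message type.
     * The message type strings are assumptions made for this sketch.
     */
    static SubLayer select(String messageType) {
        switch (messageType) {
            case "create":
            case "update":
            case "suppress":
                return SubLayer.MAINTENANCE;
            case "threat":
                return SubLayer.SURVIVAL;
            case "mission":
                return SubLayer.VOCATION;
            case "lookup":
                return SubLayer.ABILITY;
            default:
                // Instinct is the default reaction behavior.
                return SubLayer.INSTINCT;
        }
    }
}
```

Falling back to the instinct sub-layer for unrecognized messages mirrors its description as the default reaction behavior of an entity.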
3.2 Middleware projection
Any middleware platform could be used to implement the HAREM structure (Fig. 2). Nevertheless, the semantic projection was conceived with the FIPA standard and its intelligent agent internal structure in mind. As discussed in the implementation notes, the HAREM code is being developed in Java on the JADE platform. HAREM is therefore FIPA compliant, and its internal structure can be considered an extension of the FIPA agent.
Generally speaking, the implementation of the middleware projection gives rise to some grid computing functionalities for complex distributed service providing.
3.3 Physical projection
Physical interaction between entities in AR is performed at the lowest layer, where I/O device drivers and physical interaction logic take place.
A physical interaction consists of a sequence of interaction cycles in which a physical exhibition step is followed by a physical perception step, as shown in Fig. 3 and described in the following procedure:
We suppose AR resources are to be shared among entities. Therefore, an entity can be created without a full set of knowledge and methods, if these can be supplied by other entities somewhere in AR. Entities can even be conceived, whose logical resources are pure knowledge made up of a simple rule and no methods. We call these entities semantic cells. A semantic cell is unable to activate procedural executions itself, since it is not provided with any method. Nevertheless, when a procedural execution is required for task accomplishment, a semantic cell can invoke a cooperation mechanism to ask for help from other entities.
An entity or a semantic cell which cannot find a resource in its semantic projection is said to show a semantic gap which needs to be filled.
In the following we discuss a cooperation mechanism to discover and share knowledge and methods among entities. The mechanism must be efficient, fault tolerant, and capable of dynamically adapting itself to changes in the AR environment. It must also include a strategy for resource discovery, thus allowing entities, semantic cells, and methods to be bound in a semantic process. This also reduces redundancy in AR.
An AR entity usually undertakes the execution of a method as a result of some reasoning process. This led us to represent entity knowledge in a hybrid declarative-procedural fashion. Entity knowledge is represented by a sequence of clauses, each establishing a relationship among terms. Procedural logic is linked to the knowledge base by attaching methods to some rule predicates. Execution of a method is automatically started when the corresponding predicate is verified. However, a semantic gap may occur when a rule, or a method, cannot be found in an entity knowledge base. We can distinguish two cases:
1) the expansion rules and methods are not available in the whole AR system.
2) the expansion rules and methods are owned by other entities.
In the first case, knowledge and methods should be externally provided to the lacking entity (LE). In the second case, a cooperation can start with an entity which is supposed to contain the requested knowledge and methods (hereafter FE, "Friend Entity"). Predicate verification then goes on in the knowledge base of the FE. Each entity in our model is equipped with an FE Table (FET): a set of couples which allows unexpanded predicates to be linked to entities which are expected to contain expansion rules for them. However, AR environments are strongly dynamic: entity knowledge and methods may change over time. This implies that an LE may have a reference to an FE which is expected to contain expansion rules for some predicates, but actually does not. Once the FE has recognized such an occurrence, it recursively behaves as an LE and exhibits a semantic gap.
The opposite situation may occur as well: an LE may not contain a reference to an FE because this FE acquired knowledge and methods only after the LE FET was set up. In this case knowledge and methods should be looked up across the whole AR system.
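The resolution path described so far (local verification, then delegation through the FET, with an FE recursively acting as an LE) can be sketched in plain Java. This is only an illustration under simplifying assumptions: a predicate is a bare string, an attached method is a `Supplier<String>`, and the class and method names (`HaremEntity`, `attach`, `befriend`, `resolve`) are hypothetical. An empty result stands for a semantic gap that would require a lookup across the whole AR system.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of an entity with attached methods and an FE Table.
final class HaremEntity {
    final String name;
    // Predicate -> method started when the predicate is verified.
    private final Map<String, Supplier<String>> methods = new HashMap<>();
    // FET: unexpanded predicate -> friend entity expected to expand it.
    private final Map<String, HaremEntity> fet = new HashMap<>();

    HaremEntity(String name) { this.name = name; }

    void attach(String predicate, Supplier<String> method) {
        methods.put(predicate, method);
    }

    void befriend(String predicate, HaremEntity friend) {
        fet.put(predicate, friend);
    }

    /** Verifies a predicate locally, otherwise follows the FET chain. */
    Optional<String> resolve(String predicate) {
        return resolve(predicate, new HashSet<>());
    }

    private Optional<String> resolve(String predicate, Set<HaremEntity> asked) {
        if (!asked.add(this)) {
            return Optional.empty(); // already asked: stale FET entries form a cycle
        }
        Supplier<String> method = methods.get(predicate);
        if (method != null) {
            return Optional.of(method.get());
        }
        HaremEntity friend = fet.get(predicate);
        if (friend == null) {
            return Optional.empty(); // semantic gap: system-wide lookup needed
        }
        return friend.resolve(predicate, asked); // the FE now behaves as an LE
    }
}
```

The set of already asked entities is an addition of this sketch: it guards against outdated FET entries that point back to an entity already involved in the same request.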
Once an FE has been located, it can react in three ways to an LE request:
An FE can update an LE FET by adding a couple that carries its own name. This way, the FE becomes available for direct cooperation with the LE. This solution appears to be the simplest and most immediate. However, it may bring some bottleneck effect, because an FE could be invoked for cooperation by a large number of LEs.
An FE can update an LE by providing the requested knowledge and methods. This solution overcomes the bottleneck effect, but it can entail some overload due to the LE update. Moreover, the same update may be needed on a large number of LEs, thus resulting in system redundancy. Finally, some delay also occurs in cooperation due to update completion time.
An FE can generate a new entity which will be equipped with the requested knowledge and methods. After that, the LE FET has to be updated with an entry carrying the name of the newly created entity. This way a new entity becomes available for direct cooperation with the LE. This last approach increases system efficiency, but it requires a creation mechanism by which an entity (creator) can generate another entity. In order to overcome security drawbacks, the initial knowledge and methods of the created entity are arranged as a subset of those of the creator entity. After that, created entities can grow independently from their creators.
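The three reactions can be compared side by side in a short Java sketch. It assumes a deliberately reduced entity made of a name, a knowledge map and an FET keyed by predicate; the names `CoopEntity`, `Reaction` and `Cooperation.react`, as well as the naming scheme of the spawned entity, are hypothetical and serve only to illustrate the mechanism.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, reduced entity: only what the three reactions touch.
final class CoopEntity {
    final String name;
    final Map<String, String> knowledge = new HashMap<>(); // predicate -> rule
    final Map<String, String> fet = new HashMap<>();       // predicate -> FE name

    CoopEntity(String name) { this.name = name; }
}

enum Reaction { SHARE_REFERENCE, TRANSFER_KNOWLEDGE, SPAWN_ENTITY }

final class Cooperation {
    private Cooperation() {}

    /**
     * Applies one reaction of the friend entity (fe) to the lacking entity (le).
     * Returns the newly created entity for SPAWN_ENTITY, otherwise empty.
     */
    static Optional<CoopEntity> react(Reaction reaction, CoopEntity fe,
                                      CoopEntity le, String predicate) {
        String rule = fe.knowledge.get(predicate);
        if (rule == null) {
            throw new IllegalArgumentException(fe.name + " cannot expand " + predicate);
        }
        switch (reaction) {
            case SHARE_REFERENCE:
                le.fet.put(predicate, fe.name);    // the FE stays the single provider
                return Optional.empty();
            case TRANSFER_KNOWLEDGE:
                le.knowledge.put(predicate, rule); // the LE becomes self-sufficient
                return Optional.empty();
            case SPAWN_ENTITY:
                CoopEntity created = new CoopEntity(fe.name + "-child");
                created.knowledge.put(predicate, rule); // subset of the creator's knowledge
                le.fet.put(predicate, created.name);
                return Optional.of(created);
            default:
                throw new AssertionError(reaction);
        }
    }
}
```

In this sketch the created entity starts with exactly the requested rule, which is one way to satisfy the constraint that its initial knowledge is a subset of the creator's.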
We are developing a mobile agent middleware system by which an LE can delegate a resource lookup agent to search for a missing resource.
5 FIPA Agent Based Implementation
HAREM virtual entities are software agents capable of pursuing goals through their autonomous decisions, actions and social relationships. Similarly to a FIPA agent [FIPA 2002A], each role is a collection of complex behaviours performed by tasks. Actually, a HAREM entity may be mapped on a FIPA agent, by suitably enriching its structure. More precisely, some functional blocks should be added in the UML functional description of a FIPA agent, which account for vocation, survival and instinct.
Vocation aims at generating a context-driven execution path among the role tasks, and at managing their execution priority.
Survival aims at activating basic high-priority functionalities, which can be undertaken for security and preservation ends.
Instinct aims at activating simple reactive tasks, which are undertaken without any semantic elaboration.
A HAREM entity is arranged as a FIPA agent with five specialized roles: a receiver role, a vocation role, an ability role, a survival role, and an instinct role (Fig. 4).
Fig. 4. HAREM static UML description
Further application-specific operating roles are also included. The diagram in Fig. 5 describes the HAREM role interaction protocol according to the representation proposed by [ODELL 2001]. Each request addressed to the HAREM entity is first processed by receiver, which forwards it to survival and to vocation. Survival mainly deals with security issues: when some danger is detected, it alerts vocation. If no danger is detected, vocation selects the application-specific operating roles to comply with the request. Communication between vocation and operating roles takes place according to the FIPA specifications on Agent Interaction Protocols [FIPA 2001] and to communicative act semantics in [FIPA 2002B].
Each operating role may send a request to ability for external resource lookup. This is performed by a request sent by the ability to other HAREM agents.
Finally, instinct is a default operating role. Vocation sends the request to instinct only if no other operating role can comply with it.
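The request flow among roles can be summarized by the following Java sketch. It is written in plain Java rather than with JADE behaviours, and it exchanges strings instead of FIPA ACL messages, so it only illustrates the control flow. The class name `RoleDispatcher`, the request type keys and the "REFUSED" reply are assumptions: the text states that survival alerts vocation when danger is detected, but not how vocation then reacts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the receiver / survival / vocation / instinct flow.
final class RoleDispatcher {
    private final Predicate<String> dangerDetector;  // survival role
    private final UnaryOperator<String> instinct;    // default operating role
    private final Map<String, UnaryOperator<String>> operatingRoles = new HashMap<>();

    RoleDispatcher(Predicate<String> dangerDetector, UnaryOperator<String> instinct) {
        this.dangerDetector = dangerDetector;
        this.instinct = instinct;
    }

    void addOperatingRole(String requestType, UnaryOperator<String> role) {
        operatingRoles.put(requestType, role);
    }

    /** Receiver: every request enters here and reaches survival and vocation. */
    String receive(String requestType, String content) {
        boolean danger = dangerDetector.test(content);
        return vocation(requestType, content, danger);
    }

    private String vocation(String requestType, String content, boolean danger) {
        if (danger) {
            return "REFUSED"; // assumption: the reaction to an alert is not specified
        }
        // Instinct is used only if no other operating role can comply.
        UnaryOperator<String> role = operatingRoles.getOrDefault(requestType, instinct);
        return role.apply(content);
    }
}
```

For instance, a dispatcher with a single "describe" operating role answers "describe" requests through that role and hands every other request type to instinct.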
We are currently setting up a HAREM based application for service providing in cultural heritage environments. The project aims at turning a cultural heritage site into augmented reality for tourist service providing. For example, if a visitor is near some interesting object (e.g. a sculpture, a painting, some ruins), he should be automatically provided with information and facilities about it. In AR this is accomplished by interaction between the visitor and the surrounding environment. Similarly to other projects, our approach requires that visitors are provided with some mobile device (e.g. a PDA or a cellular phone with suitable connectivity features).
A Site Positioning System (SPS), which is currently based on Bluetooth and Infrared technologies, accomplishes position tracking of each visitor and stores 2 or 3 fixed coordinates of each interesting physical object in the environment, thus allowing proximity detection for context aware service providing.
Here we show a simple example from our application. As hinted in the introduction, suppose that a statue is placed at some point of our site. Some spotlights (S1, S2 and S3) and a remote display are placed next to the statue.
Fig. 7. The Statue entity example
We want the AR environment to fulfill the following requirements:
1) when some visitor gets near the statue, a brief presentation of the statue is shown on the remote display.
2) when some visitor near the statue shows interest in some detail or explanation about the statue, required information is presented on the remote display.
3) when there is no visitor around, all the spotlights and the remote display are switched off.
4) when someone gets too near to the statue, an alarm system is activated.
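The four requirements can be read as a small decision rule over visitor distances, sketched below in Java. The two radii are hypothetical values chosen only for illustration, the distances are assumed to come from the positioning system, and switching the spotlights on whenever a visitor is near is an assumption inferred from requirement 3.

```java
// Hypothetical sketch of the statue scene rules; thresholds are assumptions.
final class StatueScene {
    static final double NEAR_RADIUS_M = 5.0;   // "near the statue"
    static final double ALARM_RADIUS_M = 1.0;  // "too near to the statue"

    enum Display { OFF, PRESENTATION, DETAIL }

    static final class State {
        final boolean spotlightsOn;
        final Display display;
        final boolean alarm;

        State(boolean spotlightsOn, Display display, boolean alarm) {
            this.spotlightsOn = spotlightsOn;
            this.display = display;
            this.alarm = alarm;
        }
    }

    private StatueScene() {}

    /** Maps visitor distances (in metres) and expressed interest to a scene state. */
    static State evaluate(double[] visitorDistances, boolean detailRequested) {
        boolean anyoneNear = false;
        boolean tooNear = false;
        for (double d : visitorDistances) {
            if (d <= NEAR_RADIUS_M) anyoneNear = true;
            if (d <= ALARM_RADIUS_M) tooNear = true;
        }
        if (!anyoneNear) {
            return new State(false, Display.OFF, false);          // requirement 3
        }
        Display display = detailRequested ? Display.DETAIL        // requirement 2
                                          : Display.PRESENTATION; // requirement 1
        return new State(true, display, tooNear);                 // requirement 4
    }
}
```

Because the alarm radius is smaller than the near radius, a visitor who triggers the alarm is always also counted as near, so the display and spotlights stay on while the alarm is active.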