Evaluating systems is a difficult task, and it becomes even more difficult when the system is adaptive. It is crucially important to be able to distinguish the adaptive features of the system from the general usability of the designed tool. This is probably why most studies of adaptive systems are comparisons of the system with and without adaptivity (Meyer, 1994; Boyle and Encarnacion, 1994; Brusilovsky and Pesin, 1995; Kaplan et al., 1993). The problem with such studies is obvious: the non-adaptive system may not have been designed ‘optimally’ for the task. Indeed, this should be expected, since adaptivity should preferably be an inherent and natural part of a system: when it is removed, the system is no longer complete. Still, it is very hard to prove that it is actually the adaptivity that makes the system better unless the adaptive condition can be compared with one without adaptivity. The last study in this chapter, presented in section , is no exception to the “comparative studies” set-up.
An alternative view on how to study adaptive systems is put forth by Oppermann (1994), who prefers to see such studies as part of the design cycle. Since adaptivity is complex machinery, several rounds of studies are needed to help the designers get the adaptivity right. For example, if an adaptive hypermedia system is supposed to provide different kinds of information to users depending on their knowledge, goals or needs, several studies may be necessary before the right relevance criterion can be set up between the user’s goal and the preferred information content (or information presentation). Prior to the last, comparative, study we carried out a couple of what we call “bootstrapping” studies, with the goal of finding the relevant relations between the users’ tasks and the information the system should adaptively provide them with. In this way, we could bootstrap the relevance criterion that our adaptive mechanisms make use of.
The approach taken in the PUSH project can be described as a combination of a design/bootstrapping cycle and a comparative study set-up.
Before describing the last studies of the PUSH project, let us provide some background to evaluations of adaptive systems.
Evaluation of Adaptive Systems
There are few studies of adaptive systems in general, and even fewer of adaptive hypermedia systems. The few studies of adaptive hypermedia systems that exist have shown that the systems are quite efficient in reaching their goals. The second of two studies of HYPERFLEX (Kaplan et al., 1993) showed that the adaptive system could sometimes decrease search time by 40%. The study by Boyle and Encarnacion (1994) of MetaDoc showed that, after using the adaptive system, users solved a set of reading comprehension tasks in significantly less time, and also gave significantly more correct answers.
Apart from deciding the form of the evaluation (e.g. comparative study or design cycle), another important issue is what to measure when evaluating the adaptivity. In the studies of adaptive hypermedia systems by Boyle and Encarnacion (1994), Brusilovsky and Pesin (1995), and Kaplan et al. (1993), the main evaluation criterion is task completion time. This is obviously an important criterion by which some systems should be evaluated. In our case, though, the goal of the adaptive hypermedia system is to provide users with the correct, most relevant, information and to make sure that they do not get lost on their way to this information. The time spent retrieving information is not as relevant as the quality of the search and its result.
Boyle and Encarnacion also measured reading comprehension, while Kaplan and colleagues measured how many nodes the users visited: in their case, the more nodes the users visited, the better. Finally, Brusilovsky and Pesin measured how many times their students revisited “concepts” they were attempting to learn (presumably, the fewer the better). In our study, we were interested in whether the subjects actually found the requested information at all, with and without the adaptivity switched on. We were also interested in the subjects’ own evaluation of how well the adaptive system worked compared to the non-adaptive one.
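The contrast between these measures can be made concrete with a small sketch. The function below computes, from a simple interaction log, the kinds of quantities mentioned above: distinct nodes visited (Kaplan et al.), revisits (Brusilovsky and Pesin), and whether the requested information was found at all (our criterion). The log format and entity names are invented for illustration; they are not taken from any of the cited studies or from POP.

```python
from collections import Counter

def summarise_log(log, target_entities):
    """Summarise an interaction log (a list of {"node": name} entries)
    with three of the measures discussed in the text."""
    visits = Counter(entry["node"] for entry in log)
    nodes_visited = len(visits)                      # distinct nodes visited
    revisits = sum(n - 1 for n in visits.values())   # repeat visits to a node
    found = any(entry["node"] in target_entities for entry in log)
    return {"nodes_visited": nodes_visited, "revisits": revisits, "found": found}

# A tiny hypothetical log: the user opens "overview", then "input",
# then returns to "overview"; the sought-for entity was "input".
log = [{"node": "overview"}, {"node": "input"}, {"node": "overview"}]
print(summarise_log(log, {"input"}))
# {'nodes_visited': 2, 'revisits': 1, 'found': True}
```

Note how the same log scores differently under the different criteria: a high visit count is good on Kaplan's measure but the revisit counts against the student on Brusilovsky and Pesin's.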
Finally, a last difficulty in studying adaptive systems lies in the procedure of the study. Most adaptive systems only become really useful when they are part of the users’ work over a longer period: only then can we see how the users’ needs and goals vary in a “natural” way. Obviously, this may not be feasible in a research project that has to be finished within a limited time. Instead, we have to make the subjects solve a pre-defined set of realistic tasks to which we know the system will be able to adapt.
There are two relations in POP’s database that will determine how well the adaptivity will work:
• the relation between information seeking task and information entities (the relevance criteria)
• the relation between users’ plans and their information seeking task (the plan library)
In the first version of our prototype we encoded these two relations as a set of declarative rules, based on what we had learnt in our earlier studies. This is, of course, not enough, since our knowledge of the relations was based on studies done with a completely different on-line manual system, the Framemaker manual. This is what is called the paradox-of-change problem (as discussed in section ). We now needed to empirically establish the validity of the relations based on users’ interactions with our own prototype, POP.
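As a rough illustration, the two relations can be thought of as two lookup structures: one mapping each information seeking task to its relevant information entities, and one mapping observed action sequences (plans) to tasks. The sketch below is only a toy rendering of this idea; the entity names and plan steps are invented and do not reflect POP's actual rule base (the four task names are those used later in the paper-and-pencil exercise).

```python
# Relevance criteria: information seeking task -> relevant information entities.
# Entity names here are hypothetical placeholders.
RELEVANCE = {
    "project planning":       {"purpose", "input", "output", "planning info"},
    "performing an activity": {"how-to", "input", "output"},
    "learning structure":     {"purpose", "overview"},
    "reverse engineering":    {"output", "overview"},
}

# Plan library: observed action sequence -> inferred information seeking task.
PLAN_LIBRARY = {
    ("open purpose", "open planning info"): "project planning",
    ("open how-to", "open input"):          "performing an activity",
}

def relevant_entities(task):
    """Look up which information entities the system should emphasise."""
    return RELEVANCE.get(task, set())

def infer_task(actions):
    """Match a logged sequence of actions against the plan library."""
    return PLAN_LIBRARY.get(tuple(actions))

print(infer_task(["open how-to", "open input"]))  # performing an activity
```

Bootstrapping, in these terms, means filling in and correcting the entries of both tables from empirical data rather than from the designers' initial guesses.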
A second problem then occurs: how can we construct a plan library when the use of those very plans will directly affect the dialogue structure? For example, suppose we set up a study where we ask the users to tell us which information entities they would open given a particular query and a particular task; we could then log their actions and assume that this set of actions indicated that the users had a certain task. But if we then implemented this action sequence as a plan in our plan library, it would change the interaction with the user, so that certain information entities would be open from the start while others would be closed. After a few actions the system would thus start adapting, thereby making certain actions in the plan impossible (it is, for example, not possible to open an information entity that is already open) while making other, previously impossible, actions possible. The empirically collected log of actions therefore cannot be used as a plan in the plan library as it stands.
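This paradox can be demonstrated with a few lines of code. The sketch below (entity names invented, not POP's) replays a logged action sequence against a page state: the same plan that replays cleanly on the non-adaptive system breaks as soon as the adapted system pre-opens one of its entities.

```python
def replay(plan, initially_open):
    """Replay a logged sequence of 'open entity' actions against a page
    whose entities in `initially_open` are already open. Opening an
    already-open entity is impossible, so the replay fails there."""
    open_entities = set(initially_open)
    for entity in plan:
        if entity in open_entities:
            return f"plan breaks: '{entity}' is already open"
        open_entities.add(entity)
    return "plan replayed in full"

# A plan recorded on the non-adaptive system...
logged_plan = ["purpose", "input", "output"]
print(replay(logged_plan, initially_open=set()))
# plan replayed in full

# ...breaks once adaptation has pre-opened one of its entities.
print(replay(logged_plan, initially_open={"input"}))
# plan breaks: 'input' is already open
```

The sketch only shows the "impossible action" half of the problem; the complementary half, previously impossible actions becoming possible, follows by the same reasoning.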
The relation between task and generated answer was easier to establish empirically.
In addition to the problems of bootstrapping the plan library and the relevance criteria, we also had to consider the evaluation of the WWW interface. We had to make sure that the users understood how to interact with a non-adaptive version of the tool before we could test how well the adaptivity worked. Since our system provides a richer environment than is normally available on the WWW, we wanted to see whether the users would understand the “links” provided in our interface.
So, in order to bootstrap the adaptivity we had to consider several different kinds of studies, or combinations of studies. In the end we decided to do three different studies:
• An initial evaluation of the relevance of the information entities to the tasks as a paper-and-pencil exercise, see section .
• A study of the non-adaptive version of our system where we asked the users to tell us which information entities would be most relevant to a task through navigating to them and opening them. In this study we also evaluated the usability of our interface, see section .
• Finally, a study comparing the adaptive system to a non-adaptive variant of the same system in order to evaluate the usefulness of the adaptive behaviour, see section .
In summary, the first study convinced us of the usefulness of adapting explanations to the information seeking tasks, and it provided some input on the relation between the explanations and their corresponding tasks. The second study convinced us of the usefulness of our WWW interface, and it provided more input on both the relation between task and explanation and the relation between plan and task. The third, comparative, study showed that:
• the adaptive system reduces the number of within-page user actions,
• the adaptive system influences the subjects’ choice of information entities to be included in their solutions,
• the subjects preferred the adaptive system over the non-adaptive variant, and
• there is a weak tendency for the adaptive system to reduce search time in the long run.
Paper and Pencil Exercise
We have argued that it is important that users can distinguish between the different explanations. Only then can users make the connection between task and explanation and learn how best to utilise the system. In order to find out whether users could distinguish between the different explanations and tasks, we tested the explanations on seven users. We provided them with four different explanations and brief descriptions of the four information seeking tasks (project planning, reverse engineering, learning structure, performing an activity). We then asked them to pair the tasks with the explanations and to motivate why they could or could not be paired. The explanations were provided as answers to the fairly general question Describe the process subD:iom.
Out of the seven subjects, five were experts on SDP and two were novices. The experts could be divided into two groups: those who had taken part in developing SDP (three subjects) and those who had gained their knowledge from applying SDP in projects (two subjects).
Five subjects made a “correct” pairing of tasks and explanations, while two subjects mixed up the ‘project planning’ and the ‘learning structure’ explanations. After we added an information entity intended to tell the project planner why a certain process should be employed in a project (‘Project planning information’), these two subjects found the explanations helpful and distinguishable from one another.
All subjects reported that having different explanations of SDP, like the ones we presented, would be very helpful to many groups of users.