There has been steadily increasing interest in AT during the 1990s in both the HCI and CSCW communities (for example, Engeström 1987; Bødker 1991; Kuutti 1991; Raeithel and Velichkovsky 1995; Nardi 1996), with a much narrower dissemination of DCog in the same period (for example, Rogers and Ellis 1994; Hutchins 1995b; Ackerman and Halverson 1998; Hollan, Hutchins and Kirsh in press)2. In many ways these theories are closely tied, because they share a common intellectual heritage – the emphasis on the cognitive3. They also stand in contrast, since Western-European and Russian pursuits of cognitive science diverged at the beginning of the 20th century.4 Given this common heritage, we might ask whether and how the theories diverge along the attributes of descriptive, rhetorical, inferential, and application power.
Both diverge from other cognitive theories by incorporating the social and cultural context of cognition. In practice, they do this in different ways, and each theory’s approach has much to do with its historical development. As a cognitive scientist, I am interested in the divergence of their approaches. For me, the many phenomena of human society and activity are the result of human cognition, and much of their power arises from how cognition instantiates itself in the material world. As a practitioner of DCog analyses, and not unlike AT practitioners, I see the world of artifacts, personal history, culture, and social and organizational structure through a filter that labels them as the residua of collaborative cognition, analyzed along numerous time scales.
As a CSCW researcher, however, I am more concerned with how I can use a theory to understand a specific domain, reach insights about collaborative work in general, or design for a particular problem. Each of these puts different demands on the theory—the first on descriptive power, the second on rhetorical and inferential power, and the third on the practical application of the inferences.
In many ways I see the differences between AT and DCog as being superficial, at least as they apply to CSCW. Before the arguments begin, let me clarify. A large part of the power and usefulness of both theories, as with ethnomethodology, is their commitment to ethnographically collected data. That is, practitioners go to where the action is, observe how things really work, and are confronted with how (well or poorly) reality maps onto theoretical constructs5. This integration of ethnographic practice with theoretical constructs makes learning and using both theories more difficult. Of course, this raises the question of how much of each theory’s success is due to the ethnography as opposed to theoretical traction. While I believe theories do provide additional leverage, both to the ethnographic practice and to the analysis, I set aside this issue here because both theories share this criticism.
As I read the papers in this issue I began to see several reasons why the AT perspective has become appealing in CSCW. As can be seen in these papers, it is applicable to a range of domains and levels of analysis, and it has descriptive power. Despite early claims that it was too difficult to learn (as reported in Nardi 1996b), the range of practitioners here – academics, members of large and small companies, as well as researchers – attests to its growing number of converts.
When I compare AT with DCog several things stand out:
AT has named its theoretical constructs well. Even though some names may conflict with common use of the terms, naming is very powerful – for communicative as well as descriptive reasons.
In contrast, few theoretical constructs are explicitly named in DCog. Those that are discussed, either in Cognition in the Wild (Hutchins 1995b) or elsewhere, are not presented in a way that gives them the same rhetorical force of naming as seen in AT. This is important because names are often what you manipulate in a theory. Being able to manipulate data along with the names in AT provides an additional rhetorical advantage.
In AT, the perspective of the individual is at the center of everything. AT focuses on the cognitive process of an individual situated in a social, cultural, historical, and artifactual world.
In contrast, DCog focuses on the socio-technical system, which usually (but not necessarily) includes individuals. DCog uses the same theoretical language for both people and artifacts. This common language has led others to critique the theory for equating people with artifacts in some way that denies their humanity. This is, in fact, not the case.
Dealing with process is built into the structure of how AT is presented. Activity system diagrams (e.g., Barthelmess and Anderson, p. 4; Clases and Wehner, p. 9; Collins et al., p. 4; Korpela et al., pp. 2-3; Spasser, p. 19) keep process in the foreground for both reader and analyst. This is somewhat ironic, since a static diagram represents essentially dynamic relationships between the key components. Nonetheless, their representation provides both descriptive and rhetorical power.
In DCog, process(ing) is so central to the analysis that it may be less obvious to the uninitiated. Unlike AT there is no iconic structure applied to each situation. Instead, it is built into the process of analysis, and may or may not be represented in the products of that analysis.
To clarify these statements I need to introduce distributed cognition theory and compare it with aspects of activity theory. For illustration I will draw on work investigating call centers and organizational memory (Ackerman and Halverson 1998; Ackerman and Halverson 1999; Ackerman and Halverson 2000) and compare primarily with two papers from this issue: Clases and Wehner, and Collins, Shukla, and Redmiles. By illustrating the similarities and differences between AT and DCog within a comparable domain I explore what we need in CSCW from a theory.
In the last century American cognitive science focused on the cognition of the individual, extracted from their social and cultural context (Hutchins 1995b, quoting Gardner 1984). This may seem odd when juxtaposed with the approach taken in AT6. However, it is only in the last decade7 that cognition has been more generally acknowledged as distributed, rather than by definition the property of an individual mind (Salomon 1993; Hutchins 1995b; Clark 1997). (Researchers differ on how cognition is distributed, but Spasser’s (this issue) casual reference to a distributed cognitive system, without any specific citation or definition, speaks volumes for the current acceptance of this notion.) With this has come the recognition that collections of individuals have cognitive properties that are different from those of sole individuals, often emergent from their collective behavior.
Several researchers have used the term distributed to mark the difference in perspective from more traditional approaches to cognitive science (Norman 1991; Zhang and Norman 1991; Salomon 1993; Hutchins 1995b) including people writing in the AT tradition (Cole and Engeström 1993). I focus on distributed cognition theory developed by Hutchins beginning in the mid-80s, published in his book (1995) and a number of articles (Hutchins 1988; Hutchins 1990; Hutchins and Hazlehurst 1990; Hutchins 1991; Hutchins and Klausen 1992; Flor and Hutchins 1992; Hutchins and Palen 1993; Hutchins 1995a) and which continues to be developed in his lab (Halverson 1995; Holder 1999; Hollan, Hutchins et al. in press) and elsewhere (Rogers and Ellis 1994; Perry 1997).
Distributed cognition is not some “new” kind of cognition, but rather the recognition that all of cognition can be fruitfully viewed as occurring in a distributed manner. As a cognitive theory, DCog is focused on the organization and operation of cognitive systems; that is, with the mechanisms that make up cognitive processes, which result in cognitive accomplishments. It recognizes that “a process is not cognitive simply because it happens in a brain, nor is a process non-cognitive simply because it happens in the interactions among many brains” (Hollan et al. in press, p. 2). This opens up our notions of cognitive processes to a much wider variety of mechanisms than the classic symbol manipulation of the physical symbol system hypothesis (PSS) (Newell and Simon 1972; Simon 1990). Hutchins argues that PSS works better as “… a model of the operation of a sociocultural system from which the human actor has been removed” (p. 363, emphasis his) than as a model of an individual’s internal cognitive processes. Distributed cognition theory capitalizes on this view by refocusing attention on the social-cultural system—the cognitive system which functions by bringing representational media into coordination with one another.
“I do believe that the computation observed in the activity of the larger system can be described in the way cognition has been traditionally described – that is, as computation realized through the creation, transformation, and propagation of representational states.” (Hutchins 1995b, p49, emphasis mine)
Hutchins’ approach carries with it a commitment to ethnographic data collection and method. The analysis in Cognition in the Wild, following Marr (1983), proceeds through multiple levels of analysis which can be described as: 1) a functional definition of the cognitive system; 2) enumeration of representations and processes within that system; and 3) the physical instantiation of representations and the algorithm(s) that control the processes.
The utility of DCog for CSCW, like AT, is its theoretical commitment to examine this broader socio-cultural-technical system, which is necessary for the collaboration between individuals mediated by artifacts. Furthermore, its focus on representational states and the system level cognitive work they do is extremely useful for design. But how do we define that system?
3.3 What’s in a name? The definition of a unit of analysis
A key tenet of distributed cognition is its commitment to a unit of analysis defined in relation to the complex phenomena being observed. As Hutchins (1995b) shows, the information processing in a navigation team varies with the context and circumstances. Solo watch standing involves the interaction of one individual with various artifacts, structured via well-established procedures and routines. In contrast, entering a harbor requires the effort of several people, again in coordination with specialized tools and with each other, but at a much more rapid pace. While the overall behavior exhibited by the system is the same—navigation—the means change. Thus, we see that within the system there are mechanisms that dynamically reconfigure to bring subsystems into coordination in order to accomplish certain functions.
More specifically, for AT the primary unit of analysis is the activity (cf. all of the papers in this issue, as well as Kuutti 1996). Thus you have Collins et al. discussing the Customer Support Activity System and the Knowledge Authoring Activity System. This naming makes the object of inquiry very clear-cut rhetorically. That is, the primary theoretical concept of activity theory is the activity, which is comprised of actions, and AT defines the activity as the central unit of analysis. This overlaps with the common-sense use of activity as something that one does. For example, look at how Collins et al. (this issue §2.1) outline their object of inquiry:
“In Hewlett-Packard’s culture, this documentation activity is called “knowledge authoring”. The term Knowledge Authoring Activity System will be used to refer to this activity. Closely linked but not discussed in detail in this analysis, was an explicit Knowledge Maintenance Activity System. Finally, both knowledge authoring and maintenance are part of a larger activity of supporting Hewlett-Packard customers, the Customer Support Activity System.”
It is immediately evident what aspects they are exploring. Equally, we know which activities they have set aside from consideration. Using Engeström’s Activity System Model (Engeström et al. 1999) as a conceptual framework they describe the setting of a help desk, situated within the broader organizational context. They enumerate not only the key parts of the activity system—definition of the activity, as well as subject, object, and outcome—but also what governs the relationships between them. By naming these—rules, division of labor, and mediating artifacts—it becomes easier to communicate about the setting and analysis with those who understand the terms.
This highlights what I meant when I said that the theory is good at naming things. AT has rhetorical power, not because it names things-in-the-world, but because it names conceptual and analytical constructions with which any analyst looking at a collaborative system has had to struggle. Naming a category “mediating artifacts” focuses the analyst’s attention around those objects used by the subjects of the activity system. Naming helps communicate to others – particularly when they do not understand the particular domain. (To take a trivial example, if a ruler is a mediating artifact then the analyst is signaling me that the ruler is doing some work that is important for me, the reader, to look at more closely.) Conversely, if a reader understands the domain, they can bridge to the theoretical concepts because they are named and organized and mapped onto the domain. This is not unique to AT, but nonetheless it is powerful.
In apparent contrast, DCog does not have a special name for the unit of analysis. It frames the problem as examining the cognition of a system in terms of its function. The functional requirements drive analytical focus, wherein functional operation is decomposed into smaller units of analysis that make sense with respect to the particular function or task within the system. As in the example from Collins et al. above, we would begin by defining the functional system in a straightforward manner. System operation then refocuses us on an event-driven segmentation of the tasks (and subtasks). Taking a perspective that does not privilege the individual (yet also does not exclude the individual as the scope of the unit of analysis) may mean that configurations comprise multiple or solitary components, human agents as well as human-produced artifacts, and social and cultural structures. It is the task requirements that dictate which configuration is the one that counts for understanding a particular task.
This may be more obvious if we look at a more concrete example. In the study of the operation of a hotline for personnel questions, we define the unit of analysis variably. Sometimes it is a single customer’s call, bounded on one side by the initial ringing of the telephone and on the other by the ending disconnect (Ackerman and Halverson 1998; Ackerman and Halverson 2000). Elsewhere (Ackerman and Halverson 1999) it is defined more conceptually, based on events that focus on one issue but whose resolution spans hours or days.
Regardless of the scope of the unit of analysis, the process of analysis is the same. In each case, within the unit of analysis, representational states and the processes that act on them are identified. However, the potential of the analysis is determined by the scope of the unit of analysis, and that scope varies. In the simplest case above, a call to verify employment, the unit of analysis comprised two individuals (the customer calling and the call taker) and several artifacts that both mediate the call and that contain the information in question.
The purpose of drawing this distinction around how the unit of analysis is defined is to highlight different strengths of the two theories. In AT the naming of the unit of analysis as activity is just one of many theoretic names at different levels of abstraction. The papers in this issue range from detailing phenomena across a broad range of these levels (Collins, et al.) to specifying them at only one level (e.g. Barthelmess and Anderson). I suspect that having the overhead of naming does make it difficult to learn and master the theory. While it requires additional precision on the part of the analyst – almost every paper here defines and clarifies terms and their use – it also provides precision in communication to other AT practitioners. (But compare with Collins et al.’s report of problems communicating to other researchers and managers at their field site because of the confusion between the theoretical object and the common sense objective. In addition, there is the careful work of Barthelmess and Anderson to detail the difference between the theoretical language of AT and the technical use of the same or similar terms within their domain.) In addition, the power of naming theoretical constructs and defining their relations allows an analyst to manipulate the theory at the same time she manipulates her data. In the terms I used before, this shows descriptive, rhetorical, and inferential power.
Clases and Wehner do an exquisite job connecting activity theoretic concepts to the issues they see as important. They reason through the theoretic concepts until it seems that the conclusions come directly from the theory rather than from an analysis of a specific setting. For example, when talking about how artifacts are a symbolic externalization of a specific practice, they draw out an essential knowledge management example.
"One of the core ideas of activity theory is that human activity is mediated by societal forms as well as operative means. Figure 2 is based on these schemes and visualizes CSCW systems as mediating the joint activity in or between different communities of practice. The figure shows that the joint activity evolving between different actors is mediated—on the level of societal forms—by informal rules, self-constraints and a certain division of labor that historically evolve in communities of practice. On the other hand, the interaction between actors in computer-supported work places is being structured—on the level of operative means—by the characteristics of the specific CSCW system in use. The CSCW system will provide actor A with means of production, i.e. features to generate certain objects, which will then be represented for Actor B by the use of the system providing means of orientation. The artifacts produced by means of CSCW systems may be looked upon as symbolic externalizations of a specific practice. Therefore, when using a CSCW system, Actor A has to transform his experiences made and knowledge gained into a certain document. For Actor B, this externalization of a specific practice in the first case appears as codified knowledge, i.e. information that might be useful in another context. Depending on the way in which the context of generation of the information is presented, Actor B will be more or less able to put it into perspective. In other words: Knowledge may not immediately be ‘transferred’ but is transformed by processes of codification and interpretation."
Of note, most of the terms italicized in the above excerpt are not just for rhetorical emphasis, but also indicate theoretical terms. While a flavor of the knowledge management domain comes through in this excerpt, overall the example reasons at the higher level of theoretical constructs. In contrast, in Ackerman and Halverson (1998, 1999, 2000) we talk about similar phenomena with reference to the domain of inquiry, that is, the specifics of the hotline, rather than the theory. Within the domain there is the problem that knowledge must be de-contextualized from its specific situation before it is stored, but in order to be used, it must be re-contextualized to fit with the new situation. Using DCog (and some AT terminology) our analysis deconstructed the actions of a particular actor at the very low level of representational states. With other input derived from field observations we then used those analytical insights to rebuild a narrative of our understanding, situated with respect to the domain. The insights we gained are with respect to the domain, and mostly fall outside the theory. One of our conclusions, the notion that we were seeing information acting as a boundary object, is not an insight into the theory of distributed cognition per se. It is also not an insight extracted by manipulation of DCog’s theoretical constructs. But neither was it obscured by the theory.
DCog names almost none of its theoretical constructs, except at the very basic level of representational states. An analyst manipulates data to draw conclusions about the world, but this does not equate to manipulating the theory itself. The chain of inferences that build back from the low level of analysis to higher theoretical constructs is almost completely hidden from others. In DCog, descriptions analogous to ‘division of labor’ or ‘mediating artifact’ are higher-level constructs that are not named within the theory. The communicative weight is carried by a description of the phenomena and the higher-level implications. This translates to less rhetorical power and makes discourse in the theoretical community more cumbersome. However, the focus at the level of processes, representational states, and their meaning (representations), exposes system workings at a level that has considerable descriptive power. This makes DCog particularly useful for those who are focused on design. For those who understand the domain, the detailed description at this level makes it possible to see the implications of changes.
Part of DCog’s power lies in its flexible unit of analysis. This provides a mechanism to reconfigure the analytical framework in a situation specific manner. In the case of the hotline group, one can imagine that treating it in terms of the activity of taking calls would be fine. At a high level this is true. But flexibility in drawing the boundary of the unit of analysis exposes how a simple call for employment verification is both like, and unlike, a more complicated call regarding insurance payments. In the simplest call, we can see that with increasing automation that the same call may use only one individual, or even none, while the work done in the more complex call is hard to envision without the intervention of a human. Because DCog deals with humans and artifacts as they contribute to the larger socio-technical system, both possibilities can be analyzed. In contrast, because AT centers the activity system around the subject (individual), analysis of an automated subtask is problematic.
3.4 Theoretical language and non-human agents
This example raises the issue of how DCog and AT handle people and artifacts. Fjeld et al. (referring to Nardi 1996a) state “… distributed cognition puts people and things at the same level; they are both ‘agents’ in a system”. They go on to say that this means DCog “ignores the faculties of human beings not found within computers, like motive, emotionality and consciousness. It also ignores for computers their non-human traits, namely their ability to execute programs in a precise and predictable manner.”
While I agree that DCog does not focus on some of what goes on inside humans, I disagree that it ignores all that goes on inside both humans and artifacts, including computers. This misconception of the theory is based on how and why DCog ‘treats humans and artifacts the same’. Analysis enumerates the representational states, the media on which they are instantiated, and the observed processing of those states.
“The conduct of the activity proceeds by the operation of functional systems that bring representational media into coordination with one another. The representational media may be inside as well as outside the individuals involved. These functional systems propagate representational state across the media.” (Hutchins 1995b, pp. 372-373)
The phrasing may be awkward, but it reflects DCog’s theoretical commitment to not privilege the individual. Thus humans are not the only agents that bring representational media into coordination. This is possible because the theoretical language of distributed cognition theory itself does not privilege the individual over other components of the system. One way to view this is indeed that human and non-human can be cognitive agents, and the focus is on the observable aspects of the cognitive processing. This does mean that emotion may be left out of the analysis, insofar as it occurs hidden from view inside an individual’s head. (However, insofar as it is manifested externally in the operations of the cognitive system it may be a valid part of the analysis.)
For our analyses (Ackerman and Halverson 1998; Ackerman and Halverson 1999; Ackerman and Halverson 2000), being able to span human and non-human cognitive agents, as well as organizational and cultural structures and norms, allows us to cover the diverse manifestations of organizational memory. The common breakdown into representational states and processes provides a way to analyze how the observed details achieve the particular function that is the focus of a unit of analysis. This presents artifacts, human actors, and organizational and social structures on an equal theoretical footing. With a description constructed in these terms we can begin to understand how technologies and social structures currently fit a system’s operation. Once the system is analyzed into its component representational states and processes, the analyst uses that information to reconstruct the functioning of the system. This allows an analysis with respect to the context of use within an organization. By extension one can speculate about how changes in technologies might affect future operations. What does this look like?
In (Ackerman and Halverson 2000) we analyze a very simple call—one about employment verification. As is common for many complicated analytical frameworks, in the paper we skip presenting the full details of the analysis process and instead present what is necessary to support the conclusion that the call taker uses not one memory, but many, and we support this with a diagram showing all the memories used (Figure 1). To highlight both the power of the analysis and how people seem to get “left out” I want to walk through a part of the analysis that we left out of that previous paper.
The setting is a hotline group (here abbreviated HLG) for personnel concerns at a large company. HLG takes calls from both inside and outside the company. This particular call is an “employment verification”, where a caller (for example, a mortgage lender) contacts HLG to find out if a person is actually an employee. In order to answer this request the agent, Joan, must look up the person in a specific database, the EMPLOY system. Because of technical incompatibilities, the database must be accessed on a terminal separate from the one on her desk. This terminal (with EMPLOY) is shared by all of the agents, and it is located about three meters from Joan’s desk. The agent, then, must disconnect her headset from the phone, walk to this central table, and look up the person on the EMPLOY system. Furthermore, part of the HLG agent’s job is to maintain a record of call requests. To do this they use another computational system, the Call Tracking system (CAT), which is accessed from their desktop system.
The analysis began with the observations – primarily videotape, supported by additional direct observations, semi-structured interviews, social network analysis, and field notes. For the DCog analysis, the unit of analysis, as I discussed above, was clearly circumscribed by the extent of the phone call—because the temporal extent happened to coincide with the functional extent of the verification. Transcribing the call included actions as well as discourse. For privacy reasons we could only record half of the conversation, so we are limited in what we can directly observe. Table 1 shows the first three turns of this call in the transcript, interleaved with Joan’s actions.
As with AT, there are many levels at which we can represent this. At the most basic level we detail:
Representational states and the media they are instantiated in (or on);
The character of the processing (such as creation, propagation, transformation) and a description of its mechanism;
The agents that enable the processing, whether human or artifact.
At this stage all agents involved in the processing are enumerated, and only later pruned. Table 2 shows one detailed representation of the first three boxes from Figure 1, which coincides with most of turn 3 (shown in Table 1). Reading down under each spoken fragment we see that the representational state is propagated through a variety of media and agents. A trio of entries details the agent, the representational state and medium, and the kind of processing. So Joan moving the mouse is represented with Joan as the agent; the representational state is her hand position, on the medium of the mouse; and the processing is creation, via a physical process. The representational media detailed in the table are coordinated with each other to move the representational states through the processing necessary to accomplish the cognitive functioning of the system.
<Table 2 about here>
Notice that people and artifacts are treated equally as agents in some cases because they do processing. So Joan’s use of the mouse to drive the cursor to close a call tracking record and open another is represented as the propagation of her physical action, transformed through the mouse and the CPU, that results in the cursor movement that appears on the CRT (first column of Table 2). There is internal processing in Joan, and similarly there is implied internal processing happening inside the mouse and the computer CPU. In this representation internal processing has been left out. We generally ignore internal processing for two reasons: it is not the focus of the functional system, nor is it observable. But we often know that it is there, and we can infer hidden processes from observable ones. In the case of computers we often have other means to know the internal processing, such as manuals. (Unfortunately we do not have the definitive manual on human processing.)
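To make the structure of such a table concrete, the trio of entries can be sketched as a simple data record. This is only an illustrative encoding, not part of DCog's own vocabulary; the field names and the Entry type are mine, applied to the Joan-and-mouse example above:

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """One cell of the analysis table (illustrative, not DCog terminology):
    who or what does the processing, what representational state is involved,
    on what medium, and what kind of processing occurs."""
    agent: str       # human or artifact doing the processing
    state: str       # the representational state
    medium: str      # the medium the state is instantiated on
    processing: str  # e.g. "creation", "propagation", "transformation"

# The Joan-and-mouse example from the text, encoded as one entry.
joan_moves_mouse = Entry(
    agent="Joan",
    state="hand position",
    medium="mouse",
    processing="creation (physical process)",
)
```

Note that nothing in the record privileges humans over artifacts: a subsequent entry could just as well have the mouse or the CPU as its agent.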
Figure 2 collapses some of the detail from the third column in this system into a diagrammatic “short hand” that re-represents it. Each box shows the agent, the representational state, and the media it is instantiated on. Joan says, “I just need to get, to get a little more information”. In saying this she does some internal processing that creates a representational state of the words carried on a vocal medium (i.e. her voice). This representational state is propagated verbally to the telephone, which then does its own processing, propagating the same representational state to the listener. At this level we presume the same medium. The caller does auditory processing on the same representational state.
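The propagation just described can also be sketched as an ordered chain of steps, one per agent; again the encoding and names are illustrative assumptions, not DCog constructs. The trace function simply renders how successive media are brought into coordination as the same representational state moves through the system:

```python
# Each step: (agent, representational state, medium, processing).
# The chain follows Joan's utterance from her voice to the caller.
chain = [
    ("Joan",      "the words", "voice",               "creation"),
    ("telephone", "the words", "telephone signal",    "propagation"),
    ("caller",    "the words", "auditory perception", "propagation"),
]

def trace(chain):
    """Render the coordination of media as a readable trace."""
    return " -> ".join(f"{agent}[{medium}]" for agent, _state, medium, _proc in chain)

print(trace(chain))
# prints: Joan[voice] -> telephone[telephone signal] -> caller[auditory perception]
```

The point of the sketch is that the representational state stays constant while agents and media change, which is exactly what Figure 2's boxes depict.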
One thing this figure points out is the problem of representing representational states and processes sufficiently. Table 2 is more explicit about what the processes are, while Figure 2 provides a better sense of the movement of the representational state as it gets processed. Figure 3, however, gives a better idea of where memory is, and foreshadows the result presented in Figure 1. It also gives a better representation of how agents bring representational states into coordination with each other to accomplish processing. Figure 3 uses yet another representation in which agents, representational states, processes, and memory are all present. Agents are circular. Triangles represent memory. (The grayed-out triangle ‘switch’ as part of the telephone is unused memory.) Arrows represent the character of the processing; in this case they serve more as a memory aid for the analyst to reconstruct what happened.