Computing Point-of-View: Modeling and Simulating Judgments of Taste

Xinyu Hugo Liu
Sc.B., Massachusetts Institute of Technology (2001)

M.Eng., Massachusetts Institute of Technology (2002)

Submitted to the Program in Media Arts and Sciences,

School of Architecture and Planning,

in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Media Arts and Sciences
at the
Massachusetts Institute of Technology
May 2006
Program in Media Arts and Sciences

1 May 2006

Pattie Maes

Associate Professor

Program in Media Arts and Sciences

Thesis Supervisor

Andrew B. Lippman

Chair, Department Committee on Graduate Students

Program in Media Arts and Sciences

Computing Point-of-View: Modeling and Simulating Judgments of Taste
Xinyu Hugo Liu
Submitted to the Program in Media Arts and Sciences,

School of Architecture and Planning,

on 1 May 2006, in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy in Media Arts and Sciences
Point-of-view affords individuals the ability to judge and react broadly to people, things, and everyday happenstance; yet it has an ineffable quality that is hard to articulate in words. Drawing from semiotic theories of taste and communication, this thesis presents a computational theory for representing, acquiring, and tinkering with point-of-view.
I define viewpoint as an individual’s psychological locations within latent semantic spaces that represent the realms of cultural taste, aesthetic perception, possible attitudes, sense-of-humor, and taste for food. The topologies of these spaces are mined from online cultural corpora, and the individual's locations are inferred through psychoanalytic readings of her self-expressive texts. Once acquired, viewpoint models are brought to life through perspectival artifacts, which allow the exploration of someone else’s perspective through interactivity and play.
The thesis illustrates the theory by presenting viewpoint acquisition systems built for five realms. The technique of psychoanalytic reading is described, along with its core enabling technologies, which are commonsense reasoning and textual affect sensing. Finally, six perspectival artifacts were implemented to illuminate a range of promising applications for viewpoint modeling—tools for learning, self-reflection, matchmaking, and deep recommendation.

Thesis Supervisor: Pattie Maes

Title: Associate Professor, Program in Media Arts and Sciences

Computing Point-of-View: Modeling and Simulating Judgments of Taste
Xinyu Hugo Liu

Professor Pattie Maes

Associate Professor of Media Arts and Sciences

Massachusetts Institute of Technology

Professor William J. Mitchell

Head, Program in Media Arts and Sciences

Alexander W. Dreyfoos, Jr. (1954) Professor

Professor of Architecture and Media Arts and Sciences

Massachusetts Institute of Technology

Professor Warren Sack

Assistant Professor of Film & Digital Media

University of California, Santa Cruz




Abstract

Aperitif
1 Introduction

1.1 Roadmap and contributions

2 A computational theory of point-of-view

2.1 The ‘space + location’ framework

2.2 Representing viewpoint

2.3 The judgmental apparatus

2.4 Designing perspectival artifacts
3 Psychoanalytic reading

3.1 The genre of self-expressive texts

3.2 Structurations and schemas

3.3 Mechanics

3.4 Culture mining

3.5 Technology: commonsense reasoning

3.6 Technology: textual affect sensing
4 Viewpoint acquisition systems

4.1 Cultural taste viewpoint: ‘taste fabric’

4.2 Gustation viewpoint: ‘synesthetic cookbook’

4.3 Perception viewpoint: ‘escada’

4.4 Attitude viewpoint: ‘what would they think?’

4.5 Humor viewpoint: ‘catharses’

5 Perspectival artifacts

5.1 Art that’s always tasteful

5.2 Serendipitous matchmaking

5.3 An identity mirror

5.4 Anytime mentors

5.5 Foraging for food with the family

5.6 A jocular companion
6 Philosophical underpinnings

6.1 Postmodern aesthetics

6.2 Field, habitus, doxa

6.3 Reading for the subject in the text

6.4 Perspective as affordance structure

6.5 Future work

7 Conclusion

Aperitif
Whither the future battles of humankind? I believe they will increasingly be fought in the aesthetic plane. Media systems and economies begin to unravel the use function of aesthetics, leading to more systematic productions of poetics. The willful construction and politicization of perspective, authenticity, and image up the ante in the worlds of ideology and marketing. Today’s artificial intelligence personalizes search and recommends books, but tomorrow’s will likely design our life-styles, and Proustian recommenders will select wines and spirits to release particular memories and desires submerged within each of us.
While the explicit topic of this dissertation is point-of-view, its underlying thematic is certainly aesthetics. A point-of-view, after all, may be recognized as a coherent and comprehensive system of aesthetics, efflorescing from the limitless ecology of aesthetics that is life. What renders point-of-view such a challenging study is precisely its complex etiology—just as each snowflake is constituted idiosyncratically by an unknown mixture of passing clouds, so too are our perspectives shaped by psychological predispositions, lived experiences, and cultural embeddedness.
Point-of-view and its aesthetics have long been studied poetically and rhetorically. Here, I embark upon yet another such philosophical investigation of the topic, but with a twist. My theories are illuminated by semiotics and cognitive science, and supported by a cadre of implemented computational systems, built over the past four years, which attempt to automatically model and simulate particular persons’ judgments of taste in various aesthetical realms.
Because this work is seated at the confluence of various disciplines such as psychology, sociology, computation, design, and philosophy, I have tried my best to refrain from overly technical writing, in hopes that this monograph may be enjoyed by the widest intellectual audience possible.

1 Introduction
Our capacity for aesthetics and affectedness is one of the most celebrated bastions of humanity. Underlying our explicit knowledge and rationality is a faculty for judgment—the impulsion to prefer, to view the world through our individual lenses of taste. An interesting intellectual question is: can a computer model a person’s tastes, attitudes, and aesthetics richly enough to predict their judgments? This thesis explores one answer to the question.
Our investigation flies under the banner of point-of-view for two reasons. Firstly, the term reflects an understanding that individual tastes are seated in, and articulate against, a social and cultural fabric. Secondly, ‘point-of-view’ is developed to mean not isolated taste judgments, but rather, a coherent and systematic apparatus that engenders such judgments.
To compute point-of-view, a computer program reads the self-expressive texts of a particular person, and also the cultural texts that circumscribe her. From these, models of viewpoint are created for various semantic realms of concern—such as perception, cultural taste, attitudes, humor, and gustation. As encapsulations of the judgmental faculty, viewpoint models are dispatched by various ‘perspectival artifacts’ to simulate a person’s aesthetic judgments of arbitrary textual and semic input. The prospect that individuals’ taste perspectives could be reified, shared, and simulated suggests new approaches for learning, self-reflection, matchmaking, and deep recommendation.


The computation and technique invoked in this inquiry build upon a genealogy of viewpoint computation, based largely in a user modeling literature that has been exploring predictive models of computer system users for over two decades. Two primary lines have emerged in that literature—stereotype-based models and behavior-based models. Stereotype-based models—such as Elaine Rich’s (1979) book recommender system—represent users by their demographic categorizations, and acquire user profiles by asking a set of questions. Rich’s demographic stereotypes were potentially profound models, their premise based in archetypal psychology, but their crafting required superb intuition and much manual effort. And even when stereotypes could be properly crafted, the coarseness of the stereotypes still notably under-fit the individuality of people, limiting the depth of the system’s predictions.

A backlash against the hand-craft and heuristicality of symbolic AI systems swept behavior-based modeling into favor in the 1990s. Data-driven rather than heuristic, behavior-based modeling performs mathematical inference over a history of a user’s actions to predict future actions which a user should prefer. Examples of the genre include social information filtering recommenders (Shardanand & Maes 1995), Bayesian goal inference systems (Horvitz et al. 1998), and community-driven agent systems (Orwant 1995). Behavioral modelers have the advantage of automatic acquisition. However, because they are usually trained on a corpus of application-specific data, their models tend to portray persons narrowly as users of applications with specialized contexts, such as tutoring systems or e-commerce websites. There is a danger that application-specific models are over-fitted and not representative of the viewpoint of the same person under other contexts.
Orwant’s (1995) Doppelganger user modeling shell made a promising effort to reconcile stereotypes with behavioral modeling. In Doppelganger, the cross-application preferences of users of a computer ‘shell’ were modeled. That modeler used a hidden Markov model (HMM) to disambiguate users into one of nine user ‘states’—hacking, idle, frustrated, writing, learning, playing, concentrating, image processing, and connecting. The robustness of users’ manually specified profiles was supported by falling back to user memberships within various ‘communities’, which can be understood as dynamic, community-built stereotypes. Doppelganger untied itself from particular applications, but it nonetheless considered persons as users of a computer shell. While user modelers routinely capture users’ ratings of items within application contexts, a general model and simulation of a person’s tastes, aesthetics, and opinions that cuts across application domains has not yet been achieved.


This thesis extends the user modeling literature by exploring the computational modeling of a person’s taste, aesthetics, and opinions in richer, more sophisticated ways. The introduction of techniques such as ‘culture mining’ and ‘psychoanalytic reading’ unlocks previously opaque personal and cultural texts as sources of behavioral data. This new data in turn affords the capture of persons not as application users, but as everyday cultural participants.

It should be noted that the treatment of point-of-view in this thesis differs from Warren Sack’s (1994; 2001) understanding of ‘point-of-view’. Whereas Sack’s robotic readers mined ideological ‘spin’ structures from news stories, this thesis examines the psychological point-of-view of individual subjects. Ideological point-of-view is a set of politicized and institutional conventions, what Lakoff (Lakoff & Johnson 1980) calls metaphorical framings, e.g. the ‘Islamic martyrs’ versus the ‘Islamic terrorists’; psychological point-of-view is concerned with modeling the largely de-politicized taste patterns of one individual—how a person sees the world idiosyncratically by possessing various unconscious, culturally-conditioned lenses that lend affective tint to all judgments and reactions.

1.1 Roadmap and contributions

The thesis is fueled by a series of experimental systems that have been built to capture psychological viewpoint under five semantic realms—perception, cultural taste, attitudes, humor, and gustation. Accompanying ‘perspectival artifacts’ were also implemented to explore some of the affordances of viewpoint simulation to various tasks. These experiments, along with a theory and methodology for computing viewpoint, aim to support three technical contributions that advance the state-of-the-art in user/person modeling:

  • Viewpoint can be modeled as an individual’s psychological locations within latent semantic spaces such as the fields of perception, cultural taste, attitudes, humor, and gustation.

  • The topology of viewpoint spaces can be acquired by ‘culture mining’ various textual cultural corpora, while an individual’s location can be inferred by a ‘psychoanalytic reading’ of his/her self-expressive texts.

  • Perspectival artifacts that simulate a person’s taste judgments can be novel tools for learning, self-reflection, matchmaking, and deep recommendation.

The rest of the thesis is structured as follows.

Chapter 2 introduces a computational theory of point-of-view, which is at the heart of this work, and which presents a unified understanding of the implemented viewpoint systems. The theoretical roots of the framework—called ‘space+location’—are developed in relation to Bourdieu (1984; 1993) and others. Computational problematics such as knowledge representation, structure and consistency of viewpoint models, and techniques for simulating aesthetic judgments are discussed using examples from implemented systems. Finally, design principles for perspectival artifacts are developed in relation to the literature of interaction design.
Chapter 3 details a methodology for modeling and simulating viewpoint. Two techniques—‘culture mining’ and ‘psychoanalytic reading’—are grounded dually in semiotic theories of culture and reading, and in computational techniques such as story understanding and statistical language modeling. Two technologies—‘commonsense reasoning’ and ‘textual affect sensing’—are discussed as enablers of viewpoint computation from personal and cultural texts.
Chapter 4 supports the new theory and methodology for viewpoint computation by presenting five built experimental systems for acquiring viewpoint under the realms of perception, cultural taste, attitudes, humor, and gustation. The model validities of the three primary systems are evaluated.
Chapter 5 applies the viewpoint models resulting from the implemented systems to the design of six perspectival artifacts—1) a robotic artist which creates art suited to a person’s perceptual viewpoint; 2) a kiosk that facilitates serendipitous matchmaking based on shared tastes; 3) a mirror to support self-reflection; 4) virtual mentors which give just-in-time feedback; 5) a synesthetic cookbook which simulates the tastebuds of family mentors; and 6) a jocular companion to parlay everyday woes into opportunities for humor. The three primary artifacts are evaluated for their affordances to the betterment of learning, self-reflection, and deep recommendation.
Chapter 6 discusses the philosophical underpinnings of the computational theory and methodology presented in this thesis, and offers reflections and prospects for future work.

2 A computational theory of point-of-view

Is there a single framework that can account for point-of-view computation across the various semantic realms of perception, cultural taste, attitudes, humor, and gustation? Where can viewpoint be located in textual corpora? How can a static viewpoint model be animated to simulate judgments of taste? From the premise that persons’ viewpoints can be captured and reproduced, what sorts of interfaces are appropriate for communicating viewpoint, and what promising applications entail? In order to compute point-of-view, it is necessary to develop a thorough understanding of these problematics.

This chapter theorizes the computation of point-of-view and the simulation of judgments of taste. It is structured as follows. Section 2.1 introduces ‘space+location’—a metaphor and global framework for conceptualizing point-of-view. Section 2.2 explores themes and variations in representing viewpoint spaces. Moving from the exterior notion of ‘space+location’ to the interior apparatus of perspective, Section 2.3 outlines how viewpoint may collect in individuals into an organized and consistent system. Section 2.4 explains how such a system can simulate aesthetic judgments, while Section 2.5 gives desiderata for the design of perspective-based computational artifacts. To note, the acquisition of viewpoint from textual sources is touched upon in Section 2.1 but will be explored in-depth in Chapter 3.

2.1 The ‘space+location’ framework

A layperson’s dissection of the term ‘point-of-view’—Mary and Jack have an argument over the significance of a ‘sunset’ and find that they disagree; Mary tells Jack that “a sunset is all about beauty, warmth, and romance,” while Jack retorts, “but from my point-of-view, I see things differently.” Here point-of-view evokes an image of the two debaters standing at far apart locations in a semantic space, as visualized in Figure 2-1. In the middle of the space sits a blob representing the ‘true’ meaning of a ‘sunset’. Jack’s claim “from my point-of-view, I see things differently” reifies as one debater reporting that he can see a different side of the blob than can the other debater, while allowing that he himself cannot grasp the whole meaning. So, having point-of-view relieves the anxiety of having true thoughts—instead, it privileges coherency and integrity over truth itself, for standing from the same vantage point, a debater will tend to report all sightings of meaning blobs through the same idiosyncratic lens called perspective, always seeing a certain side to things. To summarize, a viewpoint can be understood as a location in some semantic space; being at a particular location affords a particular perspective, which tints one’s judgments.


Figure 2-1. A spatial understanding of the ‘point-of-view’ metaphor—exterior and interior views.

Bourdieu’s field-habitus-doxa model of taste provides a theoretical basis for representing and computing point-of-view. The layman’s understanding of ‘point-of-view’ as a spatial metaphor is not only simple, but also powerful enough to entail a computational framework. In Distinction (1984)—a monograph on the social and cultural basis of taste—cultural sociologist Pierre Bourdieu theorized an elaboration of the viewpoint metaphor, and appropriated it to explain differences in taste between social classes in French society. Although the self-proclaimed significance of Distinction is to critique the role of capitals and taste in reproducing France’s social hierarchies, the work’s semiotically suggestive aspects are most interesting and relevant to our present investigation.
Among the work’s key words, three are most relevant: habitus, field, and doxa. Bourdieu implicates an individual’s faculty of taste judgment as being structured by a set of personal dispositions forming a habitus (from habit). A habitus is constituted as instinctive patterns of consumption over social and cultural fields of goods. The acknowledged intersection of the personal habitus and cultural field is in turn the doxa—doxa is the site of an individual’s self-assessed identity and location within the social and cultural field. Additionally, there may be other unconscious ways in which habitus aligns with field, such as via the class habitus—the predispositions inherited from one’s social class, which form a backdrop to one’s own idiosyncratic habitus.
Habitus, field, and doxa, I suggest, closely resemble the basic concepts of perspective, space, and location in the layman’s metaphor, respectively. Space/field defines the limits of what is possible—be it a space of goods which can be consumed, a space of possible perceptions, or a space of all possible attitudes toward political topics. Fields considered in Distinction include the artistic field, the field of political opinions, and the field of life-styles. Location/doxa defines where an individual’s psychology fits into the field. Location need not be a point in the field; rather, it is more likely a region or a pattern of affection over the field. Perspective/habitus is an individual’s apparatus of dispositions—be it her system of opinions, system of perception, or system of cultural taste. Perspective/habitus is the structure that can be applied most directly to predict the individual’s taste judgments.
To split hairs for a moment, ‘doxa’ is the subset of one’s ‘location’ that is self-acknowledged; however, the remainder of one’s ‘location’ may be unconscious. For example, Jack may acknowledge his location in cultural space to include his adoration for ‘American Football’ and the ‘National Rifle Association’, while at the same time he could be unconscious of or disavow his location in ‘neo-conservatism’ or ‘macho identity’—even though his taste judgments correspond closely to other persons with those locations. Such unconscious locations could correspond with Bourdieu’s class habitus, though it would be more appropriate to sanitize the term ‘class habitus’ to ‘group habitus’ in this case. In this thesis, the term ‘location’ is largely used to invoke the acknowledged doxa plus unacknowledged group habitus, in toto. Where appropriate, doxa alone is referred to as ‘acknowledged location’.
Taste judgments can be predicted by measuring the subject-object or subject-subject distance. One of the most practical consequences of Bourdieu’s framework for taste computation is the finding that taste judgments and social distance can be measured as Cartesian distance between locations in the abstract, semantic space of the field.
The analyses presented in Distinction were based on statistical data (Bourdieu 1984, 525-545) that Bourdieu had compiled from a lifestyle survey of 1200 or so French residents, conducted in the 1960s. Plotting that data as a cloud of points in n-dimensional feature space, Bourdieu engaged in first-order statistics, identifying the centers of mass of point clouds, called the axes of inertia. Normalizing the clouds of points along these axes, Bourdieu produced two-dimensional maps of taste-space—such as the variants of petit-bourgeois taste (ibid., 340), the space of life-styles (ibid., 129), the space of food relative to economic and cultural capitals (ibid., 186), and the political space (ibid., 452). In Bourdieu’s diagrams, cultural and economic capital constituted primary organizing dimensions of taste. A person could be located in the various taste spaces by the amount of capitals he possessed, and based on this location, several predictions could be made relevant to our task by reading distances in the chart—1) what goods is he likely to consume, i.e. the goods at his location; 2) what goods is he likely to fancy or disdain, i.e. goods located just upstream or downstream of his location in economic capital; and 3) which other folks is he likely to share taste with, i.e. other folks overlapping his position in cultural and economic capitals.
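The mapping step described above can be loosely illustrated in code. The following is a minimal sketch, assuming a plain principal-axis computation (centering the cloud on its center of mass, then finding the direction of greatest variance by power iteration) as a stand-in; it is not claimed to be the procedure used in Distinction. The respondent data and all function names are hypothetical.

```python
import math

def centroid(points):
    """Center of mass of a cloud of points (one coordinate list per respondent)."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def center(points):
    """Translate the cloud so that its center of mass sits at the origin."""
    c = centroid(points)
    return [[x - m for x, m in zip(p, c)] for p in points]

def covariance(centered):
    """Covariance matrix of an already-centered cloud."""
    n = len(centered)
    dims = len(centered[0])
    return [[sum(p[i] * p[j] for p in centered) / n for j in range(dims)]
            for i in range(dims)]

def principal_axis(cov, iterations=200):
    """Power iteration: unit vector along the direction of greatest variance.
    The start vector is arbitrary; in rare cases it could be orthogonal to
    the true axis, which this sketch does not guard against."""
    dims = len(cov)
    v = [1.0 / (i + 1) for i in range(dims)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def project(centered, axis):
    """Coordinate of each respondent along the given axis."""
    return [sum(x * a for x, a in zip(p, axis)) for p in centered]

# Hypothetical survey answers: the first two features vary together,
# the third is constant and so carries no information.
respondents = [[1, 1, 5], [2, 2, 5], [3, 3, 5], [4, 4, 5]]
centered = center(respondents)
axis = principal_axis(covariance(centered))
coords = project(centered, axis)
```

Here `coords` places each respondent on a one-dimensional map; a two-dimensional map of the kind described would need a second axis, which the sketch omits.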
Building upon Bourdieu’s spatial technique for measuring taste distance, and following some studies in empirical psychology correlating perspectival judgment with subject-object psychological distance (Montgomery 1994), this investigation appropriates Cartesian distance between subject and object in various semantic spaces as a method for predicting taste judgments. The difficulty of computing taste viewpoint, however, rests largely on acquiring the various geometries and constitutions of such semantic spaces, and on assessing a person’s location within such spaces by reading his self-expressive texts.
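To make the distance idea concrete, here is a minimal sketch that assumes the space and the locations are already known. It treats a subject and a set of objects as points in a toy two-dimensional space and predicts preference from straight-line distance. The coordinates, the exponential decay, and the names `predict_affinity` and `rank_by_taste` are all illustrative assumptions, not the thesis's implementation.

```python
import math

def distance(a, b):
    """Straight-line distance between two locations in the semantic space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_affinity(subject, obj, scale=1.0):
    """Map distance to a score in (0, 1]: 1.0 at zero distance, decaying
    as the object moves away. The exponential form is an assumption."""
    return math.exp(-distance(subject, obj) / scale)

def rank_by_taste(subject, objects):
    """Order object names from nearest (most likely preferred) to farthest."""
    return sorted(objects, key=lambda name: distance(subject, objects[name]))

# Hypothetical coordinates: (cultural capital, economic capital).
jack = (0.2, 0.8)
goods = {
    "opera": (0.9, 0.7),
    "football": (0.2, 0.7),
    "chess": (0.8, 0.2),
}

ranking = rank_by_taste(jack, goods)
scores = {name: predict_affinity(jack, loc) for name, loc in goods.items()}
```

The same `distance` function serves for subject-subject comparison: passing two persons' locations gives a rough measure of how much taste they share.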

The topology of a viewpoint space and an individual’s location within it can be acquired automatically by reading texts of the culture and person. While human culture and individual subjectivity are vastly complex and sophisticated systems, the field of semiotics has sought to represent these systems to a useful approximation, by seeking out the emergent logic of their symbols and codes. In semiotics, a culture can be conceptualized as a system of signs and significations (i.e., particular ways of mapping signifiers to underlying meanings). ‘Culture’ here need not mean an ethnos; for example, fashion can be regarded as a cultural system of clothing codes. Overlaying the field of signs are systems of privilege, which group signs into pairs and systematically prefer one sign (the master) to its partner sign (the slave). For example, in the culture of capitalism, the sign ‘rich’ dominates its partner sign ‘poor’, while the topic of ‘money’ or ‘capital’ dominates all other topics. In the semiotic framework, individuals are no longer centers of meaning, but are rather defined by how they occupy and appropriate cultural signs and codes—that is to say, individuals become locations in cultural spaces.

Because semiotics applies the language metaphor to explain the prevailing symbolism in cultures, the connection between narrative text and culture-as-text is foregrounded. Embedded in both sorts of texts is a system of signs and privileges. According to semiotician Kaja Silverman (1983), the development of semiotic reading strategies meant to uncover hidden codes in text can be regarded as a psychoanalytic turn in semiotics. By computationalizing psychoanalytic-semiotic reading strategies such as Silverman’s treatment of ‘suture’, and by applying these strategies to cultural and personal texts, it may be possible for a computer to automatically acquire viewpoint. Based on Silverman (1983) and others, reading techniques for acquiring cultural space and individual location are developed in more depth in Section 3.1.
In this section, a ‘space+location’ framework was developed in relation to Bourdieu’s field-doxa-habitus model of taste. The next section further concretizes the ‘space+location’ framework proposed here by using the metaphor to develop architectures for each of the five implemented viewpoint realms.

2.2 Representing viewpoint

While cultural, economic, and educational capitals were of particular interest to Bourdieu, and were apropos of French society circa the 1960s, they are by no means the only possible or valuable semantic structurations of viewpoint spaces. This section explains how the ‘space+location’ framework applies to each of the five implemented viewpoint realms—perception, cultural taste, attitudes, humor, and gustation. It must be noted that these realms are far from canonical, and they are quite overlapping in scope; rather, each realm adds its own granularity of analysis to the overall judgmental faculty. Because these realms naturally tend toward their own topological organizations, three different-but-closely-related knowledge representations are developed—the semantic fabric, the semantic sheet, and the dimensional space. The rest of this section begins by overviewing each of the five viewpoint realms, which are to be presented more fully in Chapter 4. The section concludes by re-viewing the three introduced knowledge representations as a unity.

