Emotional Annotation of Text
David Gallagher
Department of Computer Science
University of Wisconsin-Platteville
GallagherD@uwplatt.edu
Abstract
Emotions play an important role in human intelligence, rational decision making, social interaction, perception, memory, learning, creativity, and more [4]. Automatic detection of emotions in text is becoming an increasingly active area of research, since emotions convey a great deal of information that is hard to interpret and often hard to express in words, particularly in writing. Effective emotional analysis of text could support a wide array of applications, such as opinion mining, market analysis, affective computing, and natural language interfaces such as e-learning environments or educational/edutainment games. Emotions have been studied in depth to gain a better understanding of human interaction. While the subject has classically been researched in psychology and the behavioral sciences, the increasing integration of communication with computers has made it relevant to computer science as well, particularly to the field of human-computer interaction.
Introduction
Emotions have long been an area of study in disciplines such as psychology and the behavioral sciences, as they play an intrinsic role in human nature. In particular, emotions carry a great deal of subtext or connotation that can greatly change the meaning of what individuals communicate to one another. Researchers in computer science have carried out various studies on emotions as represented by facial expressions [6] and on recognizing those emotions through a variety of sensors that analyze the facial expressions associated with each emotion [3].
One of the most natural ways for a computer to recognize a user’s emotion is to detect his or her emotional state from the text the user enters, whether in a blog, an online chat site, or another form of text [2]. Automatic emotional annotation of text is becoming increasingly important from an applied point of view, since it would advance many related fields. Affective computing and natural language interfaces, such as e-learning environments or educational/edutainment games, would benefit significantly from automated emotional annotation of text. Such annotation would also greatly increase the effectiveness of machine-driven opinion mining and market analysis.
For example, the following are specific areas in which automated affective analysis could lead to intriguing and valuable advances:
- Sentiment Analysis.
- Computer Assisted Creativity.
- Verbal Expressivity in Human Computer Interaction.
- Artificial Intelligence.
Both knowledge-based and machine learning approaches have been adopted for the automatic analysis of emotions in text, with the aim of detecting the writer’s emotional state. Knowledge-based approaches use linguistic models or prior knowledge to classify emotional text. Machine learning approaches use supervised learning algorithms to build models from annotated corpora, that is, large and structured sets of texts. Research in sentiment analysis has likewise applied various linguistic models and learning algorithms. Machine learning techniques have tended to perform better than lexicon-based techniques because they adapt well to different domains [2].
This paper outlines the background of affective analysis and related fields, and presents a simplified process for computer-assisted emotional annotation of text.
Framework and Background
Deriving the emotional content of a text through linguistic analysis is an extremely, even infamously, difficult task. Many fields, such as psychology, sociology, and philosophy, have proposed approaches to emotion detection. These fields have studied emotions with respect to facial expressions, action tendencies, physiological activity, and subjective experience [3].
A text-based emotion prediction system would benefit from identifying the emotional affinity of individual sentences. Sentence-level emotion analysis may also be an important building block for more detailed emotion analysis systems [5].
Several researchers have attempted to solve this problem in distinct ways. Cecilia Ovesdotter Alm, a professor at Rochester Institute of Technology, explored the text-based emotion prediction problem [1]. To classify the emotional affinity of sentences in the narrative domain of children’s fairy tales, she and her colleagues annotated a corpus of 22 Grimms’ tales at the sentence level with eight emotion categories (including angry, disgusted, fearful, happy, sad, positively surprised, and negatively surprised). Alena Neviarouskaya, a JSPS Postdoctoral Researcher in the Knowledge Data Engineering and Information Retrieval Laboratory, Department of Computer Science and Engineering, Toyohashi University of Technology, addressed the tasks of recognizing and interpreting affect communicated through text messaging. Classifying the mood of a single text is a hard task; state-of-the-art text classification methods achieve only modest performance in this domain. Some of the hardest problems in this area involve acquiring large collections of text annotated in detail with the linguistic expressions that indicate emotion.
Development
To lay the groundwork for automatic emotional annotation, one must first decide which emotions are the most basic. This is an important step, and several emotion models have been developed that can serve as a foundation. These models include:
- Plutchik’s Model. Robert Plutchik, a psychology professor emeritus at the Albert Einstein College of Medicine and adjunct professor at the University of South Florida, proposed that there is a small number of basic emotions: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. All other emotions are derivative states; that is, they occur as combinations, mixtures, or compounds of the primary emotions. Plutchik states that emotions vary in their degree of similarity to one another and that each emotion can exist at varying degrees of intensity or levels of arousal (See Figure 1).
Figure 1. Plutchik’s Wheel
- Ekman. Paul Ekman, a widely renowned American psychologist, has focused on a set of six basic emotions that have associated facial expressions: anger, disgust, fear, joy, sadness and surprise. Among other properties, these emotions are distinguished by the facial expression characteristic of each one.
- OCC Model. The OCC model, named after its authors Ortony, Clore and Collins, has become a standard model for emotion synthesis. It presents its 22 emotion categories as pairs of an emotion and its antithesis: pride-shame, love-hate, hope-fear and so on.
- Parrott. Parrott categorizes emotions in a shallow tree structure with three levels: primary, secondary and tertiary emotions. Parrott presents love, joy, surprise, anger, sadness and fear as the primary emotions.
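The idea in Plutchik’s model that derivative emotions are blends of primary ones can be made concrete with a small lookup table. The Python sketch below encodes Plutchik’s eight primary emotions in wheel order together with his primary dyads, the named blends of two neighboring emotions (for example, joy and trust combine into love). It is an illustration only; the names `PRIMARY`, `DYADS` and `combine` are hypothetical and do not come from any of the systems discussed in this paper.

```python
from typing import Optional

# Plutchik's eight primary emotions, listed in wheel order:
# each entry is adjacent to the next, and the last wraps around to the first.
PRIMARY = ["joy", "trust", "fear", "surprise",
           "sadness", "disgust", "anger", "anticipation"]

# Primary dyads: named blends of two adjacent primary emotions.
DYADS = {
    frozenset({"joy", "trust"}): "love",
    frozenset({"trust", "fear"}): "submission",
    frozenset({"fear", "surprise"}): "awe",
    frozenset({"surprise", "sadness"}): "disapproval",
    frozenset({"sadness", "disgust"}): "remorse",
    frozenset({"disgust", "anger"}): "contempt",
    frozenset({"anger", "anticipation"}): "aggressiveness",
    frozenset({"anticipation", "joy"}): "optimism",
}


def combine(a: str, b: str) -> Optional[str]:
    """Return the primary dyad formed by two emotions, or None if they are not adjacent."""
    for emotion in (a, b):
        if emotion not in PRIMARY:
            raise ValueError(f"not a Plutchik primary emotion: {emotion!r}")
    return DYADS.get(frozenset({a, b}))


print(combine("trust", "joy"))    # love  (order does not matter)
print(combine("joy", "sadness"))  # None  (opposites on the wheel form no primary dyad)
```

Using `frozenset` keys makes the lookup symmetric, which matches the wheel: a blend of joy and trust is the same blend whichever emotion is named first.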
Even with a model that can accurately annotate text with values for the chosen basic emotions, several emotions may have similar or ambiguous meanings, such as happy and contented. One technique for distinguishing such words is to use emotional dimensions. There are three emotional dimensions: evaluation, activation and power. Evaluation represents how positive or negative an emotion is. Activation places an emotion on a scale from passive to active. Power represents the degree of control exerted: at one end of the scale are submissive emotions, and at the other end are dominant emotions.
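A minimal sketch of the dimensional idea, assuming each emotion word is placed at a point on the three axes, each scaled to [-1, 1]. The coordinates below are invented for illustration and are not taken from any published affective dictionary; the point is only that "happy" and "contented" share a positive evaluation but differ in activation, so the dimensions can tell them apart.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    """Position of an emotion on the three axes, each assumed to lie in [-1, 1]."""
    evaluation: float  # negative (-1) ... positive (+1)
    activation: float  # passive (-1) ... active (+1)
    power: float       # submissive (-1) ... dominant (+1)

    def as_tuple(self):
        return (self.evaluation, self.activation, self.power)


def distance(a: Dimensions, b: Dimensions) -> float:
    """Euclidean distance between two points in evaluation/activation/power space."""
    return math.dist(a.as_tuple(), b.as_tuple())


# Hypothetical coordinates, chosen only to illustrate the idea.
WORDS = {
    "happy":     Dimensions(0.8, 0.6, 0.4),
    "contented": Dimensions(0.7, -0.4, 0.3),
    "angry":     Dimensions(-0.7, 0.8, 0.6),
    "afraid":    Dimensions(-0.7, 0.7, -0.6),
}

# Each pair is close in evaluation; activation separates the first pair
# and power separates the second.
print(round(distance(WORDS["happy"], WORDS["contented"]), 2))  # 1.01
print(round(distance(WORDS["angry"], WORDS["afraid"]), 2))     # 1.2
```

The same comparison works for any pair of words once a real affective dictionary supplies the coordinates.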
As this paper has suggested, there are many ways to develop an effective algorithm for automatic emotional annotation. To highlight one approach, we analyze the Sequential Minimal Optimization (SMO) implementation of the Support Vector Machine (SVM) used by Soumaya Chaffar, a researcher, and Diana Inkpen, a professor of computer science at the University of Ottawa [2].
First, one must develop a dataset or emotional dictionary that can be used to look up emotions in text. It is useful to have a variety of datasets collected from different sources, since each may be better suited to a different kind of text. Next, to analyze sentences further, feature sets can be applied to highlight specific elements such as negative words, conjunctions, punctuation, and context. At this point, an algorithm can be applied to derive the emotional annotation of a text. Finally, the algorithm’s output can be evaluated against the annotated data to measure its ability to distinguish emotions.
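The steps above can be sketched as a small Python pipeline in which each stage is a replaceable function. The function names, the toy sentences and the word-vote classifier are hypothetical stand-ins, not Chaffar and Inkpen’s code; in their experiment the classification stage was performed with Weka algorithms such as SMO.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]            # (sentence, emotion label)
Features = Dict[str, float]          # sparse feature vector
Classifier = Callable[[Features], str]


def evaluate(train: Sequence[Example],
             test: Sequence[Example],
             extract: Callable[[str], Features],
             fit: Callable[[List[Features], List[str]], Classifier]) -> float:
    """Extract features, train on `train`, and return accuracy on held-out `test`."""
    model = fit([extract(s) for s, _ in train], [label for _, label in train])
    correct = sum(model(extract(s)) == label for s, label in test)
    return correct / len(test)


# --- Hypothetical stand-ins for the real stages ---------------------------
def word_features(sentence: str) -> Features:
    """Toy feature extractor: the set of lowercase words, each with value 1."""
    return {word: 1.0 for word in sentence.lower().split()}


def fit_word_votes(X: List[Features], y: List[str]) -> Classifier:
    """Toy classifier: each word votes for the labels it co-occurred with in training."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for features, label in zip(X, y):
        for word in features:
            votes[word][label] += 1
    fallback = Counter(y).most_common(1)[0][0]  # used when no word was seen in training

    def predict(features: Features) -> str:
        tally: Counter = Counter()
        for word in features:
            tally.update(votes.get(word, Counter()))
        return tally.most_common(1)[0][0] if tally else fallback

    return predict


train = [("I love this", "joy"), ("what a happy day", "joy"),
         ("I hate this", "anger"), ("this makes me furious", "anger")]
test = [("happy to see you", "joy"), ("so furious right now", "anger")]
print(evaluate(train, test, word_features, fit_word_votes))  # 1.0
```

Swapping `fit_word_votes` for a stronger learner, or `word_features` for a richer feature set, changes one argument without touching the rest of the pipeline.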
Datasets
Chaffar and Inkpen used five datasets in their experiment; these are detailed below.
Text Affect
This dataset consisted of two separate parts drawn from news headlines in renowned newspapers, as well as from the Google News search engine. The first part, intended for training, was composed of 250 annotated sentences. The second part, designed for testing, consisted of 1,000 annotated sentences. Sentences were annotated with six emotions (anger, disgust, fear, joy, sadness and surprise, as in the Ekman model) according to their degree of emotional load.
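Because each headline carries a degree for every emotion rather than a single label, a single-label classifier needs some rule for reducing the degrees to one category. The sketch below assumes one simple rule: take the highest-scoring emotion and set aside headlines whose top score falls below a threshold. The 0-100 scale, the threshold of 50 and the function name are assumptions for illustration, not details reported for Chaffar and Inkpen’s experiment.

```python
from typing import Dict, Optional

EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def dominant_emotion(scores: Dict[str, float], threshold: float = 50.0) -> Optional[str]:
    """Reduce per-emotion scores to one label.

    Assumes scores on a 0-100 scale; returns None when no emotion reaches
    `threshold`, so weakly emotional headlines can be left out.
    """
    unknown = set(scores) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unexpected emotion labels: {sorted(unknown)}")
    if not scores:
        return None
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else None


headline = {"anger": 12, "fear": 71, "sadness": 48, "surprise": 30}
print(dominant_emotion(headline))                     # fear
print(dominant_emotion({"joy": 20, "surprise": 15}))  # None
```

Lowering the threshold keeps more headlines at the cost of noisier labels; the right value would depend on the scale the annotations actually use.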
Neviarouskaya et al.’s Dataset
This dataset was developed by Neviarouskaya and colleagues. In these datasets, three annotators labeled sentences using ten labels: the nine emotion categories defined by Izard (anger, disgust, fear, guilt, interest, joy, sadness, shame, and surprise) and a neutral category. For their experiment, Chaffar and Inkpen considered only sentences on which at least two annotators fully agreed on the emotion category.
- Dataset 1. This dataset includes 1,000 sentences extracted from various stories in 13 diverse categories, such as health, education and wellness.
- Dataset 2. This dataset includes 700 sentences from a collection of diary-like blog posts.
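Neviarouskaya’s datasets were filtered by inter-annotator agreement before use. A minimal Python sketch of that filtering step, assuming each sentence comes with the list of labels its annotators chose: `min_agree=2` reproduces the "at least two of three annotators" rule, and a stricter rule such as unanimity is simply a larger value. The sentences and annotations are invented, and the guard requiring a strict majority is a design choice of this sketch so that at most one label can pass.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def agreed_label(labels: Sequence[str], min_agree: int) -> Optional[str]:
    """Return the most frequent label if at least `min_agree` annotators chose it."""
    if 2 * min_agree <= len(labels):
        raise ValueError("min_agree must be a strict majority of the annotators")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agree else None


def filter_by_agreement(annotated: Sequence[Tuple[str, Sequence[str]]],
                        min_agree: int) -> List[Tuple[str, str]]:
    """Keep only (sentence, label) pairs that meet the agreement threshold."""
    kept = []
    for sentence, labels in annotated:
        label = agreed_label(labels, min_agree)
        if label is not None:
            kept.append((sentence, label))
    return kept


# Invented sentences, each labeled by three hypothetical annotators.
annotated = [
    ("I finally got the job!", ["joy", "joy", "interest"]),
    ("The news left me speechless.", ["surprise", "fear", "neutral"]),
    ("I miss my grandmother.", ["sadness", "sadness", "sadness"]),
]
print(filter_by_agreement(annotated, min_agree=2))  # two sentences survive
print(filter_by_agreement(annotated, min_agree=3))  # only the unanimous one
```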
Alm’s Dataset
Alm’s dataset contained annotated sentences from Grimms’ Fairy Tales. In the highlighted experiment, only sentences with high annotator agreement were used. The sentences were annotated with Ekman’s list of basic emotions; because of data sparsity and the related semantics of anger and disgust, Alm merged these two emotions. This resulted in five emotions: happy, fearful, sad, surprised and angry-disgusted.
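Merging classes, as Alm did for anger and disgust, is a simple relabeling step applied before training. The sketch assumes Ekman-style input label names (these strings are hypothetical; Alm’s annotation files may use different ones) and shows how pooling two sparse classes gives the merged class more training examples.

```python
from collections import Counter
from typing import List, Sequence

# Assumed mapping from Ekman-style labels to the five classes used with Alm's data.
MERGE = {
    "anger": "angry-disgusted",
    "disgust": "angry-disgusted",
    "joy": "happy",
    "fear": "fearful",
    "sadness": "sad",
    "surprise": "surprised",
}


def merge_labels(labels: Sequence[str]) -> List[str]:
    """Map each fine-grained label to its merged class; unknown labels raise KeyError."""
    return [MERGE[label] for label in labels]


labels = ["anger", "disgust", "joy", "anger", "fear"]
# Anger and disgust now pool into a single class with three examples.
print(Counter(merge_labels(labels)))
```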
Aman’s Dataset
This dataset consists of emotion-rich sentences collected from blogs. Four annotators labeled the sentences with Ekman’s basic emotions (happiness, sadness, anger, disgust, surprise and fear) plus a neutral category. The experiment considered only sentences for which the annotators agreed on the emotion category.
Feature Sets
Feature sets can be used to highlight specific elements such as negative words, conjunctions, punctuation, and context, which can drastically change the meaning and emotional load of a sentence. To classify text by emotion correctly, it is essential to choose relevant feature sets. Several feature sets are described below:
- Bag-of-Words (BOW). Each sentence in the dataset was represented by a feature vector with one boolean attribute for each word in the vocabulary. If a word occurs in a given sentence, its corresponding attribute is set to 1; otherwise, it is set to 0. BOW treats words as independent entities and does not take any semantic information from the text into account. Nevertheless, it generally performs very well in text classification.
- N-grams. N-grams are sequences of n consecutive words. They can capture syntactic patterns in text and may include important features such as negations, for example, “not happy”. Negation is an important feature for emotion analysis because it can completely change the emotion a sentence expresses. For instance, the sentence “I’m not happy” should be classified into the sadness category, not the happiness category.
- Lexical emotion features. This feature set consists of emotional words extracted from affective lexical repositories such as WordNetAffect. The highlighted experiment used all the emotional words from WordNetAffect (WNA) associated with the six basic emotions.
- Dependency analysis. MiniPar is an example of a program that can derive features of a sentence by parsing it and showing how its words are related to one another. In MiniPar output, nodes are numbered and the arcs between nodes represent dependency relations. Each dependency relation is labeled with a tag identifying the kind of relation the nodes share (See Table 1).
Table 1. MiniPar example of a dependency tree for the sentence “two of her tears wetted his eyes and they grew clear again”.
- Emotional dimension analysis. EmoTag annotation is based on the emotional dimensions of a sentence. Words are filtered using a stop list, and dependency analysis is used to identify the scope of negation. The emotion value of each word is looked up in an affective dictionary, and the value is inverted for words that fall within the scope of a negation. Once all the words of a sentence have been evaluated, the average value for each dimension is calculated (See Table 2).
Table 2. Fragment of a marked-up table.
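The sketch below implements simplified versions of four of the feature sets above in plain Python: boolean bag-of-words vectors, n-grams, per-emotion lexicon counts, and EmoTag-style dimension averaging with negation inversion. It is an illustration under stated assumptions, not Chaffar and Inkpen’s feature extractor: the miniature lexicon, the stop list, and the dimension values (assumed to be centered on 0, so that inverting a value means negating it) are hypothetical, and the negation scope is passed in by hand rather than computed by a dependency parser such as MiniPar.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple

PUNCT = ".,!?;:\"'()"


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and strip leading/trailing punctuation."""
    tokens = (word.strip(PUNCT) for word in sentence.lower().split())
    return [t for t in tokens if t]


# Bag-of-words: one boolean attribute per vocabulary word.
def bow_vector(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


# N-grams: sequences of n consecutive tokens, e.g. the bigram "not happy".
def ngrams(tokens: Sequence[str], n: int) -> List[str]:
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Lexical emotion features: how many lexicon words of each emotion occur.
# Hypothetical miniature lexicon standing in for WordNetAffect.
EMOTION_WORDS: Dict[str, str] = {
    "happy": "joy", "delighted": "joy", "furious": "anger",
    "scared": "fear", "sad": "sadness", "miserable": "sadness",
    "disgusting": "disgust", "astonished": "surprise",
}
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]


def lexical_counts(tokens: Sequence[str]) -> List[int]:
    counts = Counter(EMOTION_WORDS[t] for t in tokens if t in EMOTION_WORDS)
    return [counts[e] for e in EMOTIONS]


# Emotional dimensions (evaluation, activation, power), EmoTag-style.
# Hypothetical values on a scale centered at 0, so "inverting" means negating.
DIMENSIONS: Dict[str, Tuple[float, float, float]] = {
    "happy": (0.8, 0.6, 0.4),
    "sad": (-0.7, -0.5, -0.4),
    "day": (0.1, 0.0, 0.0),
}
STOP_WORDS = {"i", "am", "a", "an", "the", "it", "is", "was", "not"}


def dimension_average(tokens: Sequence[str],
                      negated: Set[int]) -> Tuple[float, ...]:
    """Average each dimension over the non-stop words found in DIMENSIONS.

    `negated` holds the positions of tokens inside a negation's scope; a real
    system would obtain them from a dependency parse rather than by hand.
    """
    vectors = []
    for i, token in enumerate(tokens):
        if token in STOP_WORDS or token not in DIMENSIONS:
            continue
        sign = -1.0 if i in negated else 1.0
        vectors.append(tuple(sign * value for value in DIMENSIONS[token]))
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(mean(axis) for axis in zip(*vectors))


tokens = tokenize("I am not happy!")
print(bow_vector(tokens, ["am", "happy", "i", "not", "sad"]))  # [1, 1, 1, 1, 0]
print(ngrams(tokens, 2))               # ['i am', 'am not', 'not happy']
print(lexical_counts(tokens))          # [0, 0, 0, 1, 0, 0]
print(dimension_average(tokens, {3}))  # (-0.8, -0.6, -0.4)
```

For "I am not happy", the lexicon count still registers joy, because it ignores context; the bigram "not happy" and the negation-inverted evaluation both expose the negative reading, which is why the negation-aware feature sets matter.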
Algorithm Application
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java and developed at the University of Waikato, New Zealand. Using Weka, Chaffar and Inkpen’s experiment compares the effectiveness of different algorithms on the datasets discussed earlier: J48 for decision trees, Naïve Bayes as a Bayesian classifier, and the SMO implementation of SVM. The Weka ZeroR classifier was used as a baseline because it does not take any feature set into account; it simply predicts the most frequent class in the training data (See Table 3).
Table 3. Results for the training datasets using the accuracy rate (%)
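ZeroR sets a floor for the comparison: any classifier that uses features should beat a rule that always guesses the most common emotion. The sketch below re-implements that idea in plain Python, together with an accuracy rate in percent as reported in Table 3. It is not Weka’s API, and the label lists are invented.

```python
from collections import Counter
from typing import Any, List, Sequence


class ZeroR:
    """Baseline that ignores every feature and predicts the majority training label."""

    def fit(self, X: Sequence[Any], y: Sequence[str]) -> "ZeroR":
        if not y:
            raise ValueError("cannot fit on an empty training set")
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence[Any]) -> List[str]:
        return [self.label_ for _ in X]


def accuracy_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Percentage of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_labels = ["joy", "joy", "sadness", "joy", "anger"]
test_labels = ["joy", "anger", "joy", "sadness"]
baseline = ZeroR().fit([None] * len(train_labels), train_labels)
print(accuracy_rate(test_labels, baseline.predict([None] * len(test_labels))))  # 50.0
```

On a class-imbalanced dataset this baseline can score surprisingly high, which is why accuracy figures are best read relative to it.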
From Table 3 we can see that SMO distinguishes itself as the strongest algorithm in this comparison, achieving the highest accuracy in matching emotions. In the next stage of their experiment, Chaffar and Inkpen applied SMO with various feature sets to determine which yielded the highest accuracy. In this stage they used different datasets to test how well the models generalize to unseen examples (See Table 4).
It is interesting to note that across the various datasets, the simple BOW approach seems to achieve the highest accuracy on most of the test sets. Possible explanations are that the SMO algorithm does not take full advantage of the additional information provided by the other feature sets, that the feature sets themselves contain some fundamental error, or that the test datasets do not match the assumptions of the algorithm.
Conclusions
Written language is one of our most common forms of communication, and its use is only increasing. Besides transmitting informative content, it also conveys information about the writer’s attitude, including his or her emotional state. While many studies have been carried out in the field of human-computer interaction, comparatively little research has been devoted to detecting emotions in text [6]. A smorgasbord of related fields could benefit from, and contribute to, the advancement of automated emotional annotation of text. Before that can happen, further research is needed to advance this field and to support the development of the semantic web, which could in turn boost the development of emotional annotation of text.
References
[1] Alm, Cecilia Ovesdotter, Dan Roth, and Richard Sproat. “Emotions from Text: Machine Learning for Text-based Emotion Prediction.” Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, 2005, pp. 579-586.
[2] Chaffar, Soumaya, and Diana Inkpen. “Using a Heterogeneous Dataset for Emotion Analysis in Text.” School of Information Technology and Engineering, University of Ottawa, Ottawa, ON, Canada, 2011. Web. 7 Oct. 2012.
[3] Devillers, Laurence, Laurence Vidrascu, and Lori Lamel. “Challenges in Real-Life Emotion Annotation and Machine Learning Based Detection.” Neural Networks 18.4 (2005): 407-422. ISSN 0893-6080.
[4] Picard, Rosalind W. Affective Computing. Cambridge, MA: The MIT Press, 1997.
[5] Quan, Changqin, and Fuji Ren. “Sentence Emotion Analysis and Recognition Based on Emotion Words Using Ren-CECps.” International Journal of Advanced Intelligence 2.1 (2010): 105-117. Web. 7 Oct. 2012.
[6] Strapparava, Carlo, and Rada Mihalcea. “Learning to Identify Emotions in Text.” Fortaleza, Brazil, 2008. Web. 7 Oct. 2012.