Lőrincz, András; Mészáros, Tamás; Pataki, Béla: Embedded Intelligent Systems



11.2.3 Head motion and body talk

Head and eye motion can be very expressive of emotions and intentions: one may 'point' with the eyes, the head, or both when giving instructions, for example. Behavioral signs, including head motion, also characterize the individual. Hill and Johnston [] showed that the recognition of identity is helped more by rigid motion cues (such as head motion) than by non-rigid motion cues (such as facial expression). Recognition of sex, however, seems to be mediated by changes in facial expressions. Head motion codes are shown in Fig. 47.

11.2.4 Conscious and subconscious signs of emotions


Situations (the spatio-temporal context) can help the recognition of emotions and mental states, the hidden variables of behavior. For example, it is easy to confuse anger with suspicion during problem solving, but these expressions can be set apart in temporal context and according to success or failure. Many facial expressions and head/eye motions are unconscious and are hard (sometimes impossible) to hide. The polygraph, which measures physiological indices including blood pressure, respiration, and skin conductivity, is a commonly used tool for lie detection, although many consider it unreliable []. Ekman claims to reach 90% detection accuracy when facial expressions are combined with voice and speech measures. This claim needs to be verified by others, but the key message is that deception cannot be identified by a single clue. Electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) can also be used to infer deceptive behavior. We have limited control over many of these signals.

The detection of pain or tiredness is a more important issue in the context of human-computer interaction and collaboration if we assume that the computer works for the sake of the user and that the user would like to take full advantage of a backing statistics-based recommender system. It is equally relevant to identify situations when the user is "in the zone" or in the "state of flow" []: a state of complete, focused motivation. This is a single-minded immersion in which emotions serve performance and learning. Other terms that try to capture this state include in the moment, on a roll, wired in, in the groove, on fire, in tune, centered, and singularly focused. The facial expression, however, is very similar to a blank or oblivious look. So while flow is to be achieved in the educational setting, the unmindful state is to be avoided, yet the facial expressions are very similar. A distinction can be made by means of the spatio-temporal context and by previous experience of user behavior in the context of the actual task.

In sum, human-computer collaboration requires modeling of behavior that includes as much information as possible, starting from visual and acoustic information and taking advantage of other sensors measuring, e.g., blood pressure, skin conductance, and heart rate. The availability of such additional signals has increased drastically with the fast evolution of mobile tools and mobile phones.

11.3 Measuring behavioral signals

We will review acoustic and visual behavioral signals, since they are easily available and they are the typical forms of human-human communication. The best current techniques tune recognition, identification, and classification by means of large databases. A related European project, the 'European network of excellence in social signal processing', also lists some of the most relevant databases and a number of tools for such studies. Some homework assignments will be selected from these.

11.3.1 Detecting emotions in speech

This topic is in the focus of current interest, for example to improve user experiences with automated phone attendants. A large set of methods has been tried with demonstrated success; for a review, see [ and ] and the references therein. Special challenges have been organized to compare the methods and the databases [].
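As an illustration of a typical database-driven pipeline, the following sketch extracts spectral features per utterance and trains a classifier on them. It assumes the librosa and scikit-learn libraries; `wav_files` and `labels` are hypothetical placeholders for an emotion-annotated speech database.

```python
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(wav_path):
    """Summarize an utterance by the mean and std of its MFCC trajectories."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# wav_files: list of paths; labels: emotion tags such as 'angry', 'neutral'.
X = np.array([utterance_features(f) for f in wav_files])
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
clf.fit(X, labels)
```

Real systems typically add prosodic features (pitch, energy, speaking rate) and cross-validate across speakers so that speaker identity does not leak into the emotion labels.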

11.3.2 Measuring emotions from faces through action units

Below, we review the most popular models of facial tracking. Then we turn to the estimation methods.

11.3.2.1 Constrained Local Models

CLM methods are generative parametric models for person-independent face alignment. In this work we use a 3D CLM method, where the shape model is defined by a 3D mesh and, in particular, by the 3D vertex locations of the mesh, called landmark points. Consider the shape of a 3D CLM as the concatenated coordinates of the 3D vertices of the $M$ landmark points:

$$\mathbf{x} = [x_1, y_1, z_1, \ldots, x_M, y_M, z_M]^T,$$

or, $\mathbf{x} = [\mathbf{x}_1^T, \ldots, \mathbf{x}_M^T]^T$, where $\mathbf{x}_i = [x_i, y_i, z_i]^T$. We have $N$ samples: $\{\mathbf{x}^{(n)}\}_{n=1}^N$. CLM models assume that - apart from the global transformations: scale, rotation, and translation - all samples can be approximated by means of linear principal component analysis (PCA), forming the PCA subspace. Details of the PCA algorithm are well covered by Wikipedia (https://en.wikipedia.org/wiki/Principal_component_analysis); the interested reader may wish to consult the more elaborate tutorial [].
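A minimal sketch of building this PCA subspace from the shape samples, assuming the global transformations have already been removed (e.g., by Procrustes alignment); `shape_samples` and the compression dimension are placeholders:

```python
import numpy as np

# shape_samples: N flattened shapes of length 3M (x1, y1, z1, ..., xM, yM, zM),
# assumed already freed of scale, rotation, and translation.
shapes = np.asarray(shape_samples, dtype=float)     # (N, 3M)
mean_shape = shapes.mean(axis=0)                    # the mean shape x_bar
centered = shapes - mean_shape

# Eigen-decomposition of the sample covariance yields the PCA subspace.
cov = centered.T @ centered / (len(shapes) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
d = 10                                              # compression dimension (a choice)
Phi = eigvecs[:, order[:d]]                         # (3M, d) non-rigid basis
Lambda = eigvals[order[:d]]                         # variances lambda_1..lambda_d

# Non-rigid parameters of a new aligned shape x: q = Phi.T @ (x - mean_shape)
```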

In the next subsection we briefly describe the 3D Point Distribution Model and the way CLM estimates the positions of the landmarks.

11.3.2.2 Point Distribution Model


The 3D point distribution model (PDM) describes non-rigid shape variations linearly and composes them with a global rigid transformation, placing the shape in the image frame:

$$\mathbf{x}_i(\mathbf{p}) = s\, P\, R\, (\bar{\mathbf{x}}_i + \Phi_i \mathbf{q}) + \mathbf{t},$$

where $\mathbf{x}_i(\mathbf{p})$ denotes the 2D location of the $i$-th landmark subject to transformation $\mathbf{p}$, and $\mathbf{p} = \{s, \alpha, \beta, \gamma, \mathbf{q}, \mathbf{t}\}$ denotes the parameters of the model, which consist of a global scaling $s$, angles of rotation in three dimensions (combined into the rotation matrix $R = R(\alpha, \beta, \gamma) \in \mathbb{R}^{3 \times 3}$), translation $\mathbf{t} \in \mathbb{R}^2$ and non-rigid transformation $\mathbf{q} \in \mathbb{R}^d$. Here $\bar{\mathbf{x}}_i = [\bar{x}_i, \bar{y}_i, \bar{z}_i]^T$ is the mean location of the $i$-th landmark averaged over the database, i.e. $\bar{x}_i = \frac{1}{N} \sum_{n=1}^N x_i^{(n)}$, and similarly for $\bar{y}_i$ and $\bar{z}_i$. Matrix $\Phi_i \in \mathbb{R}^{3 \times d}$ is a piece in $\Phi \in \mathbb{R}^{3M \times d}$ and corresponds to the $i$-th landmark. Columns of $\Phi$ form the orthogonal projection matrix of principal component analysis and its compression dimension is $d$. Finally, matrix $P$ denotes the projection matrix to 2D:

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},$$

and thus $\mathbf{x}_i(\mathbf{p}) \in \mathbb{R}^2$.
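The following sketch implements this forward model: it synthesizes the 2D landmark positions from the PDM parameters. The Euler-angle convention is one possible choice; `mean_shape` and `Phi` are the quantities computed in the PCA sketch above.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """3D rotation composed from the three rotation angles (one convention)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, -sg, 0], [sg, cg, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def pdm_landmarks(mean_shape, Phi, s, alpha, beta, gamma, q, t):
    """2D landmarks x_i(p) = s * P * R * (x_bar_i + Phi_i q) + t for all i."""
    M = mean_shape.size // 3
    R = rotation_matrix(alpha, beta, gamma)
    P = np.array([[1., 0., 0.], [0., 1., 0.]])      # projection to 2D
    shape3d = (mean_shape + Phi @ q).reshape(M, 3)  # deformed 3D shape
    return s * shape3d @ (P @ R).T + t              # (M, 2) image positions
```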

By applying PCA on the points we get an estimate of the prior of the parameters:

$$p(\mathbf{p}) \propto \mathcal{N}(\mathbf{q}; \mathbf{0}, \Lambda), \qquad (6)$$

that is, CLM assumes a normal distribution with mean $\mathbf{0}$ and variance $\Lambda$ for parameters $\mathbf{q}$. $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ in (6) is provided by the PCA, and the parameter vector assumes the form $\mathbf{p} = \{s, \alpha, \beta, \gamma, \mathbf{q}, \mathbf{t}\}$.

11.3.2.3 Formalization of Constrained Local Models

CLM is constrained through the PCA of the PDM. It works with local experts, whose opinions are considered independent and are therefore multiplied together:

$$p(\{l_i = 1\}_{i=1}^M \mid \mathbf{p}, I) = \prod_{i=1}^M p(l_i = 1 \mid \mathbf{x}_i(\mathbf{p}), I), \qquad (7)$$

where $l_i \in \{1, -1\}$ is a stochastic variable, which is 1 ($-1$) if the $i$-th marker is (not) in its position, and $p(l_i = 1 \mid \mathbf{x}_i(\mathbf{p}), I)$ is the probability that for image $I$ and for marker position $\mathbf{x}_i(\mathbf{p})$ determined by parameter $\mathbf{p}$, the $i$-th marker is in its position.

Local experts are built on logistic (logit) regression and are trained on labeled samples. The functional form of the logit is

$$p(l_i = 1 \mid \mathbf{x}_i(\mathbf{p}), I) = \frac{1}{1 + \exp\left(-\left(\mathbf{w}_i^T I(\mathbf{x}_i(\mathbf{p})) + b_i\right)\right)},$$

where $I(\mathbf{x}_i(\mathbf{p}))$ is a normalized image patch around point $\mathbf{x}_i(\mathbf{p})$, and $\mathbf{w}_i$ and $b_i$ are the parameters of the distribution to be learned from the samples. Positive and negative samples for the right corner of the right eye are shown in Fig. 48.
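A minimal sketch of training one such local expert with scikit-learn's logistic regression; `pos_examples` and `neg_examples` are hypothetical (image, position) pairs taken at and near the annotated landmark, respectively.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def normalized_patch(image, center, half=5):
    """Zero-mean, unit-norm gray patch around a candidate landmark position."""
    c, r = int(round(center[0])), int(round(center[1]))
    patch = image[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    patch -= patch.mean()
    return (patch / (np.linalg.norm(patch) + 1e-8)).ravel()

# Positive samples: patches at the annotated landmark (label +1);
# negative samples: patches displaced from it (label -1).
X = np.array([normalized_patch(img, c) for img, c in pos_examples + neg_examples])
y = np.array([1] * len(pos_examples) + [-1] * len(neg_examples))
expert = LogisticRegression().fit(X, y)   # learns the w_i and b_i above

# The expert's response at a candidate position x is then
# expert.predict_proba(normalized_patch(img, x)[None, :])[0, 1].
```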

The local experts' responses - which depend on the constraints of the PDM and on the response map of the local expert in an appropriate neighborhood - can be used to express the posterior of the parameters via the likelihood in (7) (Fig. 49):

$$p(\mathbf{p} \mid \{l_i = 1\}_{i=1}^M, I) \propto p(\mathbf{p}) \prod_{i=1}^M p(l_i = 1 \mid \mathbf{x}_i(\mathbf{p}), I),$$

where CLM assumes $p(\mathbf{p}) \propto \mathcal{N}(\mathbf{q}; \mathbf{0}, \Lambda)$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$, $\lambda_j$ being the $j$-th eigenvalue of the covariance matrix of the stochastic variable $\mathbf{q}$, and where we applied Bayes' rule and accepted the tacit assumption [] that the denominator $p(\{l_i = 1\}_{i=1}^M \mid I)$ is a weak function of the parameters to be optimized.
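Putting the pieces together, the negative log-posterior that CLM fitting minimizes can be sketched as below; it reuses `pdm_landmarks` and `normalized_patch` from the earlier sketches, and the `unpack` helper for the parameter vector is hypothetical. In practice, the minimization is carried out with response maps and specialized update rules rather than with a generic optimizer.

```python
import numpy as np

def neg_log_posterior(params, experts, image, mean_shape, Phi, Lambda):
    """-log p(p | {l_i = 1}, I) up to a constant: PCA prior + expert terms."""
    s, alpha, beta, gamma, q, t = unpack(params)    # hypothetical unpacking
    landmarks = pdm_landmarks(mean_shape, Phi, s, alpha, beta, gamma, q, t)
    # Gaussian PCA prior on the non-rigid parameters q (eq. (6)).
    cost = 0.5 * np.sum(q ** 2 / Lambda)
    # Negative log responses of the independent local experts (eq. (7)).
    for expert, x_i in zip(experts, landmarks):
        p_i = expert.predict_proba(normalized_patch(image, x_i)[None, :])[0, 1]
        cost -= np.log(p_i + 1e-12)
    return cost
```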

11.3.2.4 Active Appearance Models

Both two- and three-dimensional Active Appearance Models (AAMs) have been developed. They consist of an active shape model, which is similar to the point distribution model of the CLM, and a texture model, which is radically different. In the latter, one takes the marker points and connects them by lines in such a way that the markers form the vertices of triangles and all closed areas are triangles. The texture within the triangles undergoes affine transformations in the matching procedure to match the actual estimates of the triangles. Both the texture model and the shape model are compressed, and a Gaussian distribution is assumed for the joint model [].
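The per-triangle affine warping of the texture can be sketched with OpenCV as follows; the triangle vertex lists would come from a triangulation (e.g., Delaunay) of the marker points, which is not shown here.

```python
import numpy as np
import cv2

def warp_triangle(src_img, dst_img, src_tri, dst_tri):
    """Affinely map the texture of src_tri in src_img onto dst_tri in dst_img."""
    # Work inside bounding rectangles to keep the warp local.
    sx, sy, sw, sh = cv2.boundingRect(np.float32([src_tri]))
    dx, dy, dw, dh = cv2.boundingRect(np.float32([dst_tri]))
    src_pts = np.float32([(x - sx, y - sy) for x, y in src_tri])
    dst_pts = np.float32([(x - dx, y - dy) for x, y in dst_tri])
    A = cv2.getAffineTransform(src_pts, dst_pts)    # 2x3 affine matrix
    warped = cv2.warpAffine(src_img[sy:sy + sh, sx:sx + sw], A, (dw, dh))
    # Copy only the pixels that fall inside the destination triangle.
    mask = np.zeros((dh, dw), dtype=np.uint8)
    cv2.fillConvexPoly(mask, np.int32(dst_pts), 1)
    roi = dst_img[dy:dy + dh, dx:dx + dw]
    roi[mask > 0] = warped[mask > 0]
```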

11.3.2.5 Estimation of action units

Estimation of action units (AUs) utilizes annotated databases: the images are annotated with the AUs and their strengths. If the CLM fit is satisfactory, then one may estimate the AUs from the (change of the) shape, or from the (change of the) texture around the marker points. Combined methods have been tried in the literature and show some improvements.

11.3.2.6 Emotion estimation

Similar estimation can be used for the emotions. One may directly use emotion-labeled faces together with the CLM fit of the marker points to estimate shape changes and/or changes of the texture around the marker points. Typical estimators make use of SVM-based linear regressors and SVM classifiers, both for AUs and for emotions.
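A minimal sketch of such estimators with scikit-learn, using shape changes as features; `fitted_shapes`, `au_intensities`, and `emotion_labels` are placeholders for data from an annotated face database (texture descriptors around the markers could be concatenated to the features as well).

```python
import numpy as np
from sklearn.svm import SVR, SVC

# Features: deviation of each CLM-fitted shape from the database mean shape.
X = fitted_shapes - fitted_shapes.mean(axis=0)

au_regressor = SVR(kernel='linear').fit(X, au_intensities)   # one regressor per AU
emotion_clf = SVC(kernel='linear').fit(X, emotion_labels)    # emotion classifier
```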

11.3.3 Measuring gestures

Measuring gestures and body motions is possible in a number of ways: (i) via wearable digital textile sensors [] that may also include accelerometers and gyroscopes [] as well as other sensors, e.g., ECG, which may save lives; (ii) via Kinect, a useful tool for remote optical measurement; (iii) via gesture estimation from single-camera video capture, which is demanding but is the only available tool for the annotation of movies and videos; and (iv) via two-camera stereo vision or its many-camera generalization. While the case of Kinect is relatively simple, since it returns a 3D view, single-camera systems may take advantage of structure-from-motion algorithms, and multi-camera systems can use registration methods, e.g., based on the silhouettes of the arms or the body if those are not occluded.

11.4 Architecture for behavioral modeling and the optimization of a human-computer interface

At a very high level, the intelligent architecture that aims to model and possibly optimize human performance is made of the following components: (i) a sensory processing unit, (ii) a control unit, (iii) an inverse dynamics unit, and (iv) a decision making unit. Although it looks simple, one has to worry about a number of things, such as the continuity of space and time, the curse of dimensionality, whether and how space and time should be discretized, and planning under uncertainty, e.g., in partially observed situations, including information about the purposes and the cognitive and emotional capabilities of the user. Below, we review the basic components of the architecture. Every component can be generalized to a great extent, and in particular cases some of the components may be left out. The architecture should be able to estimate parameters of human behavior.



  • This stage selects different samples under random control to collect (state, action, next state) triples, which are used to learn the controllable part of the space.

  • For a sufficient number of collected samples, one can in principle estimate the dimension of the state space and the related non-linear mapping of sensory information to this lower dimensional manifold. In the present example, the low dimensional manifold is known, and selected samples embedded into the low dimensional space will be used for interpolation.

  • Out-of-sample estimates will be used for the identification of the dynamics in the form of an autoregressive exogenous (ARX) process. Generalization to more complex non-linear models, such as switching non-linear ARX models, is possible in principle.

  • The ARX process can be inverted, and the inverted ARX process can be used for control (see the sketch after this list).

  • The inverse dynamics can be learned. A linear-quadratic regulator is a relatively simple option.

  • Optimization concerns long-term goals or a hierarchy of those. Optimization then belongs to the field of reinforcement learning (RL). The continuity of space and actions can be overcome by means of the event learning formalism [ and ] of RL, which enables continuous control in the optimization procedure.

  • RL optimization can be accomplished in many ways, including the Optimistic Initial Model, which is favorable in cases with many variables [].
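As a minimal sketch of the identification and inversion steps in the list above, consider a first-order linear ARX model fitted by least squares; `states`, `actions`, and `next_states` are placeholder arrays holding the collected (state, action, next state) triples.

```python
import numpy as np

# Stack regressors [y(t), u(t)] and solve next_states ~ Z @ Theta by least squares.
Z = np.hstack([states, actions])                   # (T, n + m)
Theta, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
n = states.shape[1]
A, B = Theta[:n].T, Theta[n:].T                    # y(t+1) ~ A y(t) + B u(t)

def inverse_control(y_now, y_desired):
    """Inverted ARX: the action that moves the state toward y_desired."""
    return np.linalg.pinv(B) @ (y_desired - A @ y_now)
```

Long-horizon optimization on top of such a model would then be handled by the RL machinery mentioned in the last two items.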


