I. Introduction
Our face is an intricate, highly differentiated part of our body - in fact, it is one of the most complex signal systems available to us. It includes over 40 structurally and functionally autonomous muscles, each of which can be triggered independently of the others. The facial muscular system is the only place in our body where muscles attach either to a bone and to facial tissue (other muscles in the human body connect two bones) or to facial tissue only, such as the muscles surrounding the eyes or lips. Almost all facial muscles are triggered by a single nerve - the facial nerve. There is one exception, though: the upper eyelid is innervated by the oculomotor nerve, which is responsible for a large part of eye movements, pupil contraction, and raising the eyelid.
Obviously, facial muscle activity is highly specialized for expression - it allows us to share social information with others and to communicate both verbally and nonverbally. In short, we could say: facial expressions are movements of the numerous muscles supplied by the facial nerve that are attached to and move the facial skin.
The facial nerve emerges from deep within the brainstem, leaves the skull slightly below the ear, and branches off to all facial muscles like a tree. Interestingly, the facial nerve is also wired up with much younger motor regions in our neo-cortex (neo because these areas are present only in mammalian brains), which are primarily responsible for the facial muscle movements required for talking.
As the name indicates, the brainstem is an evolutionarily very ancient brain area which humans share with almost all living animals. The brainstem and the motor cortex become active depending on whether a facial expression is involuntary or voluntary: while the brainstem controls involuntary and unconscious expressions that occur spontaneously, the motor cortex is involved in consciously controlled and intentional facial expressions. The amygdala (both left and right) is often associated with the processing of life-threatening, fearful events or of stimuli of high sexual appeal and bodily pleasure. Besides fear and pleasure processing, the amygdala has been found to be generally involved in autonomic functions associated with emotional arousal.
Fig. 1
In everyday language, emotions are any relatively brief conscious experiences characterized by intense mental activity and a high degree of pleasure or displeasure. In scientific research, a consistent definition has not yet been found. There are certainly conceptual overlaps between the psychological and neuroscientific underpinnings of emotions, moods, and feelings. Facial expression plays the major role in non-verbal communication: according to Mehrabian [1], 55% of communicative cues can be judged from facial expression. Back in 1872, Darwin published The Expression of the Emotions in Man and Animals, in which he argued that all humans, and even other animals, show emotion through remarkably similar behaviours. Darwin treated the emotions as separate discrete entities, or modules, such as anger, fear, disgust, etc. Many different kinds of research - neuroscientific, perceptual, and cross-cultural evidence - show that Darwin's conceptualization of emotions as separate discrete entities is correct. In 1971, Paul Ekman and Wallace V. Friesen's facial expression work [2] related a person's emotional state to their facial expressions and laid the ground for the Facial Action Coding System (FACS), which is based on the muscular contractions that produce our facial expressions. The authors also defined six basic facial expressions: happy, surprise, disgust, sad, angry, and fear (fig. 2). These are known as the six universal emotions and are used by most researchers. Later, a seventh emotion, contempt, was added.
Fig. 2. Seven basic expressions
There is an ongoing discussion in emotion research on how the different emotions could be distinguished from each other:
- Discrete emotion theory assumes that the seven basic emotions are mutually exclusive, each with different action programs, facial expressions, physiological processes, and accompanying cognitions.
- Dimensional models assume that emotions can be grouped and arranged along two or more dimensions. Most dimensional models use valence (positive vs. negative emotions) as the horizontal axis and arousal (activating vs. calming emotions) as the vertical axis. With valence and arousal, more subtle emotional classifications are possible - breaking “happiness” into a less aroused “happy” state and a more aroused “elated” state, for example (see the sketch after fig. 3). Again, facial expressions are core indicators of underlying emotional states.
Fig. 3
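To make the dimensional view concrete, here is a toy sketch that places a few emotion labels on the valence-arousal plane; the coordinate values are purely illustrative and not empirically calibrated.

```python
# Illustrative (valence, arousal) coordinates in [-1, 1]; not calibrated values.
emotion_coordinates = {
    "happy":  (0.8, 0.3),    # positive valence, moderate arousal
    "elated": (0.8, 0.9),    # same valence as "happy", higher arousal
    "sad":    (-0.7, -0.4),
    "angry":  (-0.6, 0.8),
    "calm":   (0.4, -0.6),
}


def describe(emotion: str) -> str:
    """Summarize where an emotion falls on the two dimensions."""
    valence, arousal = emotion_coordinates[emotion]
    v = "positive" if valence >= 0 else "negative"
    a = "activating" if arousal >= 0 else "calming"
    return f"{emotion}: {v} valence, {a} arousal"


for name in emotion_coordinates:
    print(describe(name))
```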
With facial expression analysis you can test the impact of any content, product or service that is supposed to elicit emotional arousal and facial responses - physical objects such as food samples or packaging, videos and images, sounds, odors, tactile stimuli, etc. Involuntary expressions in particular, as well as a subtle widening of the eyelids, are of key interest, as they are considered to reflect changes in the emotional state triggered by actual external stimuli or mental images.
Now which fields of commercial and academic research have been adopting facial expression analysis techniques lately? Here is a peek at the most prominent research areas:
- Consumer neuroscience and neuromarketing;
- Media testing & advertisement;
- Psychological research;
- Clinical psychology and psychotherapy;
- Medical applications & plastic surgery;
- Software UI & website design;
- Engineering of artificial social agents (avatars).
II. Facial expression recognition
Facial expression recognition (FER) provides machines with a way of sensing emotions and can be considered one of the most widely used applications of artificial intelligence and pattern analysis. Facial expression recognition can be divided into four major steps: (1) a face acquisition stage that automatically finds the face region in the input images; (2) normalization of intensity, size, and shape; (3) feature extraction and representation, which extracts and represents the information about the encountered facial expression in an automatic way; and (4) a facial expression recognition step that classifies the extracted features into the appropriate expressions. Figure 4 illustrates the block diagram of a FER system.
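As a rough illustration, the following is a minimal Python sketch of how these four stages could be chained; the function names (acquire_face, normalize, extract_features, classify_expression) are placeholders for the techniques described in the following sections, not parts of any specific library.

```python
import numpy as np


def acquire_face(image: np.ndarray) -> np.ndarray:
    """Stage 1: locate and crop the face region in the input image (placeholder)."""
    raise NotImplementedError


def normalize(face: np.ndarray) -> np.ndarray:
    """Stage 2: normalize intensity, size, and orientation (placeholder)."""
    raise NotImplementedError


def extract_features(face: np.ndarray) -> np.ndarray:
    """Stage 3: compute geometric or appearance-based features (placeholder)."""
    raise NotImplementedError


def classify_expression(features: np.ndarray) -> str:
    """Stage 4: map the feature vector to one of the basic expressions (placeholder)."""
    raise NotImplementedError


def recognize_expression(image: np.ndarray) -> str:
    """Run the full FER pipeline on a single image."""
    face = acquire_face(image)
    face = normalize(face)
    features = extract_features(face)
    return classify_expression(features)
```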
Fig. 4. Block diagram of Facial expression recognition system
III. Facial expression recognition systems
1. Face Acquisition
Face Acquisition is a process of localizing and extracting the face region from the background.
Fig. 5. Detect face region in the image
The Viola-Jones algorithm is a widely used mechanism for face detection. The method, devised by Viola and Jones [4] in 2001, allows the detection of image features in real time. The algorithm combines four key concepts:
- Simple rectangular features, called Haar-like features.
- Integral image for rapid feature computation
- AdaBoost machine-learning method
- Cascade classifier to combine many features efficiently
The Viola and Jones algorithm uses Haar-like features to detect faces (fig. 6). Given an image, the algorithm looks at many smaller subregions and tries to find a face by looking for specific features in each subregion. It needs to check many different positions and scales because an image can contain many faces of various sizes.
Fig. 6. Haar-like features for face detection
Viola-Jones was designed for frontal faces, so it detects frontal faces best, rather than faces looking sideways, upwards or downwards.
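As an illustration, here is a minimal sketch of Viola-Jones face detection using OpenCV's pretrained frontal-face Haar cascade; the file names and parameter values are typical defaults chosen for the example, not values prescribed by [4].

```python
import cv2

# Load OpenCV's pretrained frontal-face Haar cascade (shipped with opencv-python).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("input.jpg")                 # example input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Scan the image at multiple scales and positions; each hit is a face bounding box.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                      minNeighbors=5, minSize=(30, 30))

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", image)
```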
2. Normalization
Normalization is a process that can improve the performance of the FER system, and it is carried out before the feature extraction process. The aim of this phase is to obtain images with normalized intensity and a uniform size and shape. It includes several kinds of processing, such as orientation normalization, image scaling, contrast adjustment, and additional enhancement of the expression frames.
– Orientation normalization: the face is rotated by an affine transformation defined by the locations of the eyeballs, so that the line through the eyes becomes horizontal (see the code sketch after fig. 9).
Fig. 7. Orientation normalization
– Scaling normalization: facial region images are scaled and cropped to a normalized size for the different experiments. A geometric face model [5] has been proposed to address this task.
Fig. 8. Geometric face model [5]
– Brightness normalization: face images captured at different times or positions often differ in brightness. To reduce the effect of brightness variation, histogram equalization can be used to normalize the contrast of the images, and a median filter can be used to smooth them (see the sketch after fig. 9). Histogram equalization transforms the values of an intensity image so that the histogram of the output image is approximately flat. As a result, it enhances image contrast and makes image features more distinguishable. Figure 9 shows an example of face images and their histograms before and after the normalization process.
Fig. 9. Normalize an image using Histogram equalization
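A brief sketch of the orientation, brightness and scaling normalization steps with OpenCV follows, assuming a grayscale face crop and eye centres that have already been located (for instance with a separate eye detector); the file names, eye coordinates, kernel size, and output size are placeholders.

```python
import cv2
import numpy as np


def align_by_eyes(face, left_eye, right_eye):
    """Rotate the face so that the line through the two eye centres is horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))          # tilt of the eye line in degrees
    center = ((left_eye[0] + right_eye[0]) / 2.0,   # rotate around the eye midpoint
              (left_eye[1] + right_eye[1]) / 2.0)
    rotation = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(face, rotation, (face.shape[1], face.shape[0]))


face = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)   # example grayscale face crop
aligned = align_by_eyes(face, left_eye=(60, 90), right_eye=(140, 95))

# Flatten the intensity histogram to spread pixel values over the full range,
# then suppress residual noise with a small median filter.
equalized = cv2.equalizeHist(aligned)
smoothed = cv2.medianBlur(equalized, 3)

# Rescale to a fixed size expected by later stages (96x96 is an arbitrary example).
normalized = cv2.resize(smoothed, (96, 96))
cv2.imwrite("face_normalized.jpg", normalized)
```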
3. Feature extraction
After the presence of a face has been detected in the observed scene, the next step is to extract the information about the encountered facial expression in an automatic way. Feature extraction is the process of finding and describing the features of interest within an image. These features can then be used as input to the classification step. Feature extraction methods can be categorized into two types: geometric based and appearance based.
Geometric feature extraction is based on the eyes, mouth, nose, eyebrows, and other facial components, while appearance-based feature extraction works on the texture of specific regions of the face. Examples of geometric-based methods are principal component analysis (PCA), linear discriminant analysis (LDA), kernel methods, and the trace transform. The PCA method [6, 7], called eigenfaces in [8, 9], is widely used for dimensionality reduction and has shown strong performance in face recognition. In contrast to PCA, which encodes information in an orthogonal linear space, the LDA method encodes discriminatory information in a linearly separable space whose bases are not necessarily orthogonal. Researchers have demonstrated that LDA-based algorithms outperform the PCA algorithm on many different tasks [10, 11].
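For illustration, here is a minimal eigenface-style PCA sketch using scikit-learn; the data array is a random placeholder for real, normalized face images, and the component count is arbitrary. LDA could be applied in the same way via sklearn.discriminant_analysis.LinearDiscriminantAnalysis when class labels are available.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder data: each row is a flattened, normalized face image (48x48 -> 2304 values).
faces = np.random.rand(200, 48 * 48)

pca = PCA(n_components=50)            # keep the 50 strongest components (eigenfaces)
features = pca.fit_transform(faces)   # low-dimensional feature vectors

print(features.shape)                          # (200, 50)
print(pca.explained_variance_ratio_.sum())     # fraction of variance retained
```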
Appearance-based methods capture the appearance (skin texture) changes of the face, such as wrinkles and furrows. The appearance features can be extracted from either the whole face or specific regions of a face image using image filters, such as Gabor wavelets. The Gabor filter, originally introduced by Dennis Gabor in 1946 [12], is widely used in image analysis, pattern recognition, and so forth.
Fig. 10. Gabor filter bank feature images calculated for the face image from JAFFE database
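A minimal sketch of building and applying a small Gabor filter bank with OpenCV is given below; the kernel parameters are illustrative examples, not the settings used to produce fig. 10.

```python
import cv2
import numpy as np

face = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

responses = []
# Vary the orientation; wavelength, bandwidth, etc. are fixed example values.
for theta in np.arange(0, np.pi, np.pi / 4):
    kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                lambd=10.0, gamma=0.5, psi=0)
    responses.append(cv2.filter2D(face, cv2.CV_32F, kernel))

# Concatenate the filter responses into a single appearance feature vector.
feature_vector = np.concatenate([r.flatten() for r in responses])
print(feature_vector.shape)
```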
4. Classification
Two basic problems must be solved in this phase: defining a set of categories/classes and selecting a classification mechanism. Firstly, expressions can be classified in terms of the “typical” emotions defined by Paul Ekman [2]. Secondly, given the extracted facial features, the expressions are identified by a recognition engine. Many classification methods have been employed in FER systems, such as support vector machines (SVM) [13], random forests [14], AdaBoost [15, 16], decision trees [17], naïve Bayes [18], multilayer neural networks, k-nearest neighbours, hidden Markov models (HMM), and deep neural networks.
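As an example of the classification step, here is a minimal SVM sketch with scikit-learn; the feature matrix and labels are random placeholders standing in for the features extracted in the previous stage.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder data: rows are feature vectors, labels index the 7 basic expressions.
X = np.random.rand(300, 50)
y = np.random.randint(0, 7, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

clf = SVC(kernel="rbf", C=1.0)       # RBF-kernel support vector machine
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```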
IV. Facial expression recognition datasets
In this section, we discuss the publicly available datasets that are widely used in the reviewed papers. The table below provides an overview of these datasets, including the number of image or video samples, the number of subjects, the collection environment, and the expression distribution. FER-related data have mostly been collected as images captured in the laboratory, such as JAFFE [3] and CK+ [15], in which volunteers make the corresponding expressions under particular instructions. Since 2013, however, emotion recognition competitions have collected large-scale, unconstrained datasets, for example FER2013 [16], queried automatically through the Google image search API. This implicitly promotes the transition of FER from lab-controlled to real-world scenarios.
Table
An overview of the facial expression recognition datasets
| Dataset | Samples | Resolution | Subjects | Collection env. | Expression distribution |
|---|---|---|---|---|---|
| JAFFE | 213 images (frontal and 30-degree views) | 256x256 pixel grayscale | 10 | Lab | 6 basic expressions plus neutral |
| CK+ (2000) | 593 image sequences | 640x490 grayscale, 640x480 24-bit colour | 123 | Lab | 6 basic expressions plus contempt and neutral |
| FER2013 (2013) | 35,887 images | 48x48 pixel grayscale | N/A | Web | 6 basic expressions plus neutral |
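As an example of working with such data, the sketch below loads FER2013, assuming the commonly distributed fer2013.csv layout in which each row holds an emotion label (0-6) and a space-separated string of 48x48 pixel values; that file layout is an assumption about the particular download.

```python
import numpy as np
import pandas as pd

# Assumed columns: 'emotion' (0-6), 'pixels' (2304 space-separated values), 'Usage'.
data = pd.read_csv("fer2013.csv")

images = np.stack([
    np.array(row.split(), dtype=np.uint8).reshape(48, 48)
    for row in data["pixels"]
])
labels = data["emotion"].to_numpy()

print(images.shape, labels.shape)   # expected: (35887, 48, 48) (35887,)
```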
V. Challenges of facial expression recognition
The problem of facial expression recognition requires techniques that cope with challenges such as varying facial expression intensity (fig. 11), pose variations, occlusion, aging, and resolution, both in still images and in video sequences. For example, variation in illumination levels may affect the accuracy of extracting face features. Researchers have attempted to improve robustness to shifting lighting conditions at various stages of FER systems. Above all, most existing approaches focus on lab-controlled conditions in which the faces are usually in frontal view; in real-world environments, however, the frontal view is not always available, which makes detecting facial expressions challenging.
Fig. 11. Intensity of Facial Expression examples of happy (A), sad (B), and fearful (C)