Protocols such as the Facial Action Coding System (FACS), a method for measuring facial behaviors developed by Paul Ekman, Wallace Friesen, and Joseph Hager, identify specific changes to facial expressions that occur with muscular contractions and quantify how the contraction of each facial muscle (singly and in combination with other muscles) changes the appearance of the face. Observed changes in the appearance of the face are associated with the action of muscles that produce them to create a reliable means for determining the category or categories in which each facial behavior belongs. FACS observations are quantified in measurement units known as Action Units (AUs). Some facial appearances include motions of more than one muscle whose effects on observed changes in facial appearance are not readily distinguishable. In other instances, the relatively independent actions of different parts of the same muscle or muscles participate in distinct facial actions. Using FACS, an observed facial expression or action is decomposed into the specific AUs that produced the movement. A facial expression is thereby given a score that includes the list of AUs that produced it.
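As a rough illustration of the scoring idea described above, a FACS score can be modeled as a set of AU numbers. The sketch below is a minimal one with assumptions: the `"6+12"` string notation, the function names `parse_score` and `describe`, and the small `AU_NAMES` table are chosen for illustration and are not taken from this text. Real FACS scores may also carry intensity and timing information, which this sketch omits.

```python
# Illustrative subset of Action Units; a full FACS table is much larger.
AU_NAMES = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    6: "cheek raiser",
    12: "lip corner puller",
}


def parse_score(score: str) -> frozenset[int]:
    """Parse a score written as AU numbers joined by '+' (assumed notation)."""
    return frozenset(int(tok) for tok in score.split("+") if tok.strip())


def describe(aus: frozenset[int]) -> list[str]:
    """List the AUs in a score in numeric order, with names where known."""
    return [f"AU{n}: {AU_NAMES.get(n, 'not named in this sketch')}"
            for n in sorted(aus)]


if __name__ == "__main__":
    for line in describe(parse_score("6+12")):
        print(line)
```

Using a set reflects that a score, as described here, is the list of AUs that produced an expression; order carries no meaning in this simplified form.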
FACS provides an objective and comprehensive language for describing facial expressions and relating them back to what is known about their meaning from the behavioral science literature. Because it is comprehensive, FACS also allows for the discovery of new patterns related to emotional or situational states. Spontaneous facial expressions differ from posed expressions both in the muscles moved and in the dynamics of the muscle movements. Having subjects pose states such as comprehension and confusion is of limited use, since there is a great deal of evidence that people do different things with their faces when posing than during a spontaneous experience. Likewise, subjective labeling of expressions can be less reliable than objective coding for finding relationships between facial expression and other state variables.
FACS has enabled discovery of new relationships between facial movement and internal state. For example, early studies of smiling focused on subjective judgments of happiness, or on just the mouth movement (e.g., zygomatic major). These studies were unable to show a reliable relationship between expression and other measures of enjoyment. It was not until experiments with FACS measured facial expressions more comprehensively that a strong relationship was found: smiles that featured both orbicularis oculi action (AU6) and zygomatic major action (AU12) were correlated with self-reports of enjoyment, as well as with different patterns of brain activity, whereas smiles that featured only zygomatic major action (AU12) were not. Research based upon FACS has also shown that facial actions can distinguish genuine from faked pain, and people telling the truth from people lying, at a much higher accuracy level than naive subjects making subjective judgments of the same faces. Facial actions can predict the onset and remission of depression, schizophrenia, and other psychopathology, can discriminate suicidal from non-suicidal depressed patients, and can predict transient myocardial ischemia in coronary patients. FACS has also been able to identify patterns of facial activity involved in alcohol intoxication that observers not trained in FACS failed to note.
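The smile finding above can be expressed as a simple rule over a set of coded AUs. The following is only a sketch: the function name `smile_pattern` and the returned labels are hypothetical, and the reported relationship is a correlation with self-reported enjoyment, not a rule that determines a person's internal state.

```python
def smile_pattern(aus: set[int]) -> str:
    """Label a coded expression by which smile-related AUs are present.

    AU12 = zygomatic major action, AU6 = orbicularis oculi action.
    The labels are illustrative names, not FACS terminology.
    """
    if 12 not in aus:
        return "no smile (AU12 absent)"
    if 6 in aus:
        # Pattern reported as correlated with self-reports of enjoyment.
        return "AU6+AU12 smile"
    # Pattern reported as not correlated with enjoyment measures.
    return "AU12-only smile"


if __name__ == "__main__":
    for coded in ({6, 12}, {12}, {4}):
        print(sorted(coded), "->", smile_pattern(coded))
```

A study that recorded only mouth movement would see AU12 in both smile patterns and could not separate them, which is consistent with the earlier studies finding no reliable relationship.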
Although FACS has a proven record for the scientific analysis of facial behavior, the process of applying FACS to videotaped behavior is currently done by hand. This limitation has been identified as one of the main obstacles to doing research on emotion. FACS coding is currently performed by trained experts who make perceptual judgments of video sequences, often frame by frame. It requires approximately 100 hours to train a person to make these judgments reliably and to pass a standardized test for reliability. It then typically takes over two hours to comprehensively code one minute of video. Furthermore, although humans can be trained to reliably code the morphology of facial expressions (which muscles are active), it can be quite difficult to code the dynamics of the expression (the activation and movement patterns of the muscles as a function of time). Such expression dynamics, not just static morphology, may provide important information. For example, spontaneous expressions have a fast and smooth onset, with distinct facial actions peaking simultaneously, whereas posed expressions tend to have slow and jerky onsets, and the actions typically do not peak simultaneously.
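To make the three dynamic properties mentioned above concrete (onset speed, smoothness, and simultaneity of peaks), the sketch below computes one simple measure for each from per-frame AU intensity values. Everything here is an assumption made for illustration: the text does not specify any such measures, the intensity tracks are hypothetical outputs of an automated coder, and the function names, the 10% onset threshold, and the second-difference smoothness measure are choices of this sketch.

```python
def onset_duration(intensity: list[float], fps: float,
                   start_frac: float = 0.1) -> float:
    """Seconds from the first frame at or above start_frac * peak to the peak frame."""
    peak = max(intensity)
    peak_idx = intensity.index(peak)
    start_idx = next(i for i, v in enumerate(intensity)
                     if v >= start_frac * peak)
    return (peak_idx - start_idx) / fps


def jerkiness(intensity: list[float]) -> float:
    """Mean absolute second difference; larger values mean a less smooth track."""
    d2 = [intensity[i + 1] - 2 * intensity[i] + intensity[i - 1]
          for i in range(1, len(intensity) - 1)]
    return sum(abs(x) for x in d2) / len(d2)


def peak_spread(tracks: dict[str, list[float]], fps: float) -> float:
    """Seconds between the earliest and latest peak across AU tracks."""
    peaks = [t.index(max(t)) for t in tracks.values()]
    return (max(peaks) - min(peaks)) / fps


if __name__ == "__main__":
    fps = 10.0  # hypothetical frame rate
    smooth = [0, 1, 2, 3, 4, 4]  # steady rise to a plateau
    jerky = [0, 2, 1, 3, 2, 4]   # rise with reversals
    print("onset (s):", onset_duration(smooth, fps))
    print("jerkiness:", jerkiness(smooth), "vs", jerkiness(jerky))
    print("peak spread (s):",
          peak_spread({"AU6": smooth, "AU12": [0, 1, 2, 4, 3, 3]}, fps))
```

Measures of this kind need an intensity value for every frame, which is the frame-by-frame judgment that is hard to obtain reliably from manual coding.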
The importance of spontaneous behavior for developing and testing computer vision systems becomes apparent when the neurological substrate for facial expression is examined. There are two distinct neural pathways that mediate facial expressions, each one originating in a different area of the brain. Volitional facial movements originate in the cortical motor strip, whereas the more involuntary, emotional facial actions originate in the subcortical areas of the brain. Research documenting these differences was sufficiently reliable to become the primary diagnostic criteria for certain brain lesions prior to modern imaging methods. The facial expressions mediated by these two pathways differ both in which facial muscles are moved and in their dynamics. The two neural pathways innervate different facial muscles, and there are related differences in which muscles are moved when subjects are asked to pose an expression such as fear versus when it is displayed spontaneously. Subcortically initiated facial expressions (the involuntary group) are characterized by synchronized, smooth, symmetrical, consistent, and reflex-like facial muscle movements, whereas cortically initiated facial expressions are subject to volitional real-time control and tend to be less smooth, with more variable dynamics. However, precise characterization of spontaneous expression dynamics has been slowed by the need to use non-invasive technologies (e.g., video) and by the difficulty of manually coding expression intensity frame by frame. This underscores the importance of video-based automatic coding systems.
These two pathways appear to correspond to the distinction between biologically driven and socially learned facial behavior. Researchers agree, for the most part, that most types of facial expressions are learned like language, displayed under conscious control, and have culturally specific meanings that rely on context for proper interpretation. Thus, the same lowered eyebrow expression that would convey “uncertainty” in North America might convey “no” in Borneo. On the other hand, there are a limited number of distinct facial expressions of emotion that appear to be biologically wired, are produced involuntarily, and have meanings that are similar across all cultures; for example, anger, contempt, disgust, fear, happiness, sadness, and surprise. A number of studies have documented the relationship between these facial expressions of emotion and the physiology of the emotional response. There are also spontaneous facial movements that accompany speech. These movements are smooth and ballistic, and are more typical of the subcortical system associated with spontaneous expressions. There is some evidence that arm-reaching movements transfer from one motor system when they require planning to another when they become automatic, with different dynamic characteristics between the two. It is unknown whether the same thing happens with learned facial expressions. An automated system would enable exploration of such research questions.