The study of human visual preferences and the emotions imparted by various works of art and natural images has long been an active topic of research in the fields of visual arts and psychology. A computational perspective on this problem has interested many researchers and resulted in articles on modeling the emotional and aesthetic content in images [10, 11, 13]. However, there is a wide gap between what humans can perceive and feel and what can be explained using current computational image features. Bridging this gap is considered the “holy grail” of the computer vision and multimedia communities.
Many psychological theories suggest a link between human affective responses and the low-level features in images, apart from the semantic content. For example, studies indicate that roundness and complexity of shapes are fundamental to understanding emotions. Studies of roundness indicate that geometric properties of visual displays convey emotions like anger and happiness. Bar et al. [5] confirm the hypothesis that curved contours lead to positive feelings and that sharp transitions in contours trigger a negative bias. With respect to the complexity of shapes, and as enumerated in various works of art, humans visually prefer simplicity. Any stimulus pattern tends to be perceived in the simplest structural setting available. Though the perception of simplicity is partially subject to individual experience, it can also be strongly affected by two objective factors, parsimony and orderliness. Parsimony refers to the minimal structures that are used in a given representation, whereas orderliness refers to the simplest way of organizing these structures [3].
These findings provide an intuitive understanding of the low-level image features that motivate the affective response, but the small scale of studies from which the inferences have been drawn makes the results less convincing. In order to make a fair comparison of observations, psychologists created the standard International Affective Picture System (IAPS) [15] dataset by obtaining user ratings on three basic dimensions of affect, namely valence, arousal, and dominance (FIG. 1). However, the computational work on the IAPS dataset to understand the visual factors that affect emotions has been preliminary. Researchers [9, 11, 18, 23, 25, 26] investigated factors such as color, texture, composition, and simple semantics to understand emotions, but have not quantitatively addressed the effect of perceptual shapes.
Previous work [11, 18, 26] predicted emotions aroused by images mainly by training classifiers on visual features to distinguish categorical emotions, such as happiness, anger, and sadness. Low-level stimuli such as color and composition have been widely used in computational modeling of emotions. Affective concepts were modeled using color palettes in [9], which showed that bag-of-colors and Fisher vector representations (i.e., higher-order statistics about the distribution of local descriptors) were effective.
The one study that did explore shapes, by Zhang et al. [27], predicted emotions evoked by viewing abstract art images through low-level features like color, shape, and texture. However, this work handled only abstract images and focused on the representation of textures, with little account taken of shape. Zhang et al. characterized shape through Zernike features, edge statistics features, object statistics, and Gabor filters.
Solli et al. [24] used emotion-histogram and bag-of-emotion features to classify emotions. These emotion metrics were extracted based on findings from psycho-physiological experiments indicating that emotions can be represented through homogeneous emotion regions and the transitions among them.
In the first work that comprehensively modeled categorical emotions, Machajdik and Hanbury [18] used color, texture, composition, content, and semantic-level features such as the number of faces to model eight discrete emotional categories. Beyond these eight basic emotions, adjectives or word pairs have also been used to represent categorical human emotions. The earliest work based on the Kansei system employed 23 word pairs (e.g., like-dislike, warm-cool, cheerful-gloomy) to establish the emotional space [23]. Along the same lines, Wang et al. [25] enumerated more word pairs to reach a universal, distinctive, and comprehensive representation of emotions. Yet the aforementioned approaches to emotion representation ignore the interrelationships among types of emotions.