Recently, computationally understanding and evaluating the aesthetic quality of visual displays such as photos has drawn much attention in fields such as computer vision and image understanding [9, 4, 5, 6]. Thanks to the widespread use of digital cameras and easy Internet access, ordinary people can now easily take photos and quickly upload them online. On one hand, this promotes the creation of many excellent photographic works. On the other hand, it leads to an explosively growing volume of digital images, a significant portion of which are of low quality. When people can take photos at little cost, they tend to become less careful and think less thoroughly before clicking the shutter.
Computationally understanding the aesthetics of images motivates a variety of useful applications. For instance, low-quality images can be filtered out automatically and efficiently so that high-quality images are easier to access. Such techniques can also recommend exemplary photographic works to amateurs, provide hints that help them gain a deeper understanding of aesthetics, and inspire them to take more aesthetically appealing photographs.
Several important elements, such as color, exposure, depth of field, and composition, are believed to be the keys to great photography. Researchers design specific visual features to describe different elements of a photograph. One of the earliest works, by Datta et al. [3], characterizes the colorfulness of photos and the purity of their colors. Similarly, Ke [10] and Su [18] utilize color histograms to represent the color palette used by a photo. Furthermore, the simplicity [10, 13] and contrast [7] of color are also taken into account in aesthetic quality assessment. Other work studies how to combine distinct colors to generate a more harmonious view [13, 12, 17, 14].
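As a rough illustration of the histogram-style color features mentioned above, the following sketch quantizes each RGB channel into a few bins and counts the fraction of pixels that fall into each joint bin. The function name `color_histogram`, the default bin count, and the tuple-based pixel format are assumptions made for illustration; they are not taken from [10] or [18].

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram as a flat list.

    pixels: iterable of (r, g, b) tuples with 8-bit values (0..255).
    bins_per_channel: hypothetical quantization level chosen for this sketch.
    """
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be in 1..256")
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 8-bit channel value to a bin index in [0, bins_per_channel).
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        count += 1
    if count == 0:
        return hist
    # Normalize so that the histogram is independent of image size.
    return [h / count for h in hist]
```

Two photos can then be compared by any histogram distance; which distance works best for aesthetic assessment is left open here.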
Aside from color, the use of light can also be a determinant of the aesthetic quality of photographs, especially portraits. Good lighting can dramatically improve the quality of a photo [8]. Datta et al. [3] use the average pixel intensity to characterize the use of light in photography. Ke et al. [10] and Luo et al. [13] both point out that a photo looks more pleasing when the brightness of the subject area differs significantly from that of the background area. Dhar et al. [7] focused their study on outdoor images. To differentiate natural outdoor illumination, they introduced three attributes: clear skies, cloudy skies, and sunset skies. Recognizing the importance of lighting conditions in human portraits, Luo et al. [12] designed several lighting features, such as the ratio of face areas, the average lighting of faces, the ratio of shadow areas, and the face clarity, to assess the quality of human portraits.
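The two simplest lighting cues above can be sketched as follows: the mean intensity of a grayscale image, and the brightness gap between a subject region and the rest of the image. This is only a minimal sketch. The function names, the row-list image format, the binary subject mask, and the use of a plain absolute difference are all assumptions; the cited works may define their features differently.

```python
def mean_intensity(gray):
    """Average pixel intensity of a grayscale image given as rows of numbers."""
    total = 0.0
    count = 0
    for row in gray:
        for value in row:
            total += value
            count += 1
    if count == 0:
        raise ValueError("image is empty")
    return total / count


def subject_background_contrast(gray, mask):
    """Absolute difference between mean subject and mean background brightness.

    mask has the same shape as gray; a truthy entry marks a subject pixel.
    How the mask is obtained (e.g. by a subject detector) is outside this sketch.
    """
    subj_sum = bg_sum = 0.0
    subj_n = bg_n = 0
    for gray_row, mask_row in zip(gray, mask):
        for value, is_subject in zip(gray_row, mask_row):
            if is_subject:
                subj_sum += value
                subj_n += 1
            else:
                bg_sum += value
                bg_n += 1
    if subj_n == 0 or bg_n == 0:
        raise ValueError("mask must contain both subject and background pixels")
    return abs(subj_sum / subj_n - bg_sum / bg_n)
```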
Low depth-of-field (low DoF) techniques capture objects within a small range of depths in sharp focus while objects at other depths are blurred. This is usually done to emphasize the subject. Researchers have attempted to include low-DoF features in their work to identify high-quality photographs. Datta et al. [3, 8, 17] utilized wavelet-based texture to measure the graininess or smoothness of a photo. Ke et al. [10] computed the spatial distribution of the high-frequency edges of an image to capture the blur of the background. Dhar et al. [7] employed Daubechies wavelet-based features to indicate the amount of blur over the photo. Luo et al. [12] extracted different types of subjects with different approaches, such as picking out the clear object in a low-DoF photo or detecting humans in a portrait photo. They then measured the clarity of the subject area and treated it as one factor influencing the aesthetic quality of photographs.
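The idea shared by these low-DoF features, comparing fine detail in the subject region against detail in the whole image, can be illustrated with a deliberately simplified sketch. Note that it swaps in plain neighboring-pixel differences for the wavelet coefficients used in the cited works, and it assumes the subject sits in the central half of the frame. Both choices, along with the function names, are assumptions made for illustration only.

```python
def detail_energy(gray, top, left, bottom, right):
    """Sum of absolute horizontal and vertical neighbor differences
    inside the window [top, bottom) x [left, right)."""
    energy = 0.0
    for y in range(top, bottom):
        for x in range(left, right):
            if x + 1 < right:
                energy += abs(gray[y][x + 1] - gray[y][x])
            if y + 1 < bottom:
                energy += abs(gray[y + 1][x] - gray[y][x])
    return energy


def center_sharpness_ratio(gray):
    """Fraction of the image's detail energy that lies in the central window.

    Values near 1 suggest a sharp center on a smooth (blurred) background,
    which is the pattern a low-DoF photo is expected to show.
    """
    if not gray or not gray[0]:
        raise ValueError("image is empty")
    h, w = len(gray), len(gray[0])
    total = detail_energy(gray, 0, 0, h, w)
    if total == 0:
        return 0.0  # completely flat image: no detail anywhere
    center = detail_energy(gray, h // 4, w // 4, h - h // 4, w - w // 4)
    return center / total
```

A production feature would more likely locate the subject first and use a multi-scale transform; the ratio form is kept here because it is easy to inspect.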
Compared to the above elements, modeling the composition of a photo is more challenging because it requires understanding semantic information. User studies indicate that composition is the most important feature related to the aesthetic quality of a photo [15, 16]. In photography, composition is the arrangement of visual elements in the scene. Good composition highlights the object of interest in a photo so that it immediately captures attention.
Generally, viewers prefer simple and clear compositions. Hence, early work proposed high-level composition features based on the locations and orientations of long dominant lines in images [12] to capture the simplicity of the composition. A more intelligent way to describe composition is to model popular composition rules such as the rule of thirds and the rule of the golden ratio. The rule of thirds divides the photo evenly with two vertical lines and two horizontal lines, resulting in four intersection points. Studies in photography show that people's first glances always fall on the four intersections rather than the center of the photo.
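The geometry of the rule of thirds described above can be made concrete with a short sketch. It computes the four intersection points for a given image size and the distance from a subject center to the nearest one, normalized by the image diagonal. The function names and the normalization are assumptions chosen for illustration, not a feature definition from the cited works.

```python
import math


def thirds_points(width, height):
    """Return the four rule-of-thirds intersections as (x, y) pairs,
    ordered row by row from the top-left intersection."""
    xs = (width / 3.0, 2.0 * width / 3.0)
    ys = (height / 3.0, 2.0 * height / 3.0)
    return [(x, y) for y in ys for x in xs]


def thirds_distance(cx, cy, width, height):
    """Distance from the subject center (cx, cy) to the nearest intersection,
    divided by the image diagonal. 0 means the subject sits on an intersection."""
    diagonal = math.hypot(width, height)
    nearest = min(
        math.hypot(cx - x, cy - y) for x, y in thirds_points(width, height)
    )
    return nearest / diagonal
```

Under this normalization, a subject placed exactly at the image center always scores 1/6, regardless of the aspect ratio.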
Researchers design features to model compositions following the rule of thirds [3, 17, 13, 7]. The rule of the ‘golden ratio’ states that the position of the horizon line in scenic photos should be adjusted to satisfy the golden ratio. The golden ratio is often considered the most beautiful ratio. Bhattacharya et al. [2] enhance the quality of amateurish photos by adjusting their compositions to follow the rule of the golden ratio. Su et al. [18] proposed another method to represent different types of photograph compositions. They evenly divided the photo into N×N patches, where N ∈ {2, 3, 6}, and predefined different patterns of foreground/background area based on these patches. Zhang's paper [21] automatically recommended suitable positions and poses of people in the scene of portrait photography.
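Both ideas in this paragraph reduce to simple arithmetic, sketched below. The first part assumes that "satisfying the golden ratio" means the horizon splits the image height into two parts whose lengths have ratio φ ≈ 1.618; the second part splits an image into an N×N grid of patches. The function names and the pixel-rounding scheme are hypothetical, and the foreground/background patterns of [18] are not reproduced here.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def golden_horizon_rows(height):
    """Two candidate horizon positions (measured from the top) that split
    the height into a larger and a smaller part with ratio PHI."""
    upper = height / (1 + PHI)  # smaller part above the horizon
    lower = height - upper      # smaller part below the horizon
    return upper, lower


def horizon_deviation(horizon_y, height):
    """Distance from horizon_y to the nearest golden position,
    as a fraction of the image height."""
    return min(abs(horizon_y - y) for y in golden_horizon_rows(height)) / height


def grid_patches(width, height, n):
    """Split a width x height image into n x n patches.

    Returns (left, top, right, bottom) boxes in row-major order, with
    right and bottom exclusive. Integer division spreads leftover pixels
    across patches when the size is not divisible by n.
    """
    patches = []
    for row in range(n):
        for col in range(n):
            left, right = col * width // n, (col + 1) * width // n
            top, bottom = row * height // n, (row + 1) * height // n
            patches.append((left, top, right, bottom))
    return patches
```

For example, `[grid_patches(w, h, n) for n in (2, 3, 6)]` yields the three grid resolutions mentioned above for an image of size `w` by `h`.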
All these composition rules can be used to model simple situations. However, due to the complexity of real scenes, photographers need to consider a higher-level aesthetic principle of composition. In pictorial art, good composition is considered a congruity or agreement that exists among the elements in a design [11]. The design elements seem to belong together, as if there were some implicit visual connections between them.
Another term for this kind of unity is harmony. Applied to photography, this principle means that the subjects in a scene should not be scattered aimlessly. Instead, they should form a unified whole. To convey such unity in their photos, professional photographers have developed dozens of practical techniques for composition. One universal and interesting technique is to embed basic geometrical shapes in photographic compositions [19] (e.g., FIG. 1). Human beings begin to learn about basic geometrical shapes such as circles, rectangles, and triangles from a very young age. Human eyes are trained to recognize those shapes immediately. Therefore, Valenzuela [19] suggested that we can explicitly or implicitly embed basic geometrical shapes in photos to attract viewers. Moreover, since these shapes are immediately recognized, subjects bounded within such shapes or implicitly constructing such shapes are perceived as a unity.
Among all basic geometrical shapes, the triangle is arguably the most popular shape utilized by professional photographers to make a composition more interesting. This compositional technique is called “the triangle technique.” There are also numerous examples of the use of triangles in visual art and architectural works.
Two fundamental questions need to be addressed when analyzing the composition of a portrait photograph: where are the human subjects located within the scene, and how do they pose? The rule of thirds answers the first question by suggesting that positioning the human subjects near ⅓ of the scene is more appealing than positioning them in the center. Therefore, the location of human subjects can be easily modeled and assessed with multiple state-of-the-art methods based on the rule of thirds. Nevertheless, the second question remains a challenge.