The exemplary embodiment relates to visual classification and finds particular application in connection with a system and method for predicting the iconicity of an image and for selection of class-related iconic images.
Humans often associate a concept, e.g., an object, a scene, a place or a sentiment, with a normalized visual representation (referred to as a canonical representation). This observation led to the notion of a canonical or iconic image. An image is said to be canonical/iconic if it is a good representative for a given concept. Several characteristics can be viewed as indications of “representativeness.” For example, an image may be considered iconic if it is: 1) the best-liked image of the concept, 2) the picture a person would see when imagining the concept, 3) the photo a person would take of the concept, or 4) the image that facilitates recognition (see, Blanz, et al., “What object attributes determine canonical views?” Technical report No. 42, MPI (1996), hereinafter “Blanz 1996”). A similar definition, which is used herein, is that an iconic image is an image that one would show to a person, for instance a child, to teach a concept (see, Berg, et al., “Finding iconic images,” Proc. 2nd Internet Vision Workshop at CVPR (2009), hereinafter, “Berg 2009”).
There has been considerable interest in being able to predict automatically whether an image is iconic or not. See, for example, Berg 2009; Berg, et al., “Automatic ranking of iconic images,” Technical report, U. C. Berkeley (2007), hereinafter, “Berg 2007”; Jing, et al., “Canonical image selection from the web,” CIVR, pp. 280-287 (2007), hereinafter, “Jing 2007”; Li, et al., “Modeling and recognition of landmark image collections using iconic scene graphs,” ECCV 95: 213-239 (2008), hereinafter “Li 2008”; Mezuman, et al., “Learning about canonical views from internet image collections,” NIPS, pp. 728-736 (2012), hereinafter, “Mezuman 2012”; Raguram, et al., “Computing iconic summaries of general visual concepts,” Internet Vision Workshop at CVPR (2009), hereinafter, “Raguram 2009”; and Weyand, et al., “Discovering favorite views of popular places with iconoid shift,” ICCV, pp. 1132-1139 (2011), hereinafter, “Weyand 2011”.
In general, people may consider an image to be iconic when it is a view of a relatively large object (relative to the size of the image), which is close to the center of the image, on a relatively clean or uncluttered background, where there is substantial contrast between the depicted object and the background, the object is observed from a suitable viewpoint, and the object is clearly separated from the background. However, the relative importance of each of these aspects to human perception of iconicity has been difficult to quantify. Also, there is no guarantee that this list of properties is exhaustive.
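By way of illustration only, two of the properties listed above, relative object size and closeness to the image center, could be computed from an object bounding box roughly as follows. This is a minimal sketch under stated assumptions: the function name `size_and_centeredness`, the bounding-box layout, and the normalization by the half diagonal are hypothetical choices made for this example and are not taken from any of the cited works.

```python
import math


def size_and_centeredness(img_w, img_h, box):
    """Return (relative size, centeredness) for an object bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels; this layout is an
    assumption made for the example.
    """
    x0, y0, x1, y1 = box
    # Fraction of the image area covered by the bounding box.
    rel_size = ((x1 - x0) * (y1 - y0)) / (img_w * img_h)
    # Distance from the box center to the image center, normalized by
    # the half diagonal so that 1.0 means perfectly centered.
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    dist = math.hypot(cx - img_w / 2, cy - img_h / 2)
    half_diag = math.hypot(img_w / 2, img_h / 2)
    centeredness = 1.0 - dist / half_diag
    return rel_size, centeredness
```

Such values describe only size and position; they say nothing about background clutter, contrast, or viewpoint, and they do not by themselves indicate how much each property contributes to perceived iconicity.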
Much of the work done on iconic images has focused on one of two properties: the viewpoint and the ability to summarize a collection. In the case of viewpoint, the image set typically corresponds to different photos of the same object instance, generally viewed under ideal conditions (e.g., a synthesized object with no background). Several studies have verified the existence of iconic viewpoints for three-dimensional objects (see, Blanz 1996; Bulthoff, et al., “Psychophysical support for a two-dimensional view interpolation theory of object recognition,” PNAS, pp. 60-64 (1992)) as well as for scenes (see, Ehinger, et al., “Canonical views of scenes depend on the shape of the space,” Proc. 33rd Annual Conf. of the Cognitive Science Society, pp. 2114-2119 (2011)). Several works have also considered the problem of computing the best viewpoint from a 3D model (see, Weinshall, et al., “Canonical views, or the stability and likelihood of images of 3D objects,” Image Understanding Workshop, pp. 967-971 (1994)) or a set of 2D shapes (see, Denton, et al., “Selecting canonical views for view-based 3-D object recognition,” ICPR, vol. 2, pp. 273-276 (2004), hereinafter, “Denton 2004”).
In the case of summarization, several studies have considered the case where the image set is a large collection of noisy images collected from the Internet, for example, by querying a search engine such as Google Image Search or a photo-sharing website such as Flickr. In this approach, an iconic image is considered to be an image that best summarizes the data, and the problem of finding iconic images is generally treated as one of finding clusters (see, Jing 2007; Raguram 2009; Li 2008) or modes (see, Mezuman 2012; Weyand 2011) in the image feature space. In most of these works, the results are evaluated either qualitatively, through a manual inspection of the iconic images found (Mezuman 2012; Li 2008; Weyand 2011), or simply by measuring whether the iconic images found are relevant to the concept (Raguram 2009). However, a relevant image may not necessarily be iconic.
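As a simplified illustration of the summarization view, the image that best summarizes a set can be approximated by the medoid of the set, i.e., the image whose feature vector has the smallest total distance to all of the others. The sketch below is an assumption-laden stand-in for the clustering and mode-finding techniques of the cited works and does not reproduce any of them; the function name `medoid_index` and the use of Euclidean distance are hypothetical choices.

```python
import math


def medoid_index(features):
    """Return the index of the feature vector with the smallest summed
    Euclidean distance to all vectors in the set (the medoid)."""

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Total distance from each vector to every vector in the set.
    totals = [sum(dist(f, g) for g in features) for f in features]
    # Ties are resolved in favor of the lowest index.
    return min(range(len(features)), key=totals.__getitem__)
```

For example, given the five vectors `(0, 0)`, `(1, 0)`, `(2, 0)`, `(3, 0)` and `(10, 0)`, the function selects index 2. The outlier at `(10, 0)` is not selected, but nothing in the selection tests whether the chosen image is actually iconic, which is the limitation noted above.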
Beyond viewpoint and summarization, Berg and Forsyth proposed a nearest-neighbor classifier to predict image iconicity and used figure vs. background segmentation to focus on the area of interest in the image (see, Berg 2007). However, their study does not provide any detailed analysis as to what makes an image iconic. In Berg 2009, possible properties that could correlate with iconicity, such as the object size and position, are proposed. However, in their experimental study, the users were explicitly instructed to take these criteria into account, which biased the results somewhat favorably toward these properties. Raguram and Lazebnik proposed to leverage an aesthetic measure, but only a qualitative evaluation of the impact of the aesthetic factor was conducted (Raguram 2009).
There remains a need for a system and method that identify properties that are good indicators of iconicity.