Consumer photo collections are pervasive. Mining semantically meaningful information from such collections has been an area of active research in the machine learning and computer vision communities. There is a large body of work focusing on problems of object recognition, such as detecting objects of certain types like faces, cars, grass, water, sky, and so on. Most of this work relies on low-level vision features (such as color, texture, and lines) available in the image. In recent years, there has been an increasing focus on extracting semantically more complex information, for example through scene detection and activity recognition. Existing systems have attempted to recognize events through visual classification of scenes and objects; see, for example, L.-J. Li and L. Fei-Fei, “What, where and who? classifying events by scene and object recognition,” in Proc. IEEE Intl. Conf. on Computer Vision, 2007. This system reported moderate success in recognizing a number of distinctive sports events, such as polo, rowing, and bocce, owing to the unique visual characteristics that can be observed in pictures of such events.
In all of the above-mentioned prior art, traditional image clustering and classification are performed on individual images using image-based features only, for example, color and edge histograms, or “bag of visual features” (see S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: spatial pyramid matching for recognizing natural scene categories,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2006).
However, images are often not independent of each other, because images that belong to the same event are temporally and spatially correlated. More specifically, personal image collections contain rich context information beyond the image features, and such context information is usually complementary to the image features for the purpose of semantic understanding.
Accordingly, improved image classification techniques that consider the relationships between images are needed.