With the increasing use of digital imaging in general consumer applications, efficient management and organization of image data has become important. Even the personal collection of a typical consumer can contain a fairly large number of images. The concept of Auto-Albuming is a key step towards reducing the cost, time and effort of organizing such large image databases. In particular, a semi-automatic event-clustering scheme may be used to sort a set of pictures into different groups, where each group contains similar pictures. Such schemes work on spatial color distribution and use computationally intensive merging algorithms to group similar images together. However, such algorithms say nothing about the semantic class (e.g., beach, birthday party, swimming pool) to which each of these groups belongs. Thus, the next step needed to automate the albuming process is to find the semantic classification of the event contained in a group of ‘similar’ pictures.
After going through a large database of consumer pictures, it was observed that a majority of the groups of ‘similar’ images come from the following classes: baby pictures, wedding party, birthday party, convocation, picnic, landscape, city pictures, beach, swimming pool and ocean view. Of course, the list is neither exhaustive nor mutually exclusive, i.e., several pictures can ordinarily be classified under more than one event. In addition, the classification may, in some cases, be subjective. Hence, event classification for very generic scenes is a very difficult problem. It requires not only knowledge of the image regions or objects but also the semantic information contained in their spatial arrangement. Considering the state-of-the-art research in computer vision, solving this problem is an enormous task. However, classification of images belonging to most natural outdoor scenes, e.g., landscape, beach, swimming pool, garden and ocean view, is mostly based on a few ‘natural’ regions in the image. Examples of these natural regions include water, sky, grass, sand and skin. Although these regions show wide variations in their appearance, they can be captured to a certain extent by using simple features such as color, texture and shape. While the present work deals only with natural scene images, the proposed scheme can be modified to incorporate more high-level features, e.g., face location, to widen its scope.
The main motivation for the present invention comes from the paradox of scene (or event) classification. In the absence of any a priori information, the scene classification task requires knowledge of the regions and objects contained in the image. On the other hand, it is increasingly being recognized in the vision community that context information is necessary for reliable extraction of knowledge about the image regions and objects.
It would be useful to be able to represent the semantic classification of each pixel in an image. A deterministic approach would entail the reclassification of image regions from the beginning, and it is not clear how context information could be encoded efficiently in a deterministic framework. Instead of employing a deterministic model that, e.g., assigns each pixel to exactly one of the classes in a recognition vocabulary, a probabilistic framework would seem to offer more promise. What is needed is a technique that would effectively generate a class probability map over the input image, which would represent the probability of each pixel having come from a given class.
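As an illustration only, the difference between a deterministic per-pixel assignment and a per-pixel class probability can be sketched as follows. Everything in this sketch is an assumption made for the example: the toy score values, the function names, and in particular the use of a softmax to turn scores into probabilities, which is a generic normalization and not the technique of the present invention. The class names are the natural regions mentioned above.

```python
import math

# Recognition vocabulary: the natural regions named in the text.
CLASSES = ["water", "sky", "grass", "sand", "skin"]


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1.

    Assumption: softmax is used here only as a convenient normalization.
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probability_map(score_image):
    """score_image[row][col] holds one hypothetical score per class.

    Returns the same layout, with a probability per class at every pixel.
    """
    return [[softmax(pixel) for pixel in row] for row in score_image]


def hard_labels(prob_map):
    """Deterministic alternative: keep only the most probable class per pixel."""
    return [
        [CLASSES[max(range(len(p)), key=p.__getitem__)] for p in row]
        for row in prob_map
    ]


# Toy 1x2 "image" with made-up scores (hypothetical values).
# Pixel (0, 0) clearly favors water; pixel (0, 1) is torn between water and sky.
scores = [[[2.0, 0.5, 0.1, 0.1, 0.1], [1.0, 1.0, 0.0, 0.0, 0.0]]]

prob_map = class_probability_map(scores)
labels = hard_labels(prob_map)
```

In this toy case the hard labels call both pixels "water", although the second pixel is equally likely to be sky. The probability map keeps that ambiguity available, so that a later stage could, for example, use context to resolve it.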