The proliferation of digital cameras and scanners has led to an explosion of digital images, creating large personal image databases in which it is becoming increasingly difficult to find images. In the absence of manual annotation specifying the content of the image (in the form of captions or tags), the only dimension the user can currently search along is time, which severely limits the search functionality. When the user does not remember the exact date a picture was taken, or wishes to aggregate images over different time periods (e.g., images taken at Niagara Falls across many visits over the years, or images of person A), he or she has to browse through a large number of irrelevant images to extract the desired image(s). A compelling alternative is to allow searching along other dimensions. Since there are unifying themes throughout a user's image collection, such as the presence of a common set of people and locations, the people present in images and the place where the picture was taken are useful search dimensions. These dimensions can be combined to produce the exact subset of images that the user is looking for. The ability to retrieve photos taken at a particular location can be used for image search by capture location (e.g., find all pictures taken in my living room) as well as to narrow the search space for other searches when used in conjunction with other search dimensions such as date and people present in images (e.g., looking for the picture of a friend who attended a barbecue party in my backyard).
In the absence of Global Positioning System (GPS) data, the location where the photo was taken can be described in terms of the background of the image. Images with similar backgrounds are likely to have been taken at the same location. The background could be a living room wall with a picture hanging on it, or a well-known landmark such as the Eiffel Tower.
There has been significant research in the area of image segmentation, in which the main segments in an image are automatically detected (for example, “Fast Multiscale Image Segmentation” by Sharon et al. in Proceedings of IEEE Conf. on Computer Vision and Pattern Recognition, 2000), but no determination is made as to whether the segments belong to the background. Segmentation into background and non-background has been demonstrated for constrained domains such as TV news broadcasts, museum images, or images with smooth backgrounds. A recent work by S. Yu and J. Shi (“Segmentation Given Partial Grouping Constraints” in IEEE Transactions on Pattern Analysis and Machine Intelligence, February 2004) shows segregation of objects from the background without specific object knowledge. Detection of main subject regions is also described in commonly assigned U.S. Pat. No. 6,282,317 entitled “Method for Automatic Determination of Main Subjects in Photographic Images” by Luo et al. However, no attention has been focused on the background of the image. The image background is not simply the set of image regions left when the main subject regions are eliminated; main subject regions can also be part of the background. For example, in a picture of the Eiffel Tower, the tower is the main subject region; however, it is also part of the background that describes the location where the picture was taken.