Image assessment and understanding deal with problems that are easily solved by human beings given their intellectual faculties but are extremely difficult to solve by fully automated computer systems. Image understanding problems that are considered important in photographic applications include main subject detection, scene classification, sky and grass detection, people detection, automatic detection of orientation, etc. In a variety of applications that deal with a group of pictures, it is important to rank the images in terms of a logical order, so that they can be processed or treated according to their order. The basic notion of ranking is expressed in co-pending, commonly-assigned U.S. patent application Ser. No. 09/460,759, entitled “Method for automatic assessment of emphasis and appeal in consumer images,” and which was filed 14 Dec. 1999 in the names of A. Savakis and S. Etz (which was also published as European Patent Application EP 1109132A2 on 20 Jun. 2001). According to this patent application, an image is automatically assessed with respect to certain features, wherein the assessment is a determination of the degree of importance, interest or attractiveness of the image. Feature quantities are processed with a reasoning algorithm, in particular a Bayesian network, that is trained on the opinions of one or more human observers and an output is obtained from the reasoning algorithm that assesses the image. A score is provided which, for a group of images, selects one image as the emphasis image.
A specific photographic application of interest is selecting one or more images from a collection of images that best represent the collection. This involves clustering the images into separate events and then selecting from images of each event the image that provides a viewer of the collection the best indication of the type of images in the collection. This is similar to selecting a cover image for an album, as the image provides the reader with a quick indicator of the images likely to be found in the album.
Another situation where the ranking of images in a collection is useful is when a fixed or limited amount of digital storage space is available and allocation of resources is important. Typically, digital imaging systems that store groups of images in a fixed storage space apply the same level of compression to all images in the group. This may be the situation for images stored in digital cameras, portable disks, etc. However, this approach does not take into consideration differences in emphasis or appeal between images. It is often desirable to maintain the visual quality of images that are appealing, while it is tolerable to degrade the visual quality of images that are not appealing. Therefore, it is desirable to obtain a digital system that first ranks images in terms of their relative value or appeal and then subsequently uses the results of this ranking to allocate the compression rates applied to each image. The goal is to allocate more storage to images with higher value. (See, e.g., co-pending, commonly assigned U.S. patent application Ser. No. 09/911,299, entitled “System and method for controlling image compression based on image emphasis” which was filed on 23 Jul. 2001 in the names of A. Savakis, M. Rabbani and S. Etz, and also published as European Patent Application EP 1280107A2 on 29 Jan. 2003.)
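The allocation idea described above can be illustrated with a minimal sketch. It assumes that each image already has a non-negative emphasis score and that storage is divided in proportion to that score; the function name `allocate_storage`, the `min_bytes` floor, and the proportional rule are all hypothetical choices made for illustration, not the method of the cited application.

```python
def allocate_storage(scores, total_bytes, min_bytes=0):
    """Split a fixed storage budget among images by emphasis score.

    scores      -- dict mapping image name to a non-negative score (assumed given)
    total_bytes -- total storage available for the whole group
    min_bytes   -- hypothetical floor so that no image is starved entirely
    """
    if not scores:
        return {}
    n = len(scores)
    if min_bytes * n > total_bytes:
        raise ValueError("budget too small for the requested per-image floor")

    # Storage left over once every image has received its floor.
    remaining = total_bytes - min_bytes * n
    total_score = sum(scores.values())

    if total_score <= 0:
        # No usable ranking: fall back to uniform allocation.
        return {name: total_bytes / n for name in scores}

    # Higher-scoring images receive a proportionally larger share.
    return {
        name: min_bytes + remaining * score / total_score
        for name, score in scores.items()
    }


budgets = allocate_storage({"beach.jpg": 3.0, "blurry.jpg": 1.0}, total_bytes=400)
```

Each per-image budget could then be translated into a compression setting by whatever encoder is in use; that mapping is outside the scope of this sketch.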
Using a small subset of the images in a collection to represent the collection is a common technique. A collection of images of a particular geographic region will likely have a cover image of a landmark that is generally identified with the location. For instance, an album of images of Paris will often have the Eiffel Tower contained in the cover image. A person looking at the cover will quickly surmise that the material in the collection is in some way linked to Paris or to France. This concept has been adopted for collections of digital images from consumer photographs, e.g., when automatically constructing an album or when selecting an image to put inside the “jewel-case” of a PictureCD®. One approach to providing this has been to search an image collection to identify images that have photographic appeal. This approach is described in co-pending, commonly-assigned U.S. patent application Ser. No. 09/863,570, entitled “Retrieval and browsing of database images based on image emphasis and appeal,” which was filed 21 May 2001 in the names of A. Savakis and R. Mehrotra.
Often photographs contain images of people of importance to the photographer. Events and places are typically recalled by identifying the people within a collection of images. Human appearance changes over a lifetime as a person ages. However, we are extremely adept at estimating the age and appearance of individuals as they age. More importantly, we are capable of identifying a person in an image, even though many years of aging may have occurred between the time when the picture was taken and when the image is viewed. Experience has shown that photographs taken by many amateur photographers have people in 75% of the images. In selecting an image to represent a collection, it would seem reasonable to select an image that contains people. Using this image, a viewer could quickly surmise the events, time and occasion the collection represents. The use of face detection also provides an automatic means to estimate the number of people present in an image. An image with a large number of people is often a group shot of the people of interest, and such an image is consequently a candidate for the emphasis image.
The ability to detect faces and people within images is an aspect of computer vision that has become increasingly sophisticated, and well known to those of ordinary skill in this art, to the point where over 90% of the faces within a typical image can be detected. Moreover, the ability to subsequently match faces to an individual, that is, face recognition, has also become more sophisticated to the point of being applied in many security and access control situations. There is an extensive research literature on means of accomplishing the task of face recognition, which is well known to those of ordinary skill in this art. Many of these references can be readily found in the literature or on the Internet, e.g., see Volker Blanz and Thomas Vetter, “Face Recognition Based on Fitting a 3D Morphable Model,” IEEE Transactions on PAMI, Vol. 25, No. 9, Sep. 2003. While face recognition normally is considered as a means of security or access control, the technology can be applied to situations where the identity of the person is not important. Rather, the presence of an individual within a series of images has value. The use of face recognition for this purpose has been applied to forming an indexing scheme for image libraries (see co-pending, commonly-assigned U.S. patent application Ser. No. 10/143,272, “Method and apparatus for organizing and retrieving images containing human faces”, which was filed on 10 May 2002 in the names of L. Chen and M. Das). The use of current face recognition technology for these applications has an advantage relative to the access control applications, as the result degrades gracefully when an occasional face is missed or mismatched. Within an image collection, if there is one person that appears most frequently, then one of the images with that person present is a good candidate for selection as the emphasis image.
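The selection of candidates containing the most frequently appearing person can be sketched as follows. The sketch assumes that a separate face recognition step, not shown here, has already assigned an arbitrary label to each recognized person in each image; the function name and the input layout are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter


def candidates_for_most_frequent_person(faces_by_image):
    """Return (person, images) for the person seen in the most images.

    faces_by_image -- dict mapping image name to a list of person labels,
                      assumed to come from an upstream face recognizer.
    """
    counts = Counter()
    for people in faces_by_image.values():
        # Count a person once per image, even if matched twice in that image.
        counts.update(set(people))

    if not counts:
        return None, []

    person, _ = counts.most_common(1)[0]
    # Every image containing that person is a candidate emphasis image.
    images = [name for name, people in faces_by_image.items() if person in people]
    return person, images


person, images = candidates_for_most_frequent_person({
    "img1.jpg": ["person_A", "person_B"],
    "img2.jpg": ["person_A"],
    "img3.jpg": ["person_C"],
})
```

Because only the relative frequency of a label matters, an occasional missed or mislabeled face shifts the counts slightly rather than causing an outright failure, which is consistent with the graceful degradation noted above.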
Another common approach used by many photographers is to include some images with signage, such as an image with a national park entrance sign identifying the name of the national park, and then to use one or more of the signage images to represent the collection. The signage is a “memory jogger” for the photographer to help recall the time and place where the images were captured. This is especially true of many vacation and holiday image collections. The purpose of these images is not for the photographic appeal, but rather as a pictorial annotation of the image collection. The annotation is provided by words on the signage, which generally provide a textual description related to subsequent images in the image collection.
The ability to locate and segment text in images has been used most often as a means to improve reproductions of images that are a combination of textual and pictorial content. An example of this is in rendering algorithms for desktop printers, such as an inkjet printer attached to a personal computer. Methods that reproduce text well usually give inferior results when applied to pictorial content. Conversely, methods that produce superior pictorial reproductions usually give inferior reproduction of text. Many algorithms have been proposed to overcome this problem, including commonly-assigned U.S. Pat. No. 6,393,150, entitled “Region-based image binarization system,” which issued 21 May 2002 to Lee et al., and “Automatic Text Location in Images and Video Frames,” A.K. Jain, in Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076. Other methods follow the text segmentation with an optical character recognition algorithm in order to extract the textual content (see J. Ohya et al., “Recognizing Characters in Scene Images,” IEEE Transactions on PAMI, Vol. 16, No. 2, pp. 214-220).
The ability of a computer to search a region and extract the text in the region into an internal computer representation, e.g., ASCII code, is well established, and these techniques are included in many optical scanning systems. Once the text has been converted into a computer-usable format, it is possible to search for key words, which is a well-established technology.
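As a minimal sketch of the key word search step, the following assumes that the text of a sign has already been extracted into a character string by an optical character recognition step that is not shown. The function name, the key word list, and the case-insensitive whole-word matching rule are hypothetical illustrations.

```python
import re


def find_keywords(text, keywords):
    """Return the key words that occur as whole words in the extracted text."""
    # Normalize to lower case and split on anything that is not a letter or digit.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [k for k in keywords if k.lower() in words]


matches = find_keywords(
    "Welcome to YELLOWSTONE National Park",
    ["park", "beach", "Yellowstone"],
)
# matches == ["park", "Yellowstone"]
```

An image whose extracted text matches one or more key words of interest could then be flagged as a possible signage image.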
In view of the interest in properly and concisely characterizing the content of a collection of images, what is needed is an automatic technique utilizing content recognition of, e.g., faces or signage in order to select an image that represents the main content of the collection of images and that can be used as the emphasis image for the collection.