The exemplary embodiment relates to a system and method for selection of a representative subset of images from a collection. It finds particular application in connection with the generation of digital photo albums and will be described with particular reference thereto.
There is a growing market for digital photo albums that are printed as photo books by photofinishing companies. These are assembled collections of photographs in hardcopy form that are customized for displaying a user's photographs. When creating photo books from image collections, users can select photographs for creating the photo book as well as layouts, backgrounds, and so forth.
The first step towards the creation of a photo album is the selection of images to include in the album. This is typically performed manually by evaluating how representative each photo is as well as considering interesting characteristics, image quality, and so forth. However, the selection process can be very time-consuming for the client. As a consequence, photo books started online are often never completed and only a small proportion of the albums that the client would like to print are eventually printed.
It would be advantageous to have a method which performs this selection automatically. However, this is a complex, multifaceted problem because it involves aspects related to storytelling, multimedia analysis and retrieval, computer graphics, and graphic design. Different clients also have different tastes and opinions concerning quality, colors, graphic solutions, and the interest level of a picture. As a consequence, many of the heuristics used by humans for picking images are extremely difficult to model. Modeling is further hampered by the lack of data, the level of noise, and the complex dependencies between preference and semantic information.
Some success has, however, been achieved with modeling high-level image quality and interestingness. See, R. Datta et al., “Acquine: aesthetic quality inference engine—real-time automatic rating of photo aesthetics,” MIR, 2010; S. Dhar, et al., “High level describable attributes for predicting aesthetics and interestingness,” CVPR IEEE, pp. 1657-1664, 2011; R. Datta et al., “Image retrieval: Ideas, influences, and trends of the new age,” ACM Computing Surveys (CSUR), 40(2), 2008; and P. Isola, et al., “What makes an image memorable?” IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 145-152, 2011.
Representativeness has been widely studied in video analysis. Many summarization techniques are available to synthesize a video by its key-frames. See, A. D. Doulamis, et al., “A fuzzy video content representation for video summarization and content-based retrieval,” Signal Processing, 80(6):1049-1067, 2000; U.S. Pat. No. 6,535,639, entitled AUTOMATIC VIDEO SUMMARIZATION USING A MEASURE OF SHOT IMPORTANCE AND A FRAME-PACKING METHOD, by S. Uchihachi, et al.; C. W. Ngo, et al., “Video summarization and scene detection by graph modeling,” IEEE Transactions on Circuits and Systems for Video Technology, 15(2):296-305, 2005; and Y. F. Ma, et al., “A generic framework of user attention model and its application in video summarization,” IEEE Trans. on Multimedia, 7(5):907-919, 2005.
There has been little consideration of the case of image collections, however. The main approaches are designed to deal with specific content (e.g., landmarks). See, X. Li, et al., “Modeling and recognition of landmark image collections using iconic scene graphs,” ECCV, vol. 8, 2008. Unfortunately, these approaches tend to use obsolete techniques or are not specifically designed for photo album applications. See, J. Li, et al., “Automatic summarization for personal digital photos,” Information, Communications and Signal Processing, 2003.
There remains a need for an automated system and method which more closely approximate the selection process employed by users in choosing images from a collection.