The diversity and versatility of print and display devices impose demands on designers of multimedia content for rendering and viewing. For instance, designers must provide different alternatives for web content and design different layouts for different rendering applications and devices. These range from tiny “thumbnails” of images (often seen in selection menus, on small, low-resolution mobile telephone screens, and on slightly larger PDA screens) to large images (often seen on high-resolution, elongated, flat panel displays and projector screens). Adapting images to rendering applications and devices other than those originally intended is called image retargeting.
Conventional image retargeting typically involves scaling and cropping. Image scaling is insufficient because it ignores the image content and typically can only be applied uniformly. Scaling also does not work well when the aspect ratio of the image needs to change because it introduces visual distortions. Cropping is limited because it can only remove pixels from the image periphery. More effective resizing can only be achieved by considering the image content as a whole, in conjunction with geometric constraints of the output device.
Image resizing is an alternative tool for image retargeting. Image resizing works by uniformly resizing a source image to the size of a target display. When resizing an image, it is desirable to change the size of the image while maintaining the important features of the image content. This can be done with top-down or bottom-up methods. Top-down methods use tools such as face detectors to detect important regions in the image, whereas bottom-up methods rely on visual saliency methods to construct a visual saliency map of the source image. After the saliency map is constructed, cropping can be used to display the most important region of the image.
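For illustration only, the bottom-up approach described above can be sketched as follows. This is a minimal sketch and not the method of any reference cited here. It assumes a grayscale image stored as a list of rows, uses gradient magnitude as a crude stand-in for a visual saliency map, and the function names gradient_saliency, best_crop, and crop are hypothetical.

```python
def gradient_saliency(image):
    """Crude bottom-up saliency (an assumption of this sketch): the sum of
    absolute horizontal and vertical central differences at each pixel."""
    h, w = len(image), len(image[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbor indices are clamped at the image border.
            dx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            dy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            sal[y][x] = abs(dx) + abs(dy)
    return sal


def best_crop(saliency, crop_h, crop_w):
    """Return (top, left) of the crop window with the largest total saliency."""
    h, w = len(saliency), len(saliency[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop window is larger than the image")
    # Summed-area table with a zero border, so each window sum costs O(1).
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = (saliency[y][x] + sat[y][x + 1]
                                 + sat[y + 1][x] - sat[y][x])
    best_total, best_pos = -1.0, (0, 0)
    for top in range(h - crop_h + 1):
        for left in range(w - crop_w + 1):
            total = (sat[top + crop_h][left + crop_w] - sat[top][left + crop_w]
                     - sat[top + crop_h][left] + sat[top][left])
            if total > best_total:
                best_total, best_pos = total, (top, left)
    return best_pos


def crop(image, top, left, crop_h, crop_w):
    """Extract the crop window as a new list of rows."""
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]
```

In this sketch the crop window has a fixed size and is simply placed where the summed saliency is greatest, so any content outside the window is discarded, which is the limitation of cropping noted above.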
One method automatically generates thumbnail images based on either a saliency map or the output of a face detector, Suh et al., “Automatic thumbnail cropping and its effectiveness,” UIST '03: Proceedings of the 16th annual ACM symposium on user interface software and technology, ACM Press, New York, N.Y., USA, 95-104, 2003. A source image is cropped to capture the most salient region in the image. Another method adapts images to mobile devices, Chen et al., “A visual attention model for adapting images on small displays,” Multimedia Systems, vol. 9, no. 4, 353-364, 2003. In that method, the most important region in the image is automatically detected and transmitted to the mobile device.
Another method uses eye tracking, in addition to composition rules, to crop images intelligently, Santella et al., “Gaze-based interaction for semiautomatic photo cropping,” Proceedings of the SIGCHI conference on human factors in computing systems, 771-780, 2006, incorporated by reference. In that method, a user looks at an image while eye movements are recorded. The recordings are used to identify important image content, from which crops of any size or aspect ratio can then be generated automatically.
All of the above rely on conventional image resizing and cropping operations to retarget the image.
Another method uses an adaptive grid-based document layout that maintains a clear separation between image content and a template, Jacobs et al., “Adaptive grid-based document layout,” Proceedings of ACM SIGGRAPH, 838-847, 2003. A designer constructs several possible templates. When the content is displayed, the most suitable template is used.
A compromise between image resizing and image cropping is to use non-linear, data-dependent scaling for image retargeting, Liu et al., “Automatic Image Retargeting with Fisheye-View Warping,” ACM UIST, 153-162, 2005. They use image information, such as low-level salience and high-level object recognition, to find important regions in the source image. Then, they apply a non-linear image warping function to emphasize important aspects of the image while retaining the surrounding context.
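The idea of non-linear, data-dependent scaling can be illustrated with a much simpler substitute for the fisheye warp of the cited work. The following is a minimal one-dimensional sketch under stated assumptions: the important region is given as a column range [roi_start, roi_end), it is kept at its original scale, and the remaining columns on both sides are compressed uniformly. The function names piecewise_column_map and warp_row are hypothetical, and nearest-neighbor sampling is used for brevity.

```python
def piecewise_column_map(src_w, dst_w, roi_start, roi_end):
    """For each target column, return the (float) source column to sample.

    Columns inside the region of interest keep scale 1. The margins on
    either side share the remaining target width and are compressed by
    the same factor. This piecewise-linear map is an assumption of this
    sketch, not the warping function of the cited reference.
    """
    roi_w = roi_end - roi_start
    if not (0 <= roi_start < roi_end <= src_w):
        raise ValueError("region of interest must lie inside the source")
    if not (roi_w <= dst_w <= src_w):
        raise ValueError("target must fit the region and not exceed the source")
    margin_src = src_w - roi_w          # source columns outside the region
    margin_dst = dst_w - roi_w          # target columns left for the margins
    scale = margin_dst / margin_src if margin_src else 1.0
    left_dst = roi_start * scale        # target width of the left margin
    mapping = []
    for x in range(dst_w):
        c = x + 0.5                     # sample at the pixel center
        if c < left_dst:                # compressed left margin
            s = c / scale
        elif c < left_dst + roi_w:      # region of interest, unscaled
            s = roi_start + (c - left_dst)
        else:                           # compressed right margin
            s = roi_end + (c - left_dst - roi_w) / scale
        mapping.append(s)
    return mapping


def warp_row(row, mapping):
    """Resample one image row with nearest-neighbor lookup."""
    last = len(row) - 1
    return [row[min(int(s), last)] for s in mapping]
```

For example, retargeting a 12-column row to 8 columns with the region of interest at columns 4 to 7 keeps those four columns intact and samples every other column from the margins, so the surrounding context is retained at reduced resolution rather than cropped away.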
Another method uses an automatic, non-photorealistic method for retargeting large images to small size displays, Setlur et al., “Automatic Image Retargeting,” Proceedings of the 18th annual ACM symposium on user interface software and technology, 153-162, ACM Press, 2005. They decompose the image into a background layer and foreground objects. Their retargeting method segments an image into regions, identifies important regions, removes them, fills in the resulting gaps, resizes the remaining image, and re-inserts the important regions.
Another method uses feature-aware texture mapping that warps an image to a new shape while preserving user-specified regions, Gal et al., “Feature aware texturing,” Eurographics Symposium on Rendering, 2006. They solve a particular formulation of the Laplacian editing technique suited to accommodate similarity constraints in images. However, because local constraints are propagated through the entire image to accommodate all constraints at once, the method may sometimes fail.
Another method composes a novel photomontage from several images, Agarwala et al., “Interactive digital photomontage,” ACM Trans. Graph. 23, 3, 294-302, 2004. A user selects regions of interest (ROIs) from different input images, which are then composited into an output image. Another method uses drag-and-drop pasting, Jia et al., “Drag-and-drop pasting,” Proceedings of SIGGRAPH, 2006. They determine an optimal boundary between the source and target images. Another method generates a collage image from a collection of images, Rother et al., “Autocollage,” Proceedings of SIGGRAPH, 2006. None of these compositing methods addresses image retargeting.
Another method simultaneously solves matting and compositing, Wang et al., “Simultaneous Matting and Compositing,” Microsoft Research Technical Report, MSR-TR-2006-63, May 2006. They allow the user to scale the size of a foreground object and paste the object back onto the original background.
Another method “stitches” images together based on a cost function, Zomet et al., “Seamless image stitching by minimizing false edges,” IEEE Transactions on Image Processing 15, 4, 969-977, 2005. They minimize an L1 error norm between the gradients of the stitched image and the gradients of the input images.
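The kind of cost described above can be made concrete with a short sketch. The code below only evaluates an L1 norm of the difference between the gradients of a candidate stitched image and the gradients of one input image. It does not perform the minimization of the cited work. It assumes grayscale images of equal size stored as lists of rows and uses forward differences, and the function names gradients and l1_gradient_error are hypothetical.

```python
def gradients(image):
    """Forward-difference gradients (gx, gy); the last column of gx and
    the last row of gy are set to zero."""
    h, w = len(image), len(image[0])
    gx = [[image[y][x + 1] - image[y][x] if x + 1 < w else 0
           for x in range(w)] for y in range(h)]
    gy = [[image[y + 1][x] - image[y][x] if y + 1 < h else 0
           for x in range(w)] for y in range(h)]
    return gx, gy


def l1_gradient_error(stitched, reference):
    """Sum of absolute differences between the gradients of two images."""
    sgx, sgy = gradients(stitched)
    rgx, rgy = gradients(reference)
    total = 0
    for y in range(len(stitched)):
        for x in range(len(stitched[0])):
            total += abs(sgx[y][x] - rgx[y][x]) + abs(sgy[y][x] - rgy[y][x])
    return total
```

Because the cost is defined on gradients, a constant brightness offset between the stitched image and an input image contributes nothing, whereas an edge that appears in the stitched image but not in the input image (a false edge) is penalized in proportion to its strength.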
Changing the appearance of an image has been extensively described in the field of texture synthesis, where the goal is to generate an output image that has a different texture than an input image, while preserving the basic idea of the content, U.S. Pat. No. 6,919,903, “Texture synthesis and transfer for pixel images,” issued to Freeman et al. on Jul. 19, 2005. That method does not consider image retargeting.
Another method performs object removal, Bertalmio et al., “Simultaneous structure and texture image inpainting,” Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 707-714, 2000. They use image inpainting to smoothly propagate information from the boundaries inwards, simulating painting restoration.
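Propagating information from the boundary of a hole inwards can be illustrated with plain diffusion. The sketch below is a simplified substitute and not the structure and texture method of the cited reference: each unknown pixel is repeatedly replaced by the mean of its four neighbors (Jacobi iterations), so that known values at the boundary of the hole gradually flow inwards. It assumes a grayscale image stored as a list of rows and a mask in which True marks an unknown pixel. The function name diffuse_inpaint and the default iteration count are hypothetical.

```python
def diffuse_inpaint(image, mask, iterations=200):
    """Fill masked pixels by iterative neighbor averaging.

    image: list of rows of numbers; mask: list of rows of booleans with
    True marking unknown pixels. Known pixels are never modified.
    """
    h, w = len(image), len(image[0])
    # Unknown pixels start at zero; known pixels keep their values.
    out = [[0.0 if mask[y][x] else float(image[y][x]) for x in range(w)]
           for y in range(h)]
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                nbrs = [out[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x),
                                       (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w]
                if nbrs:
                    nxt[y][x] = sum(nbrs) / len(nbrs)
        out = nxt
    return out
```

Such diffusion reproduces smooth intensity variation across the hole but blurs edges and cannot recreate texture, which is one reason inpainting methods treat image structure and texture separately.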
Patch-based approaches that use automatic guidance to determine the synthesis ordering are also known: “Fragment-based image completion,” Proceedings of ACM SIGGRAPH, 303-312, 2003, and Criminisi et al., “Object removal by exemplar-based inpainting,” IEEE Conference on Computer Vision and Pattern Recognition, 417-424, 2003.
Another interactive method provides inpainting for images missing strong visual structure by propagating structure along user-specified curves, Sun et al., “Image completion with structure propagation,” Proceedings of ACM SIGGRAPH, 2005.