Seldom does a photograph record what we perceive with our eyes. Often, the scene captured in a photo is quite unexpected, and disappointing, compared to what we believe we have seen. A common example is catching someone with their eyes closed: we almost never consciously perceive an eye blink, and yet there it is in the photo; “the camera never lies.” Our higher cognitive functions constantly mediate our perceptions, so that in photography, very often, what you get is decidedly not what you perceive. “What you get,” generally speaking, is a frozen moment in time, whereas “what you perceive” is a temporally and spatially filtered version of the evolving scene.
Digital photography can be used to create photographic images that more accurately convey our subjective impressions, or go beyond them, providing visualizations or a greater degree of artistic expression. One approach is to use multiple photos of a scene, taken with a digital camera, in which some aspect of the scene or the camera parameters varies from photo to photo. A film camera could also be used, but digital photography makes taking large numbers of exposures particularly easy and inexpensive. In addition, a digital camera can now record a short video clip, which provides multiple frames sampled at a high frame rate, e.g., 30 fps, so it is highly likely that each part of a scene, for example a person's face, is captured in a desirable state in at least one frame. These photographs are then pieced together, via an interactive system, to create a single photograph that better conveys the photographer's subjective perception of the scene. This process is sometimes called digital photomontage, after photomontage, the traditional process of combining parts of a variety of photographs to form a composite picture. It is also commonly referred to as image re-composition.
The primary technical challenges of photomontage are 1) to choose, for each part of the scene, the frame containing a desirable record of that part; 2) to choose good seams between parts of the various images so that they can be joined with as few visible artifacts as possible; and 3) to reduce any remaining artifacts through a process that blends the image regions. While it is possible to perform this task using photo-editing software, such as Photoshop™, doing so often involves, first, labor-intensive manual outlining of the various parts and, second, extensive image manipulation. Above all, the task requires good knowledge of, and working experience with, digital image processing. Consequently, it is unrealistic to expect average consumers to accomplish image re-composition whenever they want to. Professional retouching services are not always accessible or affordable.
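Challenges 2) and 3) above can be illustrated with a deliberately simplified sketch. It assumes two already-aligned grayscale images stored as lists of rows, restricts the seam to a single straight vertical column, and uses a linear feather as the blend. The function names `best_vertical_seam` and `join_with_feather` and the `half_width` parameter are hypothetical, chosen for illustration only; practical systems use far more flexible seams and blends.

```python
def best_vertical_seam(a, b):
    """Return the column where aligned images a and b differ least.

    The cost of a column is the sum of absolute pixel differences
    down that column (an assumed, very simple seam criterion).
    """
    width = len(a[0])
    cost = [sum(abs(row_a[x] - row_b[x]) for row_a, row_b in zip(a, b))
            for x in range(width)]
    return min(range(width), key=cost.__getitem__)


def join_with_feather(a, b, seam, half_width=1):
    """Use a left of the seam and b right of it.

    Within +/- half_width columns of the seam the two images are
    linearly blended; half_width must be a positive integer.
    """
    out = []
    for row_a, row_b in zip(a, b):
        row = []
        for x in range(len(row_a)):
            # Weight of image b ramps from 0 to 1 across the blending band.
            t = (x - (seam - half_width)) / (2 * half_width)
            t = min(1.0, max(0.0, t))
            row.append((1 - t) * row_a[x] + t * row_b[x])
        out.append(row)
    return out


a = [[10, 10, 10, 10, 10]]
b = [[90, 50, 10, 50, 90]]
seam = best_vertical_seam(a, b)
joined = join_with_feather(a, b, seam)
```

Here the seam falls on the column where the two images agree, so the join introduces no visible step at the seam itself.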
To this end, it is beneficial to design a software tool that is easy to use for combining multiple images to create an ideal photograph. Aseem Agarwala et al., “Interactive Digital Photomontage”, conference proceedings of 2004 ACM SIGGRAPH, describes an interactive, computer-assisted framework for combining parts of a set of photographs into a single composite picture. Their “digital photomontage” framework relies primarily on two techniques: graph-cut optimization, to choose good seams within the constituent images so that they can be combined as seamlessly as possible; and gradient-domain fusion, a process based on Poisson equations, to further reduce any remaining visible artifacts in the composite. Also central to their framework is a suite of interactive tools that allow the user to specify a variety of high-level image objectives, either globally across the image or locally through a painting-style interface. Image objectives are applied independently at each pixel location and generally involve a function of the pixel values (such as “maximum contrast”) drawn from that same location in the set of source images. Typically, a user applies a series of image objectives iteratively in order to create a finished composite. The power of this framework lies in its generality for a wide variety of applications, including “selective composites” (for instance, group photos in which everyone looks their best), relighting, extended depth of field, panoramic stitching, clean-plate production, stroboscopic visualization of movement, and time-lapse mosaics. Unfortunately, this tool is clearly still not designed for use by an average consumer, because the user often needs to select different parts from different images very carefully. In addition, the multiple frames used for creating the composite picture need to have approximately the same background and perspective so that the algorithm can align the different parts. Images taken from different perspectives or locations cannot be combined, because registration of their common parts fails.
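The notion of a per-pixel image objective described above, such as “maximum contrast,” can be illustrated with a minimal sketch. It assumes a stack of aligned grayscale images stored as lists of rows, and it measures contrast as the absolute difference between a pixel and the mean of its 3x3 neighborhood; that measure and the names `local_contrast` and `max_contrast_composite` are assumptions made for illustration, not the formulation used by Agarwala et al. The sketch also omits the graph-cut seam optimization and gradient-domain fusion steps entirely.

```python
def local_contrast(img, y, x):
    """Absolute difference between a pixel and its 3x3 neighborhood mean.

    The neighborhood is clipped at the image borders.
    """
    height, width = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - 1), min(height, y + 2))
            for i in range(max(0, x - 1), min(width, x + 2))]
    return abs(img[y][x] - sum(vals) / len(vals))


def max_contrast_composite(stack):
    """Pick, per pixel, the source image with the highest local contrast.

    Returns (labels, composite): labels[y][x] is the index of the chosen
    source image, composite[y][x] is the pixel value copied from it.
    Ties go to the earliest image in the stack.
    """
    height, width = len(stack[0]), len(stack[0][0])
    labels = [[0] * width for _ in range(height)]
    composite = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best = max(range(len(stack)),
                       key=lambda k: local_contrast(stack[k], y, x))
            labels[y][x] = best
            composite[y][x] = stack[best][y][x]
    return labels, composite


# Image 0 has detail on the left, image 1 has detail on the right.
stack = [[[0, 100, 50, 50]],
         [[50, 50, 0, 100]]]
labels, composite = max_contrast_composite(stack)
```

Because each pixel is decided independently, the resulting label map can be noisy, which is one reason a framework of this kind pairs the objective with seam optimization and blending.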
Consumer focus group studies show that the most highly valued image re-composition feature is one that creates a picture in which everyone looks their best. Given this limited domain, it is possible to use models of human faces to greatly reduce the amount of user interaction and user knowledge required to perform the image re-composition task, and perhaps to automate the process completely using a computer or a device with an embedded CPU, such as a digital camera (including a digital still camera, a digital camcorder, and a camera phone), a desktop computer with image/video editing/management software, or an online image/video server.
Consequently, it would be desirable to design a system that is easy to use, requires less user interaction, and provides satisfactory results using multiple frames, from either still or video capture, that may or may not have been taken at the same location with the same camera perspective.