In the field of photography, in particular digital photography, many amateur photographers have little or no training in how to take photos with pleasing composition, and the resulting photographs are often poorly composed. It would be beneficial if a digital image processing algorithm could recompose the original shot so that it represented the shot that the photographer had wished he/she had taken in the first place. Furthermore, even if the photographer captured a pleasing composition, it is often desirable to display or print that photograph at a different aspect ratio. This is typically accomplished by digitally cropping the photograph. For example, many consumer digital cameras have a 4:3 aspect ratio, while many new televisions have a 16:9 aspect ratio. Indiscriminately trimming a 4:3 image to a 16:9 aspect ratio (that is, without regard to content) often eliminates image content at the top and bottom of the image, and so can cut off the faces of persons in the image or otherwise obscure portions of the main subject. It is now common to capture imagery using a smartphone, and depending on whether the phone is held in landscape or portrait orientation, the aspect ratio of the captured picture can vary considerably. Further, when the photo is shared with a friend and opened on the friend's computer or another device, the aspect ratio of the display device or of the displayed photo will often differ yet again. Uploading the image to a social networking website may also crop it in an undesirable fashion. All of these examples illustrate cases that could benefit from the invention described herein.
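To make the 4:3 to 16:9 example concrete: keeping the full width W of a 4:3 frame, the source height is 3W/4 and the 16:9 height is 9W/16, so only (9/16)/(3/4) = 3/4 of the rows survive and one quarter of the image height is discarded. The Python sketch below computes such a content-blind, centered crop. The function name `center_crop_to_aspect` and its (left, top, right, bottom) return convention are illustrative assumptions, not part of any described system.

```python
from typing import Tuple


def center_crop_to_aspect(width: int, height: int,
                          target_w: int, target_h: int) -> Tuple[int, int, int, int]:
    """Largest centered crop of a width x height image with aspect target_w:target_h.

    Hypothetical helper for illustration. Returns (left, top, right, bottom) with
    right/bottom exclusive. Image content is ignored entirely, which is the
    "indiscriminate" trim described in the text.
    """
    # Compare aspect ratios by cross-multiplying to avoid floating-point error.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim left and right.
        new_w, new_h = height * target_w // target_h, height
    else:
        # Source is taller than (or equal to) the target: keep full width, trim top and bottom.
        new_w, new_h = width, width * target_h // target_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h
```

For a 4000 x 3000 photo, `center_crop_to_aspect(4000, 3000, 16, 9)` returns `(0, 375, 4000, 2625)`, removing 375 rows from both the top and the bottom regardless of whether a face lies there. For a 3000 x 4000 portrait smartphone shot, the same 16:9 target keeps only 1687 of the 4000 rows.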
Several main subject detection algorithms have been developed to extract what is determined to be the main subject of a still digital image. For example, U.S. Pat. No. 6,282,317 describes a method to automatically segment a digital image into regions and create a belief map corresponding to the importance of each pixel in the image. Main subject areas have the highest values in the belief map. Using this belief map, a more pleasing composition of the input image, or a preferred re-composition into a different aspect ratio, is often attainable. However, despite their complex rules and sophisticated learning techniques, such algorithms often mislabel the main subject, and their computational cost is generally significant.
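One generic way a per-pixel importance map can drive re-composition is to slide a crop window of the target size over the map and keep the position with the largest total importance. The sketch below does this with a summed-area table so that each window sum costs constant time. It is an assumed illustration, not the procedure of U.S. Pat. No. 6,282,317; the helper name `best_crop_window` is hypothetical, and computing the belief map itself is out of scope.

```python
from typing import List, Tuple


def best_crop_window(belief: List[List[float]],
                     crop_w: int, crop_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the crop_w x crop_h window whose summed
    belief is largest. Hypothetical helper; right/bottom are exclusive."""
    h, w = len(belief), len(belief[0])
    if not (0 < crop_w <= w and 0 < crop_h <= h):
        raise ValueError("crop window must fit inside the belief map")

    # Summed-area table: sat[y][x] is the sum of belief over rows 0..y-1, cols 0..x-1.
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += belief[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum

    # Exhaustive search over every window position; ties keep the first in raster order.
    best_score, best_pos = float("-inf"), (0, 0)
    for top in range(h - crop_h + 1):
        for left in range(w - crop_w + 1):
            bottom, right = top + crop_h, left + crop_w
            score = (sat[bottom][right] - sat[top][right]
                     - sat[bottom][left] + sat[top][left])
            if score > best_score:
                best_score, best_pos = score, (left, top)

    left, top = best_pos
    return left, top, left + crop_w, top + crop_h
```

The window search itself is cheap once the table is built; the quality of the result depends entirely on the belief map, which is where mislabeling of the main subject enters.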
It is desirable to create an algorithm for generating aesthetically pleasing compositions of digital images that is both more robust and less computationally intensive. In consumer photography, surveys have shown that the human face is by far the most important image element to consumers. Face detection algorithms have become ubiquitous in digital cameras and PCs, running in less than 50 ms on typical PCs. Several main subject detection algorithms capitalize on this and treat human face areas as high priority areas. For example, U.S. Pat. No. 6,940,545 describes an automatic face detection algorithm and further describes how the measured size and location of the detected faces might be fed into an automatic zoom and crop algorithm. U.S. Pat. No. 7,317,815 describes the benefits of using face detection information not only for cropping, but also for focus, tone scaling, structure, and noise. Combining face detection information with existing main subject detection algorithms improves their performance. Unfortunately, although this improvement has resulted in more pleasing compositions overall, these approaches fail to recognize that human faces are much more important than other image components. As a result, these algorithms do not adequately incorporate face information and instead emphasize other main subject predictors. In a baseline implementation, face information could be limited to facial size and location, but for superior performance it can be expanded to include facial pose, blink, eye gaze, gesture, exposure, sharpness, and subject interrelationships. Only if no faces are found in an image, or if the detected faces are deemed irrelevant, is reverting to a main subject detection algorithm a good strategy for arranging aesthetically pleasing compositions.
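As a rough illustration of a baseline that uses only face location, the sketch below centers a fixed-size crop window on the union of the detected face boxes and clamps it to the image, falling back to the image center when no faces are given. It is a hypothetical sketch under stated assumptions: the face boxes are assumed to come from some face detector, the name `face_centered_crop` is invented, and face size is not used to choose a zoom level, so it is not the method of U.S. Pat. No. 6,940,545 or of the invention.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def face_centered_crop(img_w: int, img_h: int, crop_w: int, crop_h: int,
                       faces: List[Box]) -> Box:
    """Place a crop_w x crop_h window over the detected faces. Hypothetical helper."""
    if not (0 < crop_w <= img_w and 0 < crop_h <= img_h):
        raise ValueError("crop window must fit inside the image")

    if faces:
        # Center of the bounding box that encloses every detected face.
        union_left = min(f[0] for f in faces)
        union_top = min(f[1] for f in faces)
        union_right = max(f[2] for f in faces)
        union_bottom = max(f[3] for f in faces)
        cx, cy = (union_left + union_right) // 2, (union_top + union_bottom) // 2
    else:
        # No faces: use the image center as a stand-in for a main subject detector.
        cx, cy = img_w // 2, img_h // 2

    # Center the window on (cx, cy), then clamp it so it stays inside the image.
    left = min(max(cx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(cy - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h
```

With a 4000 x 3000 image, a 4000 x 2250 (16:9) window, and one face spanning rows 200 to 700, the window is pushed to the top, giving `(0, 0, 4000, 2250)` and keeping the whole face; a content-blind centered crop would start at row 375 and cut through it.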
What is needed are methods and apparatuses that automatically convert complex digital facial information into a pleasing composition. Efficient algorithms designed to accomplish these goals would provide more robust performance at a lower CPU cost.