There are many commercial applications in which large numbers of digital images are manipulated. For example, in the emerging practice of digital photofinishing, vast numbers of film-originated images are digitized, manipulated and enhanced, and then suitably printed on photographic or inkjet paper. With the advent of digital image processing, and more recently, image understanding, it has become possible to incorporate many new kinds of value-added image enhancements. Examples include selective enhancement (e.g., sharpening, exposure compensation, noise reduction, etc.), and various kinds of image restorations (e.g., red-eye correction).
In these types of automated image enhancement scenarios, one basic piece of semantic image understanding consists of knowledge of image orientation, that is, which of the four possible image orientations represents “up” in the original scene. Film and digital cameras can capture images while being held in the nominally expected landscape orientation, or held sideways. Furthermore, in film cameras, the film may be wound left-to-right or right-to-left. Because of these freedoms, the true orientation of the images will in general not be known a priori in many processing environments. Image orientation is important for many reasons. For example, when a series of images is viewed on a monitor or television set, it is aggravating if some of the images are displayed upside-down or sideways. Additionally, it is now a common practice to produce an index print showing thumbnail versions of the images in a photofinishing order. It is quite desirable that all images in the index print be printed right side up, even when the photographer rotated the camera prior to image capture. One way to accomplish this is to analyze the content of the scene semantically to determine the correct image orientation. Similar needs exist for automatic albuming, which sorts images into album pages. Clearly, it is desirable to have all the pictures in their upright orientation when placed in the album.
Probably the most useful semantic indication of image orientation is the orientation of people in a scene. In most cases, when people appear in scenes, they are oriented such that their upward direction matches the image's true upward direction. Of course, there are exceptions to this statement, for example when the subject is lying down, such as in a picture of a baby lying on a crib bed. However, examination of large databases of images captured by amateur photographers has shown that the vast majority of people are oriented upright in images. This tendency is even stronger in images produced by professional photographers, such as portraits.
Another useful semantic indication of image orientation is sky. Sky appears frequently in outdoor pictures and usually at the top of these pictures. However, due to picture composition, the majority of the sky region may be concentrated on the left or right side of a picture (but rarely at the bottom of the picture). Therefore, the rule “the side of the picture in which the sky area is concentrated is the top of the picture” is not always reliable.
Text and signs appear in many pictures, e.g., street scenes, shops, etc. In general, it is unlikely that signs and text are placed sideways or upside down, although mirror image or post-capture image manipulation may flip the text or signs. Detection and recognition of signs can be very useful for determining the correct image orientation, especially for documents that contain mostly text. In U.S. Pat. No. 6,151,423 issued Nov. 21, 2000, Melen disclosed a method for determining the correct orientation for a document scanned by an OCR system from the confidence factors associated with multiple character images identified in the document. Specifically, this method is applicable to a scanned page containing a plurality of alphanumeric characters. The method includes the following steps: receiving captured image data corresponding to a first orientation for a page, the first orientation corresponding to the orientation in which the page is provided to a scanner; identifying a first set of candidate character codes that correspond to characters from the page according to the first orientation; associating a confidence factor with each candidate character code from the first set of candidate character codes to produce a first set of confidence factors; producing a second set of candidate character codes that correspond to characters from the page according to a second orientation; associating a confidence factor with each candidate character code from the second set of candidate character codes to produce a second set of confidence factors; determining the number of confidence factor values in the first set of confidence factors that exceed a predetermined value; determining the number of confidence factor values in the second set of confidence factors that exceed the predetermined value; and determining that the correct page orientation is the first orientation when the number of confidence factors in the first set that exceed the predetermined value is higher than the number of confidence factors in the second set that exceed the predetermined value. This method was used to re-orient scanned documents that may not have been properly oriented during scanning.
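The counting step of the Melen method described above can be illustrated with a minimal sketch. It is not the patented implementation. The OCR engine is assumed to have already produced, for each candidate orientation, a list of (character code, confidence factor) pairs. The names `count_confident` and `pick_orientation`, the data layout, and the threshold value are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical OCR output: (candidate character code, confidence factor) pairs.
Recognition = List[Tuple[str, float]]


def count_confident(results: Recognition, threshold: float) -> int:
    """Count the confidence factors that exceed the predetermined value."""
    return sum(1 for _code, confidence in results if confidence > threshold)


def pick_orientation(results_by_orientation: Dict[int, Recognition],
                     threshold: float) -> int:
    """Return the orientation whose recognition pass yields the most
    confidence factors above the threshold.

    Assumption: on a tie, the orientation listed first wins.
    """
    counts = {
        angle: count_confident(results, threshold)
        for angle, results in results_by_orientation.items()
    }
    return max(counts, key=counts.get)


# Illustrative, made-up recognition results for two orientations of one page.
example = {
    0: [("A", 0.95), ("B", 0.91), ("C", 0.40)],
    180: [("V", 0.35), ("8", 0.92), ("J", 0.20)],
}
chosen = pick_orientation(example, threshold=0.9)
```

In this made-up example the first orientation has two confidence factors above the threshold and the second has one, so the first orientation is selected.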
In addition to faces, sky, and text, other semantic objects can be identified to help decide image orientation. While semantic objects are useful for determining image orientation, they are not always present in an arbitrary image, such as a photograph. Therefore, their usefulness is limited. In addition, the assumption that the orientation of the semantic objects is the same as the orientation of the entire image can be violated. For example, while the orientation of the text is always the same as that of a document composed mostly of text, text in a photograph may not be aligned with the upright direction of the photograph. Furthermore, automatic detectors of these semantic objects are not perfect and can produce false positive detections (mistaking something else for the semantic object) as well as false negative detections (missing a true semantic object). Therefore, semantic objects alone are not a reliable basis for deciding the correct image orientation.
On the other hand, it is possible to recognize the correct image orientation without having to recognize any semantic object in the image. In U.S. Pat. No. 4,870,694 issued Sep. 26, 1989, Takeo teaches a method of determining the orientation of an image of a human body to determine whether the image is in the normal erect position or not. This method comprises the steps of obtaining image signals carrying the image information of the human body, obtaining the distributions of the image signal levels in the vertical direction and horizontal direction of the image, and comparing the pattern of the distribution in the vertical direction with that of the horizontal direction, whereby it is determined whether the image is in the normal position based on the comparison. This method is specifically designed for x-ray radiographs based on the characteristics of the human body in response to x-rays, as well as the fact that a fair amount of left-to-right symmetry exists in such radiographs, and a fair amount of dissimilarity exists in the vertical and horizontal directions. In addition, there is generally no background clutter in radiographs. In comparison, background clutter tends to confuse orientation detection in photographs.
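The idea of comparing signal-level distributions along the two image directions can be sketched as follows. The specific comparison rule used by Takeo is not given here, so the symmetry test below is an assumption motivated only by the left-to-right symmetry noted above. The function names are hypothetical, and the image is assumed to be a rectangular list of rows of non-negative signal levels.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def profiles(image: Image) -> Tuple[List[float], List[float]]:
    """Return (row_profile, col_profile): the summed signal level of each
    row (distribution along the vertical direction) and of each column
    (distribution along the horizontal direction)."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def asymmetry(profile: Sequence[float]) -> float:
    """Mirror asymmetry of a profile, from 0.0 (perfectly symmetric) up to 1.0."""
    total = sum(profile)
    if total == 0:
        return 0.0
    mirrored_diff = sum(abs(a - b) for a, b in zip(profile, reversed(profile)))
    return mirrored_diff / (2 * total)


def body_axis_is_vertical(image: Image) -> bool:
    """Assumed rule: if the horizontal distribution is at least as symmetric
    as the vertical one, the body axis is taken to run top to bottom."""
    row_profile, col_profile = profiles(image)
    return asymmetry(col_profile) <= asymmetry(row_profile)


# Toy "radiograph": a narrow part above a wide part, symmetric left to right.
erect = [
    [0, 9, 0],
    [0, 9, 0],
    [9, 9, 9],
    [9, 9, 9],
]
# The same toy image rotated a quarter turn clockwise.
sideways = [list(row) for row in zip(*erect[::-1])]
```

This sketch only separates an upright or inverted image from a sideways one. It does not distinguish erect from upside-down, which would require an additional cue.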
Vailaya et al., in “Automatic Image Orientation Detection”, Proceedings of International Conference on Image Processing, 1999, disclosed a method for automatic image orientation estimation using a learning-by-example framework. It was demonstrated that image orientation can be determined at a fairly high accuracy by examining the spatial layout, i.e., how colors and textures are distributed spatially across an image, especially for stock photos shot by professional photographers, who pay more attention to image composition than average consumers. This learning-by-example approach performs well when the images fall into stereotypes, such as “sunset”, “desert”, “mountain”, “fields”, etc. Thousands of stereotype or prototype images are used to train a classifier, which learns to recognize the upright orientation of prototype scenes. The drawback of this method is that it tends to perform poorly on consumer snapshot photos, which tend to have arbitrary scene content that does not fit the learned prototypes.
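A layout-based, learning-by-example approach of this general kind can be sketched in a highly simplified form. The features (mean brightness per block of a coarse grid) and the nearest-prototype rule below are assumptions chosen for brevity. They are not the features or the classifier of Vailaya et al. All function names are hypothetical, and the image is assumed to be a rectangular list of rows with at least `grid` rows and `grid` columns.

```python
from typing import List, Sequence

Image = List[List[float]]


def block_means(image: Image, grid: int = 2) -> List[float]:
    """Spatial-layout feature: mean value of each cell of a grid x grid partition."""
    height, width = len(image), len(image[0])
    features = []
    for by in range(grid):
        for bx in range(grid):
            rows = range(by * height // grid, (by + 1) * height // grid)
            cols = range(bx * width // grid, (bx + 1) * width // grid)
            values = [image[y][x] for y in rows for x in cols]
            features.append(sum(values) / len(values))
    return features


def rotate_cw(image: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_rotation(image: Image,
                      upright_prototypes: List[List[float]],
                      grid: int = 2) -> int:
    """Return the number of clockwise quarter turns (0 to 3) after which the
    image layout is closest to any upright training prototype."""
    best_turns, best_distance = 0, float("inf")
    candidate = image
    for turns in range(4):
        features = block_means(candidate, grid)
        distance = min(squared_distance(features, p) for p in upright_prototypes)
        if distance < best_distance:
            best_turns, best_distance = turns, distance
        candidate = rotate_cw(candidate)
    return best_turns


# Toy training prototype: bright "sky" above a dark "ground".
sky_scene = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
prototypes = [block_means(sky_scene)]
upside_down = sky_scene[::-1]
```

With this single prototype, the upside-down toy image is best matched after two quarter turns. An image whose layout resembles no prototype would still receive an answer, which illustrates the weakness on arbitrary consumer content noted above.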
Depending on the application, prior probabilities for image orientation can vary greatly. In the absence of other information, the priors must be taken as uniform (25% for each of the four possible orientations). In practice, however, the prior probabilities are not uniform, because people tend to hold the camera in a fairly constant way. As a result, in general, the landscape images would mostly be properly oriented (upside-down is unlikely), and the task would be to identify and orient the portrait images. The priors in this case may be around 70%–14%–14%–2%. Because simply assuming that every image is already upright would be correct about 70% of the time under these priors, the accuracy of an automatic method would need to significantly exceed 70% to be useful.
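The effect of non-uniform priors can be shown with a short worked sketch. The assignment of the 70%, 14%, 14%, and 2% figures to rotation angles and the example likelihood values are assumptions made for illustration only.

```python
from typing import Dict

# Assumed mapping of the example priors to rotation angles in degrees:
# upright most likely, the two sideways cases equal, upside-down least likely.
PRIORS: Dict[int, float] = {0: 0.70, 90: 0.14, 270: 0.14, 180: 0.02}


def baseline_accuracy(priors: Dict[int, float]) -> float:
    """Accuracy of always guessing the most probable orientation."""
    return max(priors.values())


def posterior(priors: Dict[int, float],
              likelihoods: Dict[int, float]) -> Dict[int, float]:
    """Combine priors with per-orientation likelihoods using Bayes' rule."""
    unnormalized = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


# Hypothetical detector output that favors a 90-degree rotation.
evidence = {0: 0.10, 90: 0.80, 270: 0.05, 180: 0.05}
post = posterior(PRIORS, evidence)
best = max(post, key=post.get)
```

Under these assumed numbers, the trivial baseline is correct 70% of the time, and evidence has to be fairly strong to overturn the upright prior: the 90-degree hypothesis wins here with a posterior of roughly 0.59.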
It is also noteworthy that in U.S. Pat. No. 5,642,443 issued Jun. 24, 1997, Goodwin teaches how to determine the orientation of a set of recorded images. The recorded images are scanned. The scanning operation obtains information regarding at least one scene characteristic distributed asymmetrically in the separate recorded images. Probability estimates of orientation of each of the recorded images for which at least one scene characteristic is obtained are determined as a function of asymmetry in distribution of the scene characteristic. The probability of correct orientation for the set of recorded images is determined from high-probability estimates of orientation of each of the recorded images in the set. Note that Goodwin does not rely on high-probability estimates of the orientation for all images; the orientation of the whole set can be determined as long as there are enough high-probability estimates from individual images.
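The set-level decision described above can be sketched as a vote among the high-probability per-image estimates. This is an illustration only and not Goodwin's actual computation. The threshold, the minimum number of confident estimates, and the function name are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Each estimate is (orientation in degrees, probability of that orientation).
Estimate = Tuple[int, float]


def set_orientation(estimates: List[Estimate],
                    min_probability: float = 0.9,
                    min_confident: int = 2) -> Optional[int]:
    """Decide one orientation for a whole set of images.

    Only estimates at or above min_probability take part in the vote.
    Returns None when too few confident estimates are available.
    """
    confident = [orientation for orientation, p in estimates
                 if p >= min_probability]
    if len(confident) < min_confident:
        return None
    return Counter(confident).most_common(1)[0][0]


# Made-up estimates for five images from one order.
order = [(0, 0.95), (180, 0.55), (0, 0.92), (0, 0.60), (180, 0.91)]
decision = set_orientation(order)
```

In this made-up order, three of the five images yield confident estimates, and two of those three agree on the same orientation, so that orientation is assigned to the whole set. The low-probability estimates are ignored rather than allowed to dilute the vote.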
Semantic object-based methods suffer when selected semantic objects are not present or not detected correctly even if they are present. On the other hand, scene layout-based methods are in general not as reliable when a digital image does not fall into the types of scene layout learned in advance.
There is a need therefore for an improved method of determining the orientation of images.