The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Consumer electronic devices capable of generating or displaying three-dimensional images are becoming widespread. Examples of such devices include stereoscopic 3D television sets; mobile phones with stereoscopic displays, such as the HTC EVO 3D; and stereoscopic digital still cameras, such as the FUJI™ FinePix Real 3D W1 3D Digital Camera.
A stereoscopic picture of a scene consists of two images, one for the left eye and the other for the right eye, which together mimic the effects of human binocular vision. If the two images are taken from slightly different points of view, then they will in turn be slightly different, and objects present in both images will appear at different locations in the two images. The differences between the object locations in the two images are known as disparities.
When one image is viewed with one eye and the second image with the other eye, the human brain uses these disparities to interpret the viewed picture pair as representing a three-dimensional (3D) scene.
To offer greater viewing comfort, usually only a horizontal disparity between objects is desired, since any vertical disparity will result in a scene that is uncomfortable for human subjects to view. Thus, ideally, stereoscopic pictures exhibit purely horizontal disparities between objects, and no vertical disparity. In such an ideal stereoscopic picture, pixels corresponding to the same physical object are therefore located on the same horizontal line in each image in the pair.
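The notion of horizontal and vertical disparity can be illustrated with a minimal Python sketch. It assumes that matched pixel coordinates for the same physical points are already available in both images; the function names, the sample coordinates, and the sign convention (left minus right) are hypothetical choices made for illustration only.

```python
def disparities(left_points, right_points):
    """Return the (horizontal, vertical) disparity of each matched point pair.

    Each point is an (x, y) pixel coordinate. left_points[i] and
    right_points[i] are assumed to depict the same physical point.
    The sign convention (left minus right) is an assumption.
    """
    if len(left_points) != len(right_points):
        raise ValueError("point lists must have the same length")
    return [(xl - xr, yl - yr)
            for (xl, yl), (xr, yr) in zip(left_points, right_points)]


def max_vertical_disparity(left_points, right_points):
    """Largest absolute vertical disparity; zero for an ideal pair."""
    return max((abs(dy) for _, dy in disparities(left_points, right_points)),
               default=0)


# Hypothetical matched points in an ideal pair: same row, different column.
left = [(120, 80), (300, 200)]
right = [(110, 80), (296, 200)]
print(disparities(left, right))             # [(10, 0), (4, 0)]
print(max_vertical_disparity(left, right))  # 0
```

In this sketch, a nonzero result from max_vertical_disparity would indicate that the pair departs from the ideal case described above.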
To acquire such pictures, stereoscopic cameras typically comprise a pair of cameras with a fixed baseline. In other words, the camera apertures are separated by a fixed horizontal distance corresponding to the distance between human eyes, which is approximately 6 centimeters. FIG. 1 illustrates a pair of stereoscopic images acquired with such a fixed baseline.
Stereoscopic cameras can be mechanically aligned and mounted on a rigid support to help ensure that a purely horizontal disparity is achieved. However, such mechanical alignments are often not sufficiently precise to avoid some degree of vertical disparity also being present. To compensate for this effect, some cameras implement rectification algorithms, to further eliminate vertical disparities by warping the pictures and/or aligning the images. For all these reasons, stereoscopic cameras are typically more expensive to manufacture than single-view cameras.
An alternative method of acquiring a stereoscopic pair of pictures is to use a single camera and let the user acquire a first image that will become, for example, the left image; displace the camera to the right; and then acquire a second image, with the camera pointing in the same direction as for the first image. This second image will become the right image. This alternative method is often referred to as the “click-click” solution. Since a camera with only a single image capture device is required, the “click-click” solution is generally less expensive than solutions using a true stereoscopic camera. In addition, the “click-click” solution can be implemented on the majority of today's consumer electronic devices, such as mobile phones.
When displacing the camera, there are six available degrees of freedom. The camera can be rotated about three perpendicular axes or translated along three perpendicular directions. When displacing the camera, any combination of rotations and/or translations can be made.
Typically, for obtaining stereoscopic images, the camera should be moved in a manner that translates the camera in a direction that is perpendicular to the viewing direction of the camera (typically the horizontal direction), but that does not translate the camera in any other direction or rotate the camera orientation.
Here, we define the horizontal direction as being a direction that is parallel to a direction in the sensor of the camera, typically an edge of the sensor, rather than the actual horizontal direction as defined by gravity.
A disadvantage of the “click-click” method is that, typically, a human user holding and displacing the camera between the two image acquisitions cannot attain the desired purely horizontal disparity. As a result, the two pictures acquired by the “click-click” method may not be properly aligned in the horizontal direction and/or may not have been taken with the camera oriented in the same direction.
For example, if the camera undergoes rotation about the optical axis (roll) between the two pictures, then a two-dimensional deformation will occur between the two images. If the camera undergoes rotation about one or more of the other two rotation axes (pitch and yaw), then a three-dimensional deformation will occur between the two images due to distortions arising from a change in perspective of the scene between the two images.
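The difference between these two cases can be illustrated with a minimal Python sketch under an idealized pinhole camera model. The model, the focal length in pixels, the sample scene point, and the function names are all assumptions made for illustration. Each function rotates a scene point expressed in the camera frame, which is equivalent to rotating the camera in the opposite sense.

```python
import math


def project(point, focal=1000.0):
    """Pinhole projection of a camera-frame point (X, Y, Z) to image (x, y).

    Image coordinates are measured from the image center; the focal
    length in pixels is a hypothetical value.
    """
    X, Y, Z = point
    return (focal * X / Z, focal * Y / Z)


def roll(point, angle):
    """Rotate a camera-frame point about the optical (Z) axis."""
    X, Y, Z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * X - s * Y, s * X + c * Y, Z)


def yaw(point, angle):
    """Rotate a camera-frame point about the vertical (Y) axis."""
    X, Y, Z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * X + s * Z, Y, -s * X + c * Z)


scene_point = (0.5, 0.5, 2.0)  # hypothetical point, in meters
angle = math.radians(5.0)

before = project(scene_point)
after_roll = project(roll(scene_point, angle))
after_yaw = project(yaw(scene_point, angle))

# Roll: the image point turns about the image center, so its distance
# from the center is unchanged (an in-plane, two-dimensional change).
print(math.hypot(*before), math.hypot(*after_roll))

# Yaw: the depth of the point along the optical axis changes, so its
# image row changes too, which introduces a vertical shift.
print(before[1], after_yaw[1])
```

In this sketch the roll leaves the distance of the image point from the image center unchanged, whereas the yaw changes the row of the image point even though the rotation axis is vertical, which is one way the perspective change described above produces unwanted vertical disparity.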
The presence of one or more undesired rotations or translations between the two images will result in undesired disparities, which will reduce the stereoscopic quality of the image pair.
As noted above, there can be errors in both the position and orientation of the camera for taking the second image. Throughout this document, the term “pose” is used to define the combination of position and orientation of a camera. Thus, “pose” defines both the spatial position of the camera with respect to an arbitrary coordinate system, and also the orientation of the viewing direction of the camera with respect to that coordinate system.
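The definition of “pose” can be made concrete with a minimal Python sketch. It assumes a coordinate system whose x axis coincides with the horizontal direction of the sensor at the first image, and represents orientation by three rotation angles; the class name, field names, units, and helper functions are hypothetical and are only one of several possible representations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Position and orientation of a camera in an arbitrary coordinate system."""
    x: float      # position along the sensor horizontal direction (assumed, cm)
    y: float      # position along the sensor vertical direction (assumed, cm)
    z: float      # position along the viewing direction (assumed, cm)
    roll: float   # rotation about the optical axis, in radians
    pitch: float  # rotation about the horizontal axis, in radians
    yaw: float    # rotation about the vertical axis, in radians


def desired_second_pose(first, baseline):
    """Pose for the second image: a pure horizontal shift, same orientation."""
    return Pose(first.x + baseline, first.y, first.z,
                first.roll, first.pitch, first.yaw)


def pose_error(actual, desired):
    """Return (position error distance, per-axis orientation differences)."""
    distance = math.dist((actual.x, actual.y, actual.z),
                         (desired.x, desired.y, desired.z))
    angles = (actual.roll - desired.roll,
              actual.pitch - desired.pitch,
              actual.yaw - desired.yaw)
    return distance, angles


first = Pose(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
target = desired_second_pose(first, baseline=6.0)
# Hypothetical hand-held result: slightly too high and slightly rolled.
actual = Pose(6.0, 0.5, 0.0, 0.02, 0.0, 0.0)
print(pose_error(actual, target))
```

Here a position error of zero together with zero orientation differences would correspond to the purely horizontal displacement that the “click-click” method aims for.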
One solution to this problem is to provide the user with feedback during the image capture process to help them align the camera with the desired pose. This can be achieved by superimposing the static left image on a live right image in the viewfinder of the camera, which allows the user to adjust the camera to acquire a right image that is correctly aligned with the previously acquired left image. Such a superimposed image is illustrated in FIG. 2. This method can improve the quality of the stereoscopic pairs of images acquired, but the superimposed image can be confusing to the user and difficult to understand.
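One plausible way to form such a superimposed image is a weighted blend of the two images, sketched below in Python. The use of alpha blending, the grayscale images stored as nested lists, the blending weight, and the function name are all assumptions for illustration; the source does not specify how the superimposition is performed.

```python
def superimpose(static_left, live_right, alpha=0.5):
    """Blend two equally sized grayscale images given as lists of rows.

    alpha is the weight of the static left image; (1 - alpha) is the
    weight of the live right image. Alpha blending is an assumed
    technique, not one prescribed by the text above.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if len(static_left) != len(live_right):
        raise ValueError("images must have the same number of rows")
    blended = []
    for left_row, right_row in zip(static_left, live_right):
        if len(left_row) != len(right_row):
            raise ValueError("rows must have the same length")
        blended.append([round(alpha * l + (1.0 - alpha) * r)
                        for l, r in zip(left_row, right_row)])
    return blended


# Hypothetical 2x2 grayscale frames with pixel values in 0..255.
left_image = [[100, 200], [0, 50]]
live_frame = [[200, 100], [100, 150]]
print(superimpose(left_image, live_frame))  # [[150, 150], [50, 100]]
```

Because both images contribute to every pixel of the blended result, scene content appears doubled wherever the images differ, which may contribute to the confusion noted above.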
Document US 2007/0165129 A1 discloses a method for selecting a stereoscopic pair of images, for example as they are captured by a camera or from an existing collection of captured images. A first image is selected and a cursor overlaid on the first image is aligned with an image feature. The cursor is then shifted by a predetermined amount and a second image is selected such that the cursor is overlaid on the second image and is substantially aligned with the feature.
It is an object of the embodiments described herein to overcome or mitigate at least some of the above-described problems.