I. Field
The present invention relates generally to monoscopic low-power mobile devices, such as a hand-held camera, camcorder, single-sensor camera phone, or other single camera sensor device capable of creating real-time stereo images and videos. The present invention also relates to a method for generating real-time stereo images, a still image capturing device, and a video image capturing device.
II. Background
Recently, enhancing perceptual realism has become one of the major forces driving next-generation multimedia development. The fast-growing multimedia communications and entertainment markets call for 3D stereoscopic image and video technologies that cover stereo image capturing, processing, compression, delivery, and display. Some efforts on future standards, such as 3DTV and MPEG 3DAV, have been launched to fulfill such requests.
A major difference between a stereo image and a mono image is that the former provides the feel of the third dimension and the distance to objects in the scene. Human vision is by nature stereoscopic because the left and right eyes see binocular views from different perspective viewpoints. The human brain is capable of synthesizing an image with stereoscopic depth. In general, a stereoscopic camera with two sensors is required for producing a stereoscopic image or video. However, most currently deployed multimedia devices are implemented within a monoscopic infrastructure.
Over the past decades, stereoscopic image generation has been actively studied. In one study, a video sequence is analyzed and the 3D scene structure is estimated from the 2D geometry and motion activities, an approach also called Structure from Motion (SfM). This class of approaches enables conversion of recorded 2D video clips to 3D. However, the computational complexity is high enough that it is not feasible for real-time stereo image generation. Moreover, since SfM is a mathematically ill-posed problem, the result might contain artifacts and cause visual discomfort. Some other approaches first estimate depth information from a single-view still image based on a set of heuristic rules according to specific applications, and then generate the stereoscopic views.
In another study, a method is proposed for extracting relative depth information from monoscopic cues, for example the retinal sizes of objects, which is useful for auxiliary depth map generation. In a still further study, a facial-feature-based parametric depth map generation scheme is proposed to convert 2D head-and-shoulder images to 3D. In another proposed method for depth map generation, some steps in the approach, for example the image classification in preprocessing, are not trivial and may be very complicated to implement, which undermines the practicality of the proposed algorithm. In another method, a real-time 2D to 3D image conversion algorithm is proposed using motion detection and region segmentation. However, artifacts are unavoidable due to the inaccuracy of object segmentation and object depth estimation. Clearly, all the methods mentioned above consider only the captured monoscopic images. Some other approaches use an auxiliary source to help generate the stereo views. For example, a low-cost auxiliary monochrome or low-resolution camera is used to capture the additional view, and a disparity estimation model is then used to generate the depth map of the pixels.
In another example, a monoscopic high-resolution color camera is used to capture the luminosity and chromaticity of a scene, and an inexpensive flanking 3D-stereoscopic pair of low-resolution monochrome “outrigger” cameras is used to augment luminosity and chromaticity with depth. The disparity maps generated from the three obtained views are used to synthesize the stereoscopic pairs. In a still further example, a mixed set of automatic and manual techniques is used to extract the depth map (because the automatic method is sometimes not reliable), and then a simple smoothing filter is used to reduce the visible artifacts of the resulting image.
As can be readily seen, there is a need for a low-complexity method to obtain real-time stereo images and videos by using a monoscopic mobile camera phone or other low-power monoscopic device.
There is a need for a monoscopic low-power mobile device that estimates the depth map information in a manner that avoids not only the auxiliary equipment or human interaction used in other approaches, but also the computational complexity introduced by using SfM or depth analysis. There is a further need for a monoscopic low-power mobile device that employs a low-complexity approach to detect and estimate depth information for real-time capturing and generation of stereo video.