Recently, there has been much interest in providing 3-D images on 3-D image displays. It is believed that 3-D imaging will be the next great innovation in imaging after color imaging. 3-D displays are now being introduced to the consumer market.
A 3-D display device usually has a display screen on which the images are displayed. A three-dimensional impression can be created by using stereo images, i.e., two slightly different images directed at the two eyes of the viewer. An example of such a device is an autostereoscopic display. In other devices, images are sent in all directions and glasses are used to block certain images, giving a 3-D impression.
Whatever type of 3-D display is used, the 3-D image information has to be provided to the display device. This is usually done in the form of a 3-D image signal comprising digital data.
3-D images are conventionally generated by adding a depth map to an image; the depth map provides information on the depth of each pixel within the image and thus provides the 3-D information. Using the depth map of an image, a left and a right image can be constructed, which together form a 3-D image.
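As an illustration of how a left and a right image might be constructed from a depth map, the following minimal sketch shifts each pixel of a single image row horizontally by a disparity that grows with its depth value (a simple form of depth-image-based rendering). The linear depth-to-disparity mapping, the `max_disparity` parameter, the z-buffer rule and the left-neighbour hole filling are all illustrative assumptions, not the method of any particular display or standard.

```python
from typing import List, Optional, Tuple


def fill_holes(view: List[Optional[int]]) -> List[int]:
    """Fill disoccluded positions (None) with the nearest filled pixel to their left.

    Holes at the start of the row take the first filled value instead.
    """
    known = [v for v in view if v is not None]
    if not known:
        return [0] * len(view)
    last = known[0]
    out: List[int] = []
    for v in view:
        if v is not None:
            last = v
        out.append(last)
    return out


def render_views(row: List[int], depth: List[float],
                 max_disparity: int = 4) -> Tuple[List[int], List[int]]:
    """Construct a left and a right view of one image row from its depth map.

    Hypothetical convention: depth values lie in [0, 1], with 1 meaning nearest
    to the viewer, and the disparity grows linearly with that value.
    """
    width = len(row)
    left: List[Optional[int]] = [None] * width
    right: List[Optional[int]] = [None] * width
    left_z = [-1.0] * width   # z-buffers: the nearest source pixel wins when
    right_z = [-1.0] * width  # two source pixels land on the same target
    for x in range(width):
        # Split the total disparity symmetrically between the two views.
        shift = round(depth[x] * max_disparity / 2)
        for view, zbuf, target in ((left, left_z, x + shift),
                                   (right, right_z, x - shift)):
            if 0 <= target < width and depth[x] > zbuf[target]:
                view[target] = row[x]
                zbuf[target] = depth[x]
    return fill_holes(left), fill_holes(right)
```

For example, `render_views([10, 20, 30, 40, 50, 60], [1.0] * 6)` shifts every pixel by two positions in opposite directions and returns `([10, 10, 10, 20, 30, 40], [30, 40, 50, 60, 60, 60])`; the repeated values at the row ends are the filled disocclusion holes.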
Recovering 3-D information from images is one of the fundamental tasks in 3-D imaging. The most common way of computing a depth map is to use stereovision. Although much progress has been made in stereovision, the fundamental correspondence problem remains difficult in real-world applications. In particular, the ultra-precise alignment required between the two cameras hampers low-cost consumer applications.
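The following minimal sketch shows the two steps of stereovision in their simplest form: solving the correspondence problem for one pixel by sum-of-absolute-differences (SAD) block matching along a scanline, then triangulating depth as Z = f·B/d. It assumes perfectly rectified cameras, so that matches lie on the same image row; the function names, the window size and the camera parameters are hypothetical.

```python
from typing import List


def match_disparity(left: List[float], right: List[float], x: int,
                    window: int = 1, max_disp: int = 8) -> int:
    """Return the disparity of pixel x in the left row.

    Assumes rectified cameras, so a point's match lies on the same row of the
    right image, shifted left by the disparity. The match is the candidate
    shift with the smallest SAD over a small window.
    """
    if not window <= x < len(left) - window:
        raise ValueError("window around x must lie inside the row")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - d - window < 0:  # candidate window would leave the right row
            break
        cost = sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-window, window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Triangulate depth Z = f * B / d; zero disparity means a point at infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

With a hypothetical focal length of f = 700 px and baseline B = 0.1 m, a disparity of 2 px gives Z = 35 m, while 3 px gives about 23.3 m. A one-pixel error in matching or camera alignment therefore shifts the estimate by more than 10 m at this range, which is why the alignment requirements are so strict.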
Some methods have been proposed to extract 3-D information from a single image. One of these uses the “depth from defocus” principle: a variable lens sweeps the focal plane through the scene, and the focus position at which each object is observed most sharply is determined. However, although this may work well for a single image, it becomes very difficult for video, in which objects move around. Using a variable lens while simultaneously recording video with changing content is a daunting task that requires very fast variable lenses and massive computing power. Also, the method cannot be used with an existing camera that lacks a variable lens with the required speed and range of focus variation.
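The focal-sweep procedure described above can be sketched as follows: for each captured focus position, a local sharpness measure is evaluated at a pixel, and the focus distance that maximizes it is taken as that pixel's depth. The squared discrete Laplacian used as the sharpness measure, the one-row images and the function names are assumptions chosen for brevity, not part of any specific system.

```python
from typing import List, Sequence


def sharpness(row: Sequence[float], x: int) -> float:
    """Squared discrete Laplacian at x: large where local contrast is high,
    i.e. where that part of the image is in focus."""
    lap = row[x - 1] - 2 * row[x] + row[x + 1]
    return lap * lap


def depth_from_focus_sweep(stack: List[Sequence[float]],
                           focus_distances: List[float], x: int) -> float:
    """Return the focus distance at which pixel x is sharpest.

    stack[i] is the image row captured with the focal plane at
    focus_distances[i] (hypothetical inputs standing in for a lens sweep).
    """
    if len(stack) != len(focus_distances):
        raise ValueError("one focus distance per captured image is required")
    if not 1 <= x < len(stack[0]) - 1:
        raise ValueError("x needs a neighbour on each side")
    best = max(range(len(stack)), key=lambda i: sharpness(stack[i], x))
    return focus_distances[best]
```

The sketch makes the video problem concrete: every depth estimate needs a complete stack captured while the scene stays still, so a moving scene would require a full lens sweep within each video frame.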
Another method uses the so-called Time-of-Flight (ToF) principle. Light is modulated and sent towards the object, and the camera measures the time delay between the emitted and the received light. Since light propagates at a fixed speed c, distances can be derived from this delay. 3DV Systems, Mesa Imaging and Canesta have developed cameras based on ToF technologies. However, these cameras are generally expensive and have limited spatial resolution (e.g., 64×64 pixels for a Canesta sensor). They also cannot be applied to existing cameras, or only with great difficulty. At short distances, the time of flight is so short that it becomes difficult to measure anything at all. Yet other systems use a single camera to record left and right images alternately, using shutters to block one of the two images at a time.
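Because the light travels to the object and back, the distance is half the round-trip path: d = c·t/2. The minimal sketch below shows this relation and, as an assumption about the modulation scheme (continuous-wave modulation, which the text does not specify), the equivalent phase-based form d = c·φ/(4π·f_mod). The function names and constants are illustrative only.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def distance_from_delay(delay_s: float) -> float:
    """Pulsed ToF: light travels to the object and back, so d = c * t / 2."""
    return SPEED_OF_LIGHT * delay_s / 2


def round_trip_delay(distance_m: float) -> float:
    """Inverse of distance_from_delay: t = 2 * d / c."""
    return 2 * distance_m / SPEED_OF_LIGHT


def distance_from_phase(phase_rad: float, modulation_hz: float) -> float:
    """Continuous-wave ToF (assumed modulation scheme): the delay appears as a
    phase shift, d = c * phase / (4 * pi * f_mod), which is unambiguous only
    up to c / (2 * f_mod)."""
    return SPEED_OF_LIGHT * phase_rad / (4 * math.pi * modulation_hz)
```

These numbers illustrate the short-distance problem. An object 30 cm away returns the light after only about 2 ns, and resolving a 1 cm change in distance means resolving a change in delay of roughly 67 ps.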
Recording left and right images alternately works well for static objects, but has the disadvantage that, for moving objects, the left and right images do not match, since the objects have moved between the two exposures. The difference in position of an object between the left and right images then depends not only on the distance of the object to the lens but also on the movement of the object. To determine the distance accurately, accurate motion estimation plus stereo matching must be performed. Both parameters, distance as well as motion, are unknown a priori and will also change over time in an unknown manner. Several frames are required before accurate motion estimation is possible. In some circumstances, such as moving repetitive patterns or objects moving at high speed or erratically, accurate motion estimation is hardly possible, or not possible at all.
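A small numeric sketch makes the ambiguity concrete. It assumes a rectified geometry in which the measured left/right shift is simply the stereo disparity f·B/Z plus the image-plane motion during the interval between the two exposures; the function names and all parameter values are hypothetical. Two different scenes then produce exactly the same shift, so the shift alone cannot tell them apart.

```python
def apparent_shift(focal_px: float, baseline_m: float, depth_m: float,
                   velocity_px_s: float, interval_s: float) -> float:
    """Left/right position difference = stereo disparity + motion between exposures."""
    disparity = focal_px * baseline_m / depth_m
    motion = velocity_px_s * interval_s
    return disparity + motion


def naive_depth(shift_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth estimate that wrongly attributes the whole shift to disparity."""
    return focal_px * baseline_m / shift_px


# Two hypothetical scenes that produce the same measured shift of 2 px:
static_near = apparent_shift(700, 0.1, depth_m=35.0, velocity_px_s=0.0, interval_s=0.04)
moving_far = apparent_shift(700, 0.1, depth_m=70.0, velocity_px_s=25.0, interval_s=0.04)
```

Ignoring motion, `naive_depth(moving_far, 700, 0.1)` reports 35 m for an object that is actually 70 m away. Separating the two contributions requires a motion estimate, which is exactly what becomes unreliable for erratic or repetitive motion.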
There is therefore a need for a system based on a single camera that can provide 3-D information in a relatively simple manner, can be used with existing cameras, and reduces the above problems.