Three-dimensional viewing devices such as 3D televisions are increasingly available. Such devices generally fall into one of two categories. The first category is that of stereoscopic devices, which allow the user to perceive a three-dimensional image by wearing special glasses. The glasses ensure that each eye sees a different image; the two images are slightly different views of the same scene, with the viewpoints spaced apart by a short distance that mimics the distance between the user's eyes. The user's brain processes these two images to create a three-dimensional image. The second category comprises auto-stereoscopic display devices, which produce a three-dimensional effect without the user having to wear any special glasses. These devices work by projecting multiple views from the display, which again ensures that each eye of the user sees a different image. One way to achieve this is to overlay a lenticular optical array on a conventional display device to produce the multiple views of the image.
The availability of content for use by auto-stereoscopic display devices is important for their market acceptance. In order for an auto-stereoscopic display device to produce a satisfactory video output, a depth map needs to be available for each frame in the video. The depth map and the original frame are used to generate the multiple views required by an auto-stereoscopic display. As video sequences comprise many frames per second (the PAL standard, for example, has a frame rate of 25 frames per second), producing the required depth maps for image sequences is not a trivial task.
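By way of illustration only, the generation of multiple views from a frame and its depth map can be sketched as a horizontal pixel shift whose size depends on depth. The function names `render_view` and `render_views`, the parameter `max_disparity`, the convention that a depth of 0 is far and 255 is near, and the simple hole filling are all assumptions made for this sketch; they are not taken from any particular display device.

```python
def render_view(frame, depth, view_offset, max_disparity=4):
    """Shift each pixel horizontally by a disparity derived from its depth.

    frame: list of rows of pixel values.
    depth: same shape as frame; assumed convention 0 (far) .. 255 (near).
    view_offset: signed position of the virtual viewpoint (0 = original view).
    max_disparity: hypothetical shift, in pixels, of the nearest depth
                   for a view offset of 1.
    """
    height, width = len(frame), len(frame[0])
    view = [[None] * width for _ in range(height)]
    nearest = [[-1] * width for _ in range(height)]  # depth already written
    for y in range(height):
        for x in range(width):
            disparity = round(view_offset * max_disparity * depth[y][x] / 255)
            tx = x + disparity
            # Nearer pixels occlude farther ones at the same target position.
            if 0 <= tx < width and depth[y][x] > nearest[y][tx]:
                view[y][tx] = frame[y][x]
                nearest[y][tx] = depth[y][x]
    # Fill holes left by the shift with the closest known pixel to the left
    # (or the first known pixel of the row for holes at the row start).
    for row in view:
        last = next((p for p in row if p is not None), 0)
        for x in range(width):
            if row[x] is None:
                row[x] = last
            else:
                last = row[x]
    return view


def render_views(frame, depth, num_views, max_disparity=4):
    """Render num_views views spread symmetrically around the original one."""
    centre = (num_views - 1) / 2
    return [render_view(frame, depth, i - centre, max_disparity)
            for i in range(num_views)]
```

Because one depth map is needed for every frame, a routine of this kind would run 25 times per second of PAL video for each view, which is why the depth maps themselves, rather than the rendering, are the costly part of content production.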
The availability of high-quality 3D video is important for the success of 3D television. At present, both three-dimensional video capture and the conversion of existing or newly acquired 2D video to 3D video by adding depth are being investigated by various companies. A well-known conversion method is to assign depth at key-frames using manual input via a graphical editor and then to automatically propagate this depth information for the duration of a video shot (typically a few seconds). Automatic propagation is maturing rapidly, but manual assignment of depth at key-frames using an editor is still slow, since the user typically draws polygons for which a depth profile is specified, and is therefore costly.
Assigning depth at key-frames is currently often done using common graphical editors. Very often only a polygon drawing tool is used to select regions to which a constant depth or a depth profile is assigned. This process is slow, since the user must position the cursor close to the object contour using the mouse. Typically many mouse clicks are needed to accurately align the curve with the object.
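The polygon-based assignment described above can be sketched as follows. This is a minimal illustration, assuming a depth map stored as a list of rows and a polygon given as a list of (x, y) vertices; the names `point_in_polygon`, `assign_depth` and the `profile` callback are hypothetical and do not correspond to any specific editor.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def assign_depth(depth_map, polygon, profile):
    """Write profile(x, y) into every pixel whose centre lies in the polygon.

    profile is a function of pixel position: a constant function gives a
    constant depth, any other function gives a depth profile.
    """
    for y, row in enumerate(depth_map):
        for x in range(len(row)):
            if point_in_polygon(x + 0.5, y + 0.5, polygon):
                row[x] = profile(x, y)
    return depth_map


# Usage: a constant depth, then a depth profile that varies with height.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
flat = assign_depth([[0] * 4 for _ in range(4)], square, lambda x, y: 200)
ramp = assign_depth([[0] * 4 for _ in range(4)], square,
                    lambda x, y: 100 + 20 * y)
```

The filling step itself is cheap; the cost lies in obtaining the vertex list, since every vertex corresponds to a mouse click that the user must place close to the object contour.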
In another field of image processing, automatic segmentation has been proposed to aid selection; see “Understanding Synthetic Aperture Radar Images”, C. Oliver, S. Quegan, Artech-House, 1998. This selection method may be used to partition an image into square regions and then automatically align region edges with object boundaries using region fitting. This technique is very similar to the clustering of data shown in “Pattern Classification”, Richard O. Duda, Peter E. Hart, and David G. Stork, John Wiley and Sons, Inc., New York, 2001, with the difference that it incorporates boundary regularity in a global optimization criterion. This makes it easier to avoid irregular edges due to noise.
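The general idea of starting from square regions and moving region edges towards object boundaries can be sketched as below. This is not the algorithm of the cited references; it is a simplified greedy stand-in, assuming a grey-value image stored as a list of rows. The cost used here (squared difference to the region mean plus `beta` times the number of 4-neighbours carrying a different label) and the names `square_partition`, `region_means`, `pixel_cost`, `fit_regions`, `block`, `beta` and `sweeps` are assumptions chosen for illustration.

```python
def square_partition(height, width, block):
    """Initial segmentation: label every pixel with the index of its square."""
    blocks_per_row = (width + block - 1) // block
    return [[(y // block) * blocks_per_row + (x // block)
             for x in range(width)] for y in range(height)]


def region_means(image, labels):
    """Mean grey value of every region currently present in labels."""
    sums, counts = {}, {}
    for img_row, lab_row in zip(image, labels):
        for value, label in zip(img_row, lab_row):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def pixel_cost(value, label, means, neighbours, beta):
    """Data fit plus a penalty for every neighbour with a different label."""
    fit = (value - means[label]) ** 2
    boundary = sum(1 for n in neighbours if n != label)
    return fit + beta * boundary


def fit_regions(image, block=4, beta=1.0, sweeps=10):
    """Greedily relabel pixels to neighbouring regions while the cost drops."""
    height, width = len(image), len(image[0])
    labels = square_partition(height, width, block)
    for _ in range(sweeps):
        means = region_means(image, labels)
        changed = False
        for y in range(height):
            for x in range(width):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x),
                                             (y, x - 1), (y, x + 1))
                              if 0 <= ny < height and 0 <= nx < width]
                current = labels[y][x]
                candidates = sorted(set(neighbours) | {current})
                best = min(candidates, key=lambda lab: pixel_cost(
                    image[y][x], lab, means, neighbours, beta))
                if (pixel_cost(image[y][x], best, means, neighbours, beta)
                        < pixel_cost(image[y][x], current, means,
                                     neighbours, beta)):
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels
```

In this sketch a larger `beta` favours smooth, regular region edges at the expense of following fine detail, which reflects the role that boundary regularity plays in suppressing irregular edges caused by noise.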
Selecting regions with the cursor and then assigning depth to a region is an obvious way in which an automatically generated segmentation can help to produce a depth map. However, selecting the number of regions in the segmentation is difficult. To avoid missing important contours, many small regions are needed. On the other hand, large regions allow faster depth assignment. As selecting the best possible segmentation remains an issue, manual demarcation of selections is still widely used.
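Region-based assignment of this kind can be sketched in a few lines, assuming the segmentation is available as a label map with one integer label per pixel. The name `assign_region_depth` and its parameters are hypothetical.

```python
def assign_region_depth(depth_map, labels, click_x, click_y, depth):
    """Give every pixel of the region under the cursor the same depth."""
    target = labels[click_y][click_x]
    for y, row in enumerate(labels):
        for x, label in enumerate(row):
            if label == target:
                depth_map[y][x] = depth
    return depth_map


# Usage: a 2 x 4 label map with two regions and one click per region.
labels = [[0, 0, 1, 1],
          [0, 1, 1, 1]]
depth_map = [[0] * 4 for _ in range(2)]
assign_region_depth(depth_map, labels, 0, 0, 50)   # click inside region 0
assign_region_depth(depth_map, labels, 3, 1, 200)  # click inside region 1
```

One click is needed per region, so the number of regions directly sets the amount of user interaction: a fine segmentation preserves more contours but requires more clicks, whereas a coarse one is faster but may merge objects that lie at different depths.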