In recent years, movies using 3D video have been attracting attention. 3D video makes use of binocular parallax, which arises because the human eyes are separated horizontally by approximately 65 mm, so that the image sensed by the left eye differs from the image sensed by the right eye. The left eye and the right eye thus each view a different 2D video. When these 2D videos are transmitted from the ganglion cells of the retina to the brain through the optic nerve, the brain subjects them to fusion processing and recognizes them as a single stereoscopic video.
A stereoscopic video technique, also known as a 3D video technique, is a technique for dividing two kinds of 2D videos recorded by two camera lenses into a video for the left eye and a video for the right eye and providing the videos to the left eye and the right eye of a viewer, respectively, to thereby produce a stereoscopic effect. However, a stereo camera mounted with two camera lenses is extremely expensive and complicated to operate. In addition, a great many matters have to be considered in order to produce a high-quality 3D video, such as the arrangement of the stereo camera, the distance between the cameras, the adjustment of angle and focus, geometrical problems caused by the camera arrangement, and the work of matching color, brightness, and the like between the two videos. Therefore, in general, a method of converting a 2D video into a 3D video is used rather than creating a 3D video from the beginning.
A 3D video can be generated by shifting the objects of an original 2D video by an amount of binocular parallax corresponding to predetermined depth information. That is, to convert the 2D video into a 3D video, a process for generating a depth map for the 2D video is necessary. The depth map is a map indicating the three-dimensional distance to each object in the 2D video and can be represented as a gray scale value between 0 and 255 for each pixel. A larger depth value (that is, a brighter shade) indicates a shorter distance from the position where the video is viewed. In this way, the 3D video is generated using the 2D video and the depth information. Therefore, to create a high-quality 3D video from the 2D video, it is necessary to generate the depth map accurately. In generating an accurate depth map, the relation between the objects and the background in the image frames forming the 2D video, the positions of the objects, overlap among the objects, the volumes of the objects, and the like should be considered comprehensively. For this reason, generating an accurate depth map is manual work in which an expert engineer, while visually checking overlap among the individual objects and overlap between the objects and the background, divides out each region to be made three-dimensional, generally in pixel units, along the contours of the objects and the contours of predetermined regions within the objects.
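The relation between the depth map and the parallax shift described above can be illustrated with a minimal sketch. It is not the procedure of any particular system. The function name `render_view`, the parameter `max_disparity`, the linear mapping from depth value to pixel shift, the sign convention for the two eyes, and the naive hole filling are all assumptions made for illustration.

```python
def render_view(image, depth, max_disparity, direction):
    """Shift each pixel horizontally according to its depth value (0..255).

    image, depth: 2D lists of equal size; depth 255 is nearest to the viewer.
    max_disparity: shift in pixels applied to the nearest depth (assumed linear).
    direction: +1 or -1; which sign belongs to which eye is an assumption here.
    """
    rows, cols = len(image), len(image[0])
    if not 0 <= max_disparity < cols:
        raise ValueError("max_disparity must be in the range 0..cols-1")
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        # Draw far pixels first so that nearer pixels overwrite them.
        for c in sorted(range(cols), key=lambda col: depth[r][col]):
            shift = (max_disparity * depth[r][c] + 127) // 255  # rounded
            target = c + direction * shift
            if 0 <= target < cols:
                out[r][target] = image[r][c]
        # Naive hole filling: copy the nearest filled pixel on the left,
        # then fill any remaining leading holes from the right.
        last = None
        for c in range(cols):
            if out[r][c] is None:
                out[r][c] = last
            else:
                last = out[r][c]
        nxt = None
        for c in reversed(range(cols)):
            if out[r][c] is None:
                out[r][c] = nxt
            else:
                nxt = out[r][c]
    return out


# One image row in which only the middle pixel is close to the viewer.
image = [[10, 20, 30, 40, 50]]
depth = [[0, 0, 255, 0, 0]]
view_a = render_view(image, depth, max_disparity=1, direction=+1)
view_b = render_view(image, depth, max_disparity=1, direction=-1)
```

In the two resulting views, only the near pixel moves, and it moves in opposite directions. This is why an error in the depth map translates directly into an error in the perceived position of an object.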
Besides the manual work explained above, a watershed algorithm is known as one region division method used for extracting a target region in an image. This algorithm regards the gray scale information (brightness, etc.) of the image as height in a topographic surface and divides the image such that, when the surface is flooded with water, a boundary is formed where the water accumulated in one pit meets the water accumulated in another pit. It is also possible to divide the objects in the frames (images) forming the 2D video into a large number of regions using such an algorithm (see, for example, International Publication No. WO 2006/080239 and Japanese Patent Application Laid-Open No. 2008-277932).
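The flooding idea described above can be sketched as follows. This is a simplified priority-flood variant written for illustration, not the method of the cited publications. The function name `watershed_labels`, the use of 4-connectivity, and the choice of strict local minima as the starting pits are assumptions. Ridge pixels are assigned to whichever basin reaches them first, so the boundary is implied where two labels meet rather than drawn as a separate line.

```python
import heapq


def watershed_labels(height):
    """Label each pixel of a 2D height map with the basin that floods it first.

    Assumption: every basin starts at a strict local minimum (4-neighbourhood).
    A height map without any strict minimum is left unlabelled (all zeros).
    """
    rows, cols = len(height), len(height[0])

    def neighbours(r, c):
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet flooded"
    heap = []
    next_label = 1
    # Seed one basin per pit, numbered in scan order.
    for r in range(rows):
        for c in range(cols):
            if all(height[r][c] < height[nr][nc] for nr, nc in neighbours(r, c)):
                labels[r][c] = next_label
                heapq.heappush(heap, (height[r][c], r, c))
                next_label += 1
    # Raise the water level step by step, always growing from the lowest
    # flooded pixel; each unflooded neighbour joins that pixel's basin.
    while heap:
        level, r, c = heapq.heappop(heap)
        for nr, nc in neighbours(r, c):
            if labels[nr][nc] == 0:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (max(level, height[nr][nc]), nr, nc))
    return labels


# Two pits (heights 1 and 2) separated by a ridge of height 9.
terrain = [
    [5, 5, 9, 5, 5],
    [5, 1, 9, 2, 5],
    [5, 5, 9, 5, 5],
]
basins = watershed_labels(terrain)
```

Here the pixels on either side of the ridge receive different labels, which corresponds to the boundary formed between the water of the two pits. On real images, every small brightness dip becomes its own pit, which is one reason such an algorithm tends to divide a frame into a large number of regions.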