Recently, stereoscopically viewable images that support so-called 3D (three dimensions) (hereinafter referred to as 3D images) have become widespread. For example, movies that use stereoscopically viewable 3D images are being actively produced. 3D images differ greatly from conventional so-called 2D (two dimensions) images, which can give only a planar view. Therefore, it is very useful to use such 3D images in filmmaking.
Moreover, video content (for example, a movie) that uses 3D images is, like video content of 2D images, compressed with high efficiency, in this case in accordance with the MVC (Multi-view Video Coding) standard, which is an extension of MPEG-4 AVC, and is then recorded on, for example, a Blu-ray Disc or distributed over a network. Development of household appliances capable of reproducing 3D images has also begun. That is, an environment in which 3D images can be enjoyed even at home is being built.
The most widespread form of stereoscopic vision at present is the stereo image, which uses the binocular parallax of human eyes. In this system, the user views video content for the left eye and video content for the right eye separately, perceives the parallax between them, and thereby stereoscopically perceives an object within an image.
However, in stereoscopic vision using binocular parallax, the amount of parallax is generally set beforehand. Realizing stereoscopic vision from an arbitrary direction instead requires that information in the depth direction (Depth_Map) of each object within an image be extracted from the image data.
Research on automatically extracting rough depth information by using techniques that process or analyze image data is being actively conducted (for example, Non-patent documents 1 and 2). By using such a technology, or a technology that extracts depth information of an object from images taken by a plurality of cameras, which is considered to be relatively easy, it becomes possible to generate stereoscopic images viewed not only from two viewpoints but also from a plurality of free viewpoints.
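As a rough illustration of how depth information relates to parallax when images are taken by two cameras, the following sketch assumes an idealized, rectified pinhole stereo pair, for which depth is Z = f * B / d (f: focal length in pixels, B: camera baseline, d: disparity in pixels). This model, the function names, and the sample values are assumptions made for illustration only; they are not taken from the cited non-patent documents.

```python
import math
from typing import List


def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Return depth in metres for one pixel of a rectified stereo pair.

    Assumes the idealized pinhole relation Z = f * B / d.
    A disparity of zero (or less) is treated as a point at infinity.
    """
    if disparity_px <= 0.0:
        return math.inf
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparities(disparities: List[List[float]],
                               focal_length_px: float,
                               baseline_m: float) -> List[List[float]]:
    """Convert a 2D grid of disparities into a 2D grid of depths (a depth map)."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparities
    ]


if __name__ == "__main__":
    # Hypothetical values: 1000-pixel focal length, 0.1 m between the cameras.
    disparities = [[50.0, 25.0], [10.0, 0.0]]
    for row in depth_map_from_disparities(disparities, 1000.0, 0.1):
        print(row)
```

Under this model, a larger disparity corresponds to a nearer object, which is why a depth map recovered in this way could, in principle, be used to synthesize views from viewpoints other than the two original ones.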