Recently, video content supporting stereoscopy (also called 3D image display) has been proliferating. For example, movies capable of 3D display (also called 3D movies) are being actively produced and are becoming clearly differentiated from movies capable only of traditional 2D display (also called 2D movies). In the case of a 3D movie, the content (image data, etc.) is efficiently compressed according to the MVC (Multiview Video Coding) amendment to the MPEG-4 AVC format and recorded to a recording medium such as a Blu-ray Disc (registered trademark).
A user can enjoy 3D movies at home by playing back the content on a consumer Blu-ray Disc player and viewing it while wearing stereoscopic glasses (also called 3D glasses).
Such 3D movies and other stereoscopic content (3D content) are rapidly proliferating. The predominant images in 3D content are stereo images that exploit the binocular parallax of the human eyes. By separately showing a left-eye image and a right-eye image to the user's respective eyes, such images cause the user to perceive parallax and thereby perceive a subject three-dimensionally.
However, stereoscopy based on binocular parallax as discussed above cannot present a subject stereoscopically from arbitrary directions. Realizing this requires extracting depth data (a depth map) from a subject image.
Research on automatically extracting rough depth information from image data using image processing and analysis technology is being vigorously conducted (see NPL 1 and NPL 2, for example). Such technology, together with technology that allows comparatively easy extraction of depth information for subjects in images captured by a plurality of cameras, provides a foothold for generating stereoscopic images not only from binocular viewpoints but also from a plurality of free viewpoints.
However, whereas the fundamental amount of data for binocular stereo images is bounded (two views), the image data for a plurality of viewpoints together with their depth data amounts to an enormous volume. Consequently, it becomes important to compress such data as efficiently as possible.
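As a rough illustration of how quickly the data volume grows, the following sketch compares the uncompressed size of one time instant for a stereo pair against a multiview set that also carries depth. The frame size, the view count of 8, 8-bit 4:2:0 texture (1.5 bytes per pixel), and one 8-bit depth sample per pixel are all hypothetical assumptions chosen for illustration, not values taken from this description.

```python
def raw_frame_bytes(width: int, height: int, views: int, with_depth: bool) -> int:
    """Uncompressed bytes for one time instant across all views.

    Assumption (illustrative only): 8-bit 4:2:0 texture, i.e. 1.5 bytes per
    pixel, plus one 8-bit depth sample per pixel per view when with_depth is True.
    """
    pixels = width * height
    texture = pixels * 3 // 2            # Y plane plus two quarter-size chroma planes
    depth = pixels if with_depth else 0  # one 8-bit depth value per pixel
    return views * (texture + depth)


# Hypothetical comparison: a 1920x1080 stereo pair vs. 8 views with depth maps.
stereo = raw_frame_bytes(1920, 1080, views=2, with_depth=False)
multiview = raw_frame_bytes(1920, 1080, views=8, with_depth=True)
print(stereo, multiview, multiview / stereo)
```

Under these assumptions the multiview-plus-depth set is roughly 6.7 times larger than the stereo pair per time instant, before any compression.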
For example, in the case of stereo images, two encoded codestreams are generated by encoding the left-view and right-view images respectively. Similarly, in the case of multiview images, there are as many encoded codestreams as there are views. One method that has therefore been considered is to merge these multiple codestreams into a single encoded codestream, which can improve coding efficiency.
Meanwhile, JPEG 2000 is an ISO international standard for still images that is not only implemented as a digital cinema standard codec but is also broadly used in security, archiving, medical imaging, broadcasting, and other fields. One of JPEG 2000's many functions is scalability. This function divides a single encoded codestream into a plurality of streams belonging to the same category. As a result, by rearranging the progression (order) of the encoded codestream, scalability of resolution, image quality, and so on (decoded image scalability) can be realized. Consequently, codestreams can be used in a wider variety of applications, which improves their convenience.
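To illustrate why rearranging the progression yields decoded image scalability, the following toy model treats each JPEG 2000 packet as a (layer, resolution, component, precinct) tuple and orders the packets according to two of the progression orders defined in JPEG 2000 Part 1: LRCP (layer first) and RLCP (resolution first). It does not encode anything; the stream dimensions are hypothetical, and a single precinct per resolution is assumed so that the position-driven orders (RPCL, PCRL, CPRL) are left out. The point shown is that in RLCP all low-resolution packets form a leading run that a decoder can obtain simply by truncating the stream, while in LRCP the same holds for the first quality layer.

```python
from itertools import product

# Hypothetical stream shape, for illustration only.
LAYERS, RESOLUTIONS, COMPONENTS, PRECINCTS = 2, 3, 3, 1


def packet_order(progression: str) -> list:
    """Return (layer, resolution, component, precinct) tuples in emission order.

    Only LRCP and RLCP are modelled; with one precinct per resolution the
    nesting of loops fully determines the order.
    """
    if progression not in ("LRCP", "RLCP"):
        raise ValueError("this sketch models only LRCP and RLCP")
    ranges = {"L": range(LAYERS), "R": range(RESOLUTIONS),
              "C": range(COMPONENTS), "P": range(PRECINCTS)}
    order = []
    # The first letter is the outermost loop, the last letter the innermost.
    for combo in product(*(ranges[k] for k in progression)):
        idx = dict(zip(progression, combo))
        order.append((idx["L"], idx["R"], idx["C"], idx["P"]))
    return order


def is_prefix(order: list, keep) -> bool:
    """True if the packets selected by `keep` form a leading run of `order`,
    i.e. a decoder could obtain exactly them by truncating the stream."""
    flags = [keep(pkt) for pkt in order]
    n = sum(flags)
    return all(flags[:n]) and not any(flags[n:])


lrcp = packet_order("LRCP")
rlcp = packet_order("RLCP")


def low_res(pkt):
    return pkt[1] <= 1   # resolution levels 0 and 1 (level 0 is the coarsest)


def low_quality(pkt):
    return pkt[0] == 0   # first quality layer only


print(is_prefix(rlcp, low_res), is_prefix(lrcp, low_res))
print(is_prefix(lrcp, low_quality), is_prefix(rlcp, low_quality))
```

The same set of packets therefore supports resolution scalability or quality scalability depending only on the order in which the packets are laid out in the codestream.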