In recent years, 3D images have been attracting attention. A popular technique for viewing such 3D images is one in which a viewer views images of two viewpoints that are alternately displayed. To view the images of two viewpoints, the viewer wears glasses that open the shutter for the left eye when one of the two images is displayed, and open the shutter for the right eye when the other image is displayed (hereinafter referred to as the glasses-involving technique).
However, with such a glasses-involving technique, a viewer needs to purchase glasses as well as a 3D image display device, and this reduces the viewer's purchasing interest. Also, the need to wear glasses at the time of viewing is troublesome for a viewer. Therefore, there is an increasing demand for a viewing technique by which a viewer can view 3D images without glasses (hereinafter referred to as a glasses-free technique).
By the glasses-free technique, images of three or more viewpoints are displayed in such a manner that the viewable angle varies at the respective viewpoints, and the viewer can view a 3D image without glasses by seeing each image of any two viewpoints with the right and left eyes.
Also, 3D images compliant with the glasses-free technique are now about to be standardized as MPEG (Moving Picture Experts Group) 3DV. In MPEG 3DV, transmission of multi-view color images and depth images is to be standardized. A depth image is an image that indicates the depth value (depth information) of each pixel in a color image of a predetermined viewpoint, and a depth value (depth information) is a value that represents the position of the object in the depth direction.
To decode multi-view color images and depth images, a high-performance decoding device is required. For example, to decode color images and depth images of three viewpoints, a decoding device that has an ability to decode images of six viewpoints is required. At present, however, there exist only decoding devices that are capable of decoding color images of two viewpoints.
Also, to transmit multi-view color images and depth images as a baseband, wideband transmission needs to be performed. For example, to transmit color images and depth images of three viewpoints as a baseband, images of six viewpoints need to be transmitted. At present, however, there exist only devices that are capable of transmitting color images of two viewpoints.
Therefore, it is difficult to encode, transmit, and decode multi-view color images and depth images as a baseband, and it is necessary to reduce the baseband data amount.
In view of this, there is a method of reducing the baseband data amount by generating an LDV (Layered Depth Video) as a baseband. By this method, an LDV formed with a color image and a depth image of a reference viewpoint (hereinafter referred to as the reference point), and occlusion regions (described later in detail) of color images and depth images of the viewpoints other than the reference point is generated as a baseband.
Specifically, multi-view color images and depth images are images of the same scenery viewed from different viewpoints. Therefore, the regions other than the occlusion region in the multi-view color images overlap with one another, and the regions other than the occlusion region in the multi-view depth images overlap with one another. In view of this, an LDV formed with a color image and a depth image of the reference point, and the occlusion regions of color images and depth images of the viewpoints other than the reference point is generated as a baseband, instead of multi-view color images and depth images.
An occlusion region is a region that appears due to a change in the positional relationship between the foreground and the background at the time of a change in viewpoint; it exists in the image of one viewpoint, but does not exist in the image of another viewpoint. That is, an occlusion region is a background that is hidden by a foreground in the image of the viewpoint prior to the change, but is not hidden by the foreground in the image of the viewpoint after the change.
Non-Patent Document 1 discloses a method of generating an LDV from multi-view color images and depth images. Non-Patent Document 2 discloses a method of restoring multi-view color images and depth images from an LDV.
FIG. 1 is a block diagram showing an example structure of an image processing system that generates an LDV from multi-view color images and depth images, encodes the LDV, and decodes the LDV to restore the multi-view color images and depth images.
The image processing system 10 shown in FIG. 1 includes a format conversion device 11, a multi-view image encoding device 12, a multi-view image decoding device 13, and an inverse format conversion device 14.
Multi-view color images and depth images are input to the format conversion device 11 of the image processing system 10. The format conversion device 11 performs a baseband converting operation. Specifically, the format conversion device 11 generates an LDV as a baseband from the multi-view color images and depth images. As a result, the baseband data amount is reduced. The format conversion device 11 supplies the color image of the reference point in the LDV as the reference-point color image to the multi-view image encoding device 12, and supplies the depth image of the reference point as the reference-point depth image to the multi-view image encoding device 12.
The format conversion device 11 also multiplexes the occlusion regions of the color images of the viewpoints other than the reference point in the LDV into one screen, and supplies the resultant image as the background color image to the multi-view image encoding device 12. The format conversion device 11 multiplexes the occlusion regions of the depth images of the viewpoints other than the reference point in the LDV into one screen, and supplies the resultant image as the background depth image to the multi-view image encoding device 12.
The multi-view image encoding device 12 encodes, by the MVC (Multiview Video Coding) technique or the like, the reference-point color image, the reference-point depth image, the background color image, and the background depth image, which are supplied from the format conversion device 11. The multi-view image encoding device 12 transmits the resultant bit stream to the multi-view image decoding device 13.
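The image counts involved can be tallied in a few lines. The following is only an illustrative sketch for the three-viewpoint example; the function and constant names are hypothetical and do not come from the system described here.

```python
# Illustrative tally of how many images are handled per frame.
# All names here are hypothetical helpers, not part of the described system.

def multiview_image_count(num_viewpoints: int) -> int:
    """One color image and one depth image per viewpoint."""
    return 2 * num_viewpoints

# Images supplied to the encoding device when an LDV is used:
# reference-point color, reference-point depth,
# background color, background depth.
LDV_IMAGE_COUNT = 4

if __name__ == "__main__":
    print(multiview_image_count(3))  # 6 images without the LDV
    print(LDV_IMAGE_COUNT)           # 4 images with the LDV
```

The tally counts images only; it says nothing about the actual bit rate after encoding.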
The multi-view image decoding device 13 receives the bit stream transmitted from the multi-view image encoding device 12, and decodes the bit stream by a technique compatible with the MVC technique. The multi-view image decoding device 13 supplies the resultant reference-point color image, reference-point depth image, background color image, and background depth image to the inverse format conversion device 14.
The inverse format conversion device 14 performs an inverse baseband converting operation compatible with the baseband converting operation performed by the format conversion device 11, on the reference-point color image, the reference-point depth image, the background color image, and the background depth image, which are supplied from the multi-view image decoding device 13.
The inverse format conversion device 14 then outputs the color images of the viewpoints other than the reference point and the reference-point color image, which are obtained as a result of the inverse baseband converting operation, as multi-view color images. The inverse format conversion device 14 also outputs the depth images of the viewpoints other than the reference point and the reference-point depth image, which are obtained as a result of the inverse baseband converting operation, as multi-view depth images.
FIG. 2 is a block diagram showing an example structure of the format conversion device 11 shown in FIG. 1.
In the following example, there are three viewpoints, one of the two viewpoints other than the reference point is the left viewpoint, and the other one of the two viewpoints is the right viewpoint.
The format conversion device 11 shown in FIG. 2 includes a warping unit 21, an occlusion determining unit 22, a warping unit 23, an occlusion determining unit 24, a screen multiplexing unit 25, and an output unit 26.
The warping unit 21 of the format conversion device 11 performs a background-prioritized warping operation toward the reference point, on the left depth image that is the depth image of the left viewpoint among the multi-view depth images. The resultant reference-point depth image is set as the left depth image of the reference point.
A warping operation is an operation to geometrically transform an image of a certain viewpoint into an image of another viewpoint. A background-prioritized warping operation is an operation to select and associate pixels that are located on the background side in the depth direction and correspond to the object among pixels that belong to an image yet to be subjected to a warping operation and are associated with the same pixel in the image subjected to the warping operation. Here, the pixels that are located on the background side in the depth direction and correspond to the object are the pixels having the smaller depth values.
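The selection rule can be illustrated on a single row of depth values. The following is a minimal sketch, not the warping unit's actual method: it assumes a purely horizontal shift of `depth // depth_per_pixel` pixels (a hypothetical disparity model), and uses `None` for pixels that receive no value.

```python
from typing import List, Optional

def warp_row_background_prioritized(
    depth_row: List[Optional[int]], depth_per_pixel: int
) -> List[Optional[int]]:
    """Warp one row of depth values toward another viewpoint.

    Assumption: each pixel moves right by depth // depth_per_pixel.
    When several source pixels land on the same target pixel, the one
    with the smaller depth value (the background side) is kept.
    """
    width = len(depth_row)
    warped: List[Optional[int]] = [None] * width
    for x, depth in enumerate(depth_row):
        if depth is None:
            continue
        target = x + depth // depth_per_pixel
        if not 0 <= target < width:
            continue  # pixel moves outside the image
        if warped[target] is None or depth < warped[target]:
            warped[target] = depth
    return warped

if __name__ == "__main__":
    row = [10, 10, 10, 50, 50, 10, 10, 10]  # 50 = foreground, 10 = background
    print(warp_row_background_prioritized(row, 25))
    # [10, 10, 10, None, None, 10, 10, 10]
```

In this example the foreground pixels collide with background pixels at positions 5 and 6, and the background values win, which is the behavior the occlusion determination relies on.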
Using the left depth image of the reference point, the warping unit 21 also performs a background-prioritized warping operation toward the reference point, on the left color image that is the color image of the left viewpoint among the multi-view color images. The resultant reference-point color image is set as the left color image of the reference point. The warping unit 21 then supplies the left depth image of the reference point and the left color image of the reference point to the occlusion determining unit 22.
Based on the left depth image of the reference point supplied from the warping unit 21 and the reference-point depth image among the multi-view depth images that are input from outside, the occlusion determining unit 22 detects an occlusion region that appears when the viewpoint is converted from the reference point to the left viewpoint (hereinafter referred to as the left-viewpoint occlusion region). Specifically, the occlusion determining unit 22 detects, as the left-viewpoint occlusion region, the region formed with pixels for which the value obtained by subtracting the depth value of the left depth image of the reference point from the depth value of the reference-point depth image is equal to or greater than a predetermined value.
That is, the warping unit 21 performs a background-prioritized warping operation on the left depth image. Accordingly, in the left depth image of the reference point, the left-viewpoint occlusion region, which is the foreground in the reference-point depth image but is the background in the left depth image, has the depth value of the background. Meanwhile, in the reference-point depth image, the left-viewpoint occlusion region has the depth value of the foreground. Accordingly, the pixels for which the value obtained by subtracting the depth value of the left depth image of the reference point from the depth value of the reference-point depth image is equal to or greater than the minimum value that is assumed to be the difference between the depth value of the background and the depth value of the foreground are regarded as the pixels in the left-viewpoint occlusion region.
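The threshold test described above can be sketched on one row of pixels. This is an illustrative sketch only: the function name, the use of `None` for pixels without a warped value, and the sample threshold are assumptions.

```python
from typing import List, Optional

def detect_occlusion_row(
    reference_depth: List[int],
    warped_depth: List[Optional[int]],
    threshold: int,
) -> List[bool]:
    """Mark pixels that belong to the occlusion region.

    A pixel is marked when (reference depth - warped depth) >= threshold,
    i.e. the reference point sees foreground (large depth value) where the
    warped image of the other viewpoint holds background (small value).
    Pixels with no warped value (None) are not marked (an assumption).
    """
    mask = []
    for ref_d, warped_d in zip(reference_depth, warped_depth):
        mask.append(warped_d is not None and ref_d - warped_d >= threshold)
    return mask

if __name__ == "__main__":
    reference_depth = [10, 10, 50, 50, 10, 10]    # foreground at x = 2, 3
    left_at_reference = [10, 10, 10, 10, None, 10]
    print(detect_occlusion_row(reference_depth, left_at_reference, 20))
    # [False, False, True, True, False, False]
```

The threshold plays the role of the minimum assumed difference between the foreground and background depth values, so small depth differences caused by noise are not treated as occlusion.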
The occlusion determining unit 22 supplies the left-viewpoint occlusion region of the left color image of the reference point as the left color occlusion image to the screen multiplexing unit 25. The occlusion determining unit 22 also supplies the left-viewpoint occlusion region of the left depth image of the reference point as the left depth occlusion image to the screen multiplexing unit 25.
The warping unit 23 performs background-prioritized warping operations toward the reference point, on the right color image and the right depth image among the multi-view color images and depth images, like the warping unit 21. The warping unit 23 sets the resultant reference-point color image as the right color image of the reference point, and the reference-point depth image as the right depth image of the reference point, and supplies the right color image of the reference point and the right depth image of the reference point to the occlusion determining unit 24.
Based on the right depth image of the reference point supplied from the warping unit 23 and the reference-point depth image, the occlusion determining unit 24 detects, like the occlusion determining unit 22, an occlusion region that appears when the viewpoint is converted from the reference point to the right viewpoint (hereinafter referred to as the right-viewpoint occlusion region).
The occlusion determining unit 24 supplies the right-viewpoint occlusion region of the right color image of the reference point as the right color occlusion image to the screen multiplexing unit 25. The occlusion determining unit 24 supplies the right-viewpoint occlusion region of the right depth image of the reference point as the right depth occlusion image to the screen multiplexing unit 25.
The screen multiplexing unit 25 multiplexes the left color occlusion image supplied from the occlusion determining unit 22 and the right color occlusion image supplied from the occlusion determining unit 24 into one screen. Specifically, the screen multiplexing unit 25 sets the pixel values of the pixels that have pixel values only in one of the left color occlusion image and the right color occlusion image, as the pixel values of the one occlusion image. The screen multiplexing unit 25 also sets the pixel values of the pixels that have pixel values in both the left color occlusion image and the right color occlusion image, as the pixel values of one of the occlusion images. The screen multiplexing unit 25 supplies the multiplexed image of the left color occlusion image and the right color occlusion image as the background color image to the multi-view image encoding device 12 (FIG. 1).
Likewise, the screen multiplexing unit 25 multiplexes the left depth occlusion image supplied from the occlusion determining unit 22 and the right depth occlusion image supplied from the occlusion determining unit 24 into one screen. The screen multiplexing unit 25 supplies the resultant multiplexed image as the background depth image to the multi-view image encoding device 12.
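The per-pixel multiplexing rule can be sketched as follows. This is a minimal illustration: `None` stands for a pixel without a value, and preferring the left occlusion image when both have values is an arbitrary choice made for the sketch, since the description only requires that one of the two be used.

```python
from typing import List, Optional

def multiplex_occlusion_rows(
    left: List[Optional[int]], right: List[Optional[int]]
) -> List[Optional[int]]:
    """Merge two occlusion images (one row each) into one screen.

    - value in only one image: take that value
    - value in both images: take one of them (here: the left one)
    - value in neither image: the pixel stays empty (None)
    """
    return [l if l is not None else r for l, r in zip(left, right)]

if __name__ == "__main__":
    left_occlusion = [None, 7, None, 3]
    right_occlusion = [None, None, 9, 4]
    print(multiplex_occlusion_rows(left_occlusion, right_occlusion))
    # [None, 7, 9, 3]
```

The same function applies to color and depth occlusion images alike, since the rule depends only on which pixels carry a value.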
The output unit 26 supplies the reference-point color image and the reference-point depth image among the multi-view color images and depth images that are input from outside, to the multi-view image encoding device 12.
FIG. 3 is a block diagram showing an example structure of the inverse format conversion device 14 shown in FIG. 1.
The inverse format conversion device 14 shown in FIG. 3 includes a warping unit 31, a screen combining unit 32, a warping unit 33, a screen combining unit 34, and an output unit 35.
The warping unit 31 of the inverse format conversion device 14 receives the reference-point color image and the reference-point depth image supplied from the multi-view image decoding device 13. The warping unit 31 functions as the first non-reference viewpoint warping unit that performs a foreground-prioritized warping operation toward the left viewpoint on the reference-point depth image, and sets the resultant left-viewpoint depth image as the reference-point depth image of the left viewpoint.
A foreground-prioritized warping operation is an operation to select and associate pixels that are located on the foreground side in the depth direction and correspond to the object among pixels that belong to an image yet to be subjected to a warping operation and are associated with the same pixel in the image subjected to the warping operation. Here, the pixels that are located on the foreground side in the depth direction and correspond to the object are the pixels having the larger depth values.
The foreground-prioritized warping operation is the conventional warping operation. This is because, when two or more pixels in an image yet to be subjected to a warping operation are associated with the same pixel in the image subjected to the warping operation, the foreground overlaps the background at that pixel and hides the background.
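For comparison with the background-prioritized case, the foreground-prioritized selection can be sketched on a single row of depth values. As before, this is only an illustration: the horizontal shift of `depth // depth_per_pixel` pixels is a hypothetical disparity model, and `None` marks pixels that receive no value.

```python
from typing import List, Optional

def warp_row_foreground_prioritized(
    depth_row: List[Optional[int]], depth_per_pixel: int
) -> List[Optional[int]]:
    """Warp one row of depth values toward another viewpoint.

    Assumption: each pixel moves right by depth // depth_per_pixel.
    When several source pixels land on the same target pixel, the one
    with the larger depth value (the foreground side) is kept.
    """
    width = len(depth_row)
    warped: List[Optional[int]] = [None] * width
    for x, depth in enumerate(depth_row):
        if depth is None:
            continue
        target = x + depth // depth_per_pixel
        if not 0 <= target < width:
            continue  # pixel moves outside the image
        if warped[target] is None or depth > warped[target]:
            warped[target] = depth
    return warped

if __name__ == "__main__":
    row = [10, 10, 10, 50, 50, 10, 10, 10]  # 50 = foreground, 10 = background
    print(warp_row_foreground_prioritized(row, 25))
    # [10, 10, 10, None, None, 50, 50, 10]
```

The empty pixels at positions 3 and 4 correspond to the occlusion region: the background there was hidden at the source viewpoint, so the warped image has nothing to place at those pixels.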
Using the reference-point depth image of the left viewpoint, the warping unit 31 performs a foreground-prioritized warping operation toward the left viewpoint on the reference-point color image supplied from the multi-view image decoding device 13. The resultant left-viewpoint color image is set as the reference-point color image of the left viewpoint. The warping unit 31 then supplies the reference-point color image of the left viewpoint and the reference-point depth image of the left viewpoint to the screen combining unit 32.
Likewise, the warping unit 31 performs foreground-prioritized warping operations toward the right viewpoint on the reference-point color image and the reference-point depth image supplied from the multi-view image decoding device 13. The warping unit 31 sets the right-viewpoint color image obtained as a result of the warping operation as the reference-point color image of the right viewpoint, and the right-viewpoint depth image as the reference-point depth image of the right viewpoint, and supplies those images to the screen combining unit 34.
The screen combining unit 32 combines the reference-point color image of the left viewpoint supplied from the warping unit 31 with the background color image of the left viewpoint supplied from the warping unit 33. Specifically, the screen combining unit 32 sets the pixel values of the pixels that have pixel values only in one of the reference-point color image of the left viewpoint and the background color image of the left viewpoint, as the pixel values of the one of the color images. The screen combining unit 32 also sets the pixel values of the pixels that have pixel values in both the reference-point color image of the left viewpoint and the background color image of the left viewpoint, as the pixel values of the reference-point color image of the left viewpoint. This is because the position of the object corresponding to the reference-point color image in the depth direction is always on the foreground side of the position of the object corresponding to the background color image in the depth direction.
Likewise, the screen combining unit 32 combines the reference-point depth image of the left viewpoint supplied from the warping unit 31 with the background depth image of the left viewpoint supplied from the warping unit 33. The screen combining unit 32 outputs the color image obtained as a result of the combining as the left color image, and the depth image as the left depth image.
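The combining rule differs from the multiplexing rule only in that the precedence is fixed: the warped reference-point image always wins. A minimal sketch on one row, with `None` standing for a pixel without a value (an assumption of the sketch, as is the function name):

```python
from typing import List, Optional

def combine_rows(
    reference: List[Optional[int]], background: List[Optional[int]]
) -> List[Optional[int]]:
    """Combine a warped reference-point row with a warped background row.

    - value in only one row: take that value
    - value in both rows: take the reference-point value, because the
      object it shows is on the foreground side of the background image
    """
    return [r if r is not None else b for r, b in zip(reference, background)]

if __name__ == "__main__":
    reference_at_left = [10, 10, 10, None, None, 50, 50, 10]
    background_at_left = [None, None, None, 12, 12, 11, None, None]
    print(combine_rows(reference_at_left, background_at_left))
    # [10, 10, 10, 12, 12, 50, 50, 10]
```

In the example, the background row fills the two empty pixels left by the warping operation, while its value at position 5 is discarded in favor of the reference-point value.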
Like the warping unit 31, the warping unit 33 performs foreground-prioritized warping operations toward the left viewpoint and the right viewpoint, on the background color image and the background depth image supplied from the multi-view image decoding device 13. The warping unit 33 sets the left-viewpoint color image obtained as a result of the warping operation as the background color image of the left viewpoint, and the left-viewpoint depth image as the background depth image of the left viewpoint, and supplies those images to the screen combining unit 32. The warping unit 33 also sets the right-viewpoint color image obtained as a result of the warping operation as the background color image of the right viewpoint, and the right-viewpoint depth image as the background depth image of the right viewpoint, and supplies those images to the screen combining unit 34.
Like the screen combining unit 32, the screen combining unit 34 combines the reference-point color image of the right viewpoint supplied from the warping unit 31 with the background color image of the right viewpoint supplied from the warping unit 33. Likewise, the screen combining unit 34 combines the reference-point depth image of the right viewpoint supplied from the warping unit 31 with the background depth image of the right viewpoint supplied from the warping unit 33. The screen combining unit 34 outputs the color image obtained as a result of the combining as the right color image, and the depth image as the right depth image.
The output unit 35 outputs the reference-point color image and the reference-point depth image supplied from the multi-view image decoding device 13.