1. Field of the Invention
The present invention relates to an image decoding apparatus, an image decoding program, an image decoding method, an image encoding apparatus, an image encoding program, and an image encoding method.
2. Related Background Art
A well-known technology is the super resolution technique (hereinafter "super resolution" will be referred to as SR), which generates a high-resolution image (hereinafter "high resolution" will be referred to as HR) from a plurality of low-resolution images (hereinafter "low resolution" will be referred to as LR) reconstructed through decoding of encoded video data (e.g., C. A. Segall et al., "High-Resolution Images from Low-Resolution Compressed Video," IEEE Signal Processing Magazine, May 2003, pp. 37-48, which will be referred to hereinafter as "Non-patent Document 1").
The SR technique makes it possible to generate an HR image from a plurality of LR images by modeling the relations between the plurality of LR images and one HR image and by statistically processing known information and estimated information. FIG. 1 shows a model relating LR images and an HR image. This model assumes that original LR images 104 of multiple frames (L frames) are generated from an original HR image 101. Under this assumption, motion models 201-1, 201-2, . . . , 201-L are applied to the original HR image 101, and a sampling process using sampling model 202, which is based on low-pass filtering and down-sampling, is then performed on the HR image to generate the original LR images 104-1, 104-2, . . . , 104-L. Assuming that quantization noises 103-1, 103-2, . . . , 103-L represent the differences between the reconstructed LR images 102-1, 102-2, . . . , 102-L generated through decoding of encoded video data and the original LR images 104-1, 104-2, . . . , 104-L, the relationship between the original HR image f_k(x,z) of frame k, where 1≦x≦2M and 1≦z≦2N, and the reconstructed LR image y_l(m,n) of frame l, where 1≦m≦M and 1≦n≦N, can be modeled by Eq. 1 below.
y_l = AHC(d_lk) f_k + e_l  (Eq. 1)
In this equation, l represents an integer from 1 to L, C(d_lk) a matrix of a motion model between HR images of frame k and frame l, AH a matrix of a sampling model (where H is a 4MN×4MN matrix representing a filtering process of the HR image and A is an MN×4MN down-sampling matrix), and e_l the quantization noise of the reconstructed LR image of frame l.
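By way of illustration only, the observation model of Eq. 1 can be sketched in Python with NumPy. The concrete operators below are assumptions made for the sketch and are not specified in the description above: the motion model C(d_lk) is taken to be a circular integer shift, the sampling model AH a 2×2 block average followed by 2:1 down-sampling, and the quantization noise e_l the rounding error of a uniform quantizer with a hypothetical step size. The function names warp, sample, and observe are likewise hypothetical.

```python
import numpy as np

def warp(f, d):
    """C(d_lk), toy version (assumption): circular integer shift d = (dx, dz)."""
    return np.roll(f, shift=d, axis=(0, 1))

def sample(f):
    """AH, toy version (assumption): 2x2 block average, then 2:1 down-sampling."""
    h, w = f.shape
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def observe(f_k, d_lk, step=8.0):
    """Return (y_l, e_l) such that y_l = AHC(d_lk) f_k + e_l.

    e_l is modeled here as the rounding error of a uniform quantizer
    with the given step (an assumption standing in for codec noise).
    """
    original_lr = sample(warp(f_k, d_lk))                    # original LR image
    reconstructed_lr = np.round(original_lr / step) * step   # reconstructed LR image
    return reconstructed_lr, reconstructed_lr - original_lr
```

For a 2M×2N array f_k, calling observe(f_k, (1, 0)) returns an M×N reconstructed LR image together with its noise term, so that subtracting the noise from the reconstructed image recovers the original LR image.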
In this manner, the relationship between a certain reconstructed LR image generated from encoded video data and an HR image can be modeled by the motion model, which indicates the time-space correspondence between the LR and HR images, and by the signal model of the noise generated in the process of degradation from the HR image to the LR image. Therefore, an HR image can be generated from a plurality of reconstructed LR images by defining a cost function that evaluates estimates of the motion model and signal model by statistical means and by solving the resulting nonlinear optimization problem. The solutions obtained in this optimization are the HR image and, for each of the plurality of LR images, motion information (SR motion information) representing the time-space correspondence between that LR image and the HR image.
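By way of illustration only, one form such a cost function may take is a data-fidelity term built from Eq. 1 plus a regularization term on the HR image. The quadratic data term, the weight λ, and the prior ρ below are assumptions made for this example and are not specified in the description above.

```latex
\left(\hat{f}_k,\ \{\hat{d}_{lk}\}\right)
  = \arg\min_{f_k,\ \{d_{lk}\}}
    \sum_{l=1}^{L} \left\| y_l - A H C(d_{lk})\, f_k \right\|^2
    + \lambda\, \rho(f_k)
```

The problem is nonlinear because the motion information d_lk enters through the matrix C(d_lk), even though the model is linear in f_k once the motion information is fixed.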
One method for carrying out the optimization process is, for example, the coordinate-descent method (H. He and L. P. Kondi, "MAP Based Resolution Enhancement of Video Sequences Using a Huber-Markov Random Field Image Prior Model," Proc. of IEEE International Conference on Image Processing Vol. II, (Spain), September 2003, pp. 933-936, which will be referred to hereinafter as "Non-patent Document 2"). In this method, first, a virtual HR image (a provisional HR image used in the iterative optimization) is generated by interpolation from a reconstructed LR image. Then, with the virtual HR image held fixed, motion information representing the time-space correspondences between the virtual HR image and the plurality of LR images is determined by use of the cost function. Next, with the motion information thus determined held fixed, the virtual HR image is updated by use of the cost function. Furthermore, with the updated virtual HR image held fixed, the motion information is updated again. This process is iterated until it converges to a solution.
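By way of illustration only, the alternating procedure described above can be sketched in Python with NumPy. This is a simplified sketch and not the method of Non-patent Document 2: it assumes a circular integer-shift motion model, a 2×2 block-average sampling model, a cost function consisting of a squared-error data term only (no image prior), pixel replication as the interpolation, an exhaustive search for the motion update, and a single gradient step for the image update. All function names and parameters are hypothetical.

```python
import numpy as np

def warp(f, d):
    # Toy motion model (assumption): circular integer shift of the HR image.
    return np.roll(f, shift=d, axis=(0, 1))

def sample(f):
    # Toy sampling model (assumption): 2x2 block average, then 2:1 down-sampling.
    h, w = f.shape
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def cost(f, ys, ds):
    # Data-fidelity cost only; any prior term on the HR image is omitted here.
    return 0.5 * sum(np.sum((sample(warp(f, d)) - y) ** 2) for y, d in zip(ys, ds))

def update_motion(f, ys, radius=2):
    # HR image fixed: exhaustive search for the shift that best explains each LR image.
    shifts = [(dx, dz) for dx in range(-radius, radius + 1)
              for dz in range(-radius, radius + 1)]
    return [min(shifts, key=lambda d: np.sum((sample(warp(f, d)) - y) ** 2))
            for y in ys]

def update_image(f, ys, ds):
    # Motion fixed: one gradient-descent step on the cost with respect to the HR image.
    grad = np.zeros_like(f)
    for y, d in zip(ys, ds):
        residual = sample(warp(f, d)) - y
        spread = np.kron(residual, np.ones((2, 2))) / 4.0  # adjoint of sample()
        grad += warp(spread, (-d[0], -d[1]))               # adjoint of warp()
    return f - (2.0 / len(ys)) * grad

def coordinate_descent(ys, iterations=20, tol=1e-6):
    f = np.kron(ys[0], np.ones((2, 2)))   # virtual HR image by pixel replication
    ds = [(0, 0)] * len(ys)
    previous = cost(f, ys, ds)
    for _ in range(iterations):
        ds = update_motion(f, ys)         # image fixed, motion information updated
        f = update_image(f, ys, ds)       # motion fixed, virtual HR image updated
        current = cost(f, ys, ds)
        if previous - current < tol:      # stop once the cost stops decreasing
            break
        previous = current
    return f, ds
```

Given a list ys of M×N reconstructed LR images, coordinate_descent(ys) returns a 2M×2N virtual HR image and one shift per LR image. With these particular operators neither update can increase the cost, which is why the loop may stop when the decrease falls below tol.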