(1) Field of the Invention
The present invention relates to an image processing method and an image processing device that enhance the resolution of a picture by using multiple input low-resolution pictures.
(2) Description of the Related Art
Many efforts are being made to enhance the resolution of pictures captured with a digital still camera and the like in order to acquire clearer and crisper pictures. There is growing interest in a method in which multiple pictures displaced with respect to each other are combined to restore high-frequency components in order to acquire a high-resolution picture closer to the original picture. According to this method, multiple pictures that are successive in time in motion video can be used to produce a high-resolution picture. The method is expected to find wide applications, such as enhancing the resolution of motion video captured with a video camera. The processing for generating a high-resolution picture from multiple low-resolution pictures will hereinafter be referred to as super resolution.
There are many approaches to super resolution. A widely used one is reconstruction-based super resolution, which updates the pixel values of a high-resolution picture successively through repetition processing in order to acquire a high-quality high-resolution picture stably. Reconstruction-based super resolution includes positioning of the individual low-resolution pictures and repetition processing in which update processing for obtaining the pixel values of the high-resolution picture is repeated. However, reconstruction-based super resolution requires a large amount of computation because it involves positioning of multiple low-resolution pictures and uses repetition processing. In order to put this method into practical use, the amount of computation must be reduced. For example, Patent Document 1 proposes a method in which a parameter value of an evaluation function is optimized during the repetition processing so as to reduce the number of repetitions.
An MAP (Maximum A Posteriori) method, which is a conventional technique for reducing the amount of computation without degrading the quality of a reconstructed super-resolution picture, will be described below.
The MAP method uses, as the initial value, a high-resolution picture generated by the bicubic method or the nearest neighbor method, and obtains the high-resolution picture that maximizes the posterior probability given a number of low-resolution pictures, which are the observed pictures. The posterior probability is represented by an evaluation function which includes an error term and a convergent term. Given an imaging model, the error term represents the square error between a pixel value estimated from the high-resolution picture on the basis of the imaging model and a pixel value of a positioned low-resolution picture. The convergent term represents prior information based on the assumption that the picture is smooth everywhere. An example of the evaluation function is given below.
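For illustration, generating the initial high-resolution picture by the nearest neighbor method can be sketched as follows (a minimal sketch; the Python form, the scale factor of 2, and the 2-D array layout are assumptions made for this example and are not part of the method described here):

```python
import numpy as np

# Nearest neighbor upscaling used to form the initial high-resolution
# picture. Each low-resolution pixel is simply repeated 'scale' times
# along both axes (scale factor 2 is an assumption for this example).
def nearest_neighbor_upscale(lr, scale=2):
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)
```

The bicubic method would instead interpolate new pixel values with a cubic kernel, which gives a smoother initial picture; both are acceptable as the starting point HR(0) of the repetition processing.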
[mathematical formula 1]

I = Σ_{i=0}^{Nl−1} [b_vec(i) * h_vec − fi]² + α‖C(h_vec)‖²    (equation 1)
Here, h_vec in (equation 1) is a vector representation of the values of the pixels in the high-resolution picture (hereinafter, h_vec after the n-th update is denoted by HR(n)), fi is the value of a pixel of a positioned low-resolution picture, b_vec(i) is the kernel representing the imaging model corresponding to the pixel position of fi, C is a function representing prior information on smoothness, α is a weight that balances the error term and the convergent term, and Nl is the number of pixels of the low-resolution pictures used for updating h_vec (the total number of pixels of the positioned low-resolution pictures). Σ represents the sum over the Nl elements from the 0-th to the (Nl−1)-th, ‖·‖ represents the L2 norm, and * represents the inner product of vectors.
In the iterative computation, the evaluation function I in (equation 1) is minimized. For this purpose, an optimization method such as the steepest-descent method or the conjugate gradient method may be used. In these methods, the gradient I′ of the evaluation function I, given in (equation 2), must be obtained.
[mathematical formula 2]

I′ = 2 Σ_{i=0}^{Nl−1} b_vec(i) [b_vec(i) * h_vec − fi] + α∇‖C(h_vec)‖²    (equation 2)
where ∇ represents the differential of elements.
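As a concrete illustration of (equation 1) and (equation 2), the evaluation function and its gradient can be sketched in Python as follows (a minimal sketch: the flattened 1-D form, the matrix B whose rows are the kernels b_vec(i), and the first-order difference used as the smoothness prior C are assumptions made for this example):

```python
import numpy as np

# h     : high-resolution pixel values h_vec (flattened vector)
# B     : matrix whose i-th row is the imaging-model kernel b_vec(i)
# f     : positioned low-resolution pixel values fi (length Nl)
# alpha : weight between the error term and the convergent term

def smoothness(h):
    # C(h_vec): first-order finite difference as a stand-in smoothness prior
    return np.diff(h)

def evaluation(h, B, f, alpha):
    # I = sum_i [b_vec(i) * h_vec - fi]^2 + alpha * ||C(h_vec)||^2   (equation 1)
    residual = B @ h - f
    return np.sum(residual ** 2) + alpha * np.sum(smoothness(h) ** 2)

def gradient(h, B, f, alpha):
    # I' = 2 * sum_i b_vec(i) [b_vec(i) * h_vec - fi]
    #      + alpha * grad ||C(h_vec)||^2                              (equation 2)
    residual = B @ h - f
    d = smoothness(h)
    grad_prior = np.zeros_like(h)
    grad_prior[:-1] -= 2.0 * d   # d/dh_k of (h_{k+1} - h_k)^2 terms
    grad_prior[1:] += 2.0 * d
    return 2.0 * (B.T @ residual) + alpha * grad_prior
```

The returned gradient corresponds term by term to (equation 2): the first term accumulates b_vec(i) times the residual over the Nl positioned low-resolution pixels, and the second is the gradient of the weighted smoothness prior.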
As can be seen from (equation 2), obtaining the gradient I′ requires computation that depends on Nl for each pixel, and, for good reconstruction quality, it is desirable that Nl be of the same order as the number of pixels of the high-resolution picture. Accordingly, the amount of computation is enormous.
Positioning for obtaining fi involves estimating the amount of motion between the target picture for super resolution and the reference pictures on a block-by-block or pixel-by-pixel basis. The amount of computation required for this motion estimation also increases as the number of reference pictures increases.
FIGS. 1A and 1B show an example in which the MAP method is applied to motion video. Application of the MAP method is not limited to motion video; for example, super resolution using still pictures of an object taken from different shooting positions (multi-view) is also possible. In the case of still pictures, the motion amount used in the following description corresponds to the amount of displacement between pictures, and the motion estimation corresponds to estimation of that displacement. In FIG. 1A, the k-th picture is the target picture for super resolution, and the pictures preceding and following it in time are reference pictures. One picture corresponds to one frame or one field. FIGS. 1B (a) and 1B (b) show pixels of the target picture for super resolution and pixels of a reference picture, respectively. The result of positioning by estimation of motion between the pictures is shown in FIG. 1B (c): the pixels of the target picture for super resolution and the pixels of the reference picture are positioned with respect to the pixel positions of a high-resolution picture. The gray circles (dot-circles) and white circles in FIG. 1B (c) correspond to fi in (equation 1) or (equation 2), and the sum of the numbers of gray and white circles corresponds to Nl. FIG. 1B (d) shows the relation between the pixels of the high-resolution picture and the pixels of the low-resolution pictures after the positioning. The black circles represent the pixel positions of the high-resolution picture. After the positioning, the gray circles (dot-circles) of the target picture for super resolution are at the same positions as the black circles, which are the pixels of the high-resolution picture in FIG. 1B (d). It should be noted that the initial high-resolution picture is generated by interpolating the target picture for super resolution shown in FIG. 1B (a) using the bicubic method.
The error term in (equation 2) is calculated from the difference between the pixel value of each white circle (or gray circle (dot-circle)) and the pixel value at that position estimated from the values of the surrounding black circles. The convergent term in (equation 2) is calculated from the pixel values of the black circles. The pixel values of the black circles are updated in every iteration of the repetition processing.
It can be seen from the foregoing that the positioning and the repetition processing account for a large part of the computation in the MAP method, and reducing the amount of computation for these two processes is the key.
FIG. 2 is a block diagram showing a configuration of an image processing device 500 which performs conventional reconstruction-based super resolution. The image processing device 500 includes an image input unit 501, a motion estimation unit 502, a positioning unit 503, an initial picture decision unit 504, a reconstruction unit 505, and a memory 506. The image input unit 501 stores input image data in the memory 506. The motion estimation unit 502 retrieves the image data required for motion estimation from the memory 506, estimates motion, and inputs the obtained motion vector information 511 to the positioning unit 503. The positioning unit 503 then performs positioning based on the motion vector information 511 and outputs the result as position information 512. The initial picture decision unit 504 generates an initial high-resolution picture 513 in accordance with a specified scale factor. The reconstruction unit 505 performs repetition processing based on the position information 512 and the initial picture 513 to generate and output reconstructed picture data.
FIG. 3 is a block diagram showing a configuration of the reconstruction unit 505. The reconstruction unit 505 includes an update calculation unit 601 and a picture update determination unit 602. The update calculation unit 601 updates the values of all pixels of a high-resolution picture on the basis of the position information 512 and the initial picture 513 in response to an update instruction signal 611 input from the picture update determination unit 602. The picture update determination unit 602 determines from the update result 612 of the high-resolution picture whether the repetition processing should be ended. If it determines that the repetition processing should be ended, the picture update determination unit 602 outputs the high-resolution picture data; if it determines that the repetition processing should be continued, the picture update determination unit 602 provides an update instruction signal 611 to direct the update calculation unit 601 to update the high-resolution picture.
FIG. 4 is a flowchart showing operation of the conventional reconstruction-based super resolution. Positionings are performed at steps S001 through S004, the initial high-resolution picture is generated at step S005, and a high-resolution picture is reconstructed through repetition processing at step S006. Details of these steps will be described in order. First, at step S001, image data representing a target picture for super resolution and N reference pictures is input. Here, N is a predetermined number of pictures. Then, determination is made at step S002 as to whether motion estimation and positioning have been completed for all N reference pictures. If so, the process proceeds to step S005; otherwise the process proceeds to step S003. At step S003, motion between the target picture pic_cur for super resolution and a reference picture pic_ref (k) is estimated. Here, k is an integer greater than or equal to 1 and less than or equal to N. Based on the estimated motion amount, positioning is performed at step S004. At step S005, an initial high-resolution picture 513 is generated on the basis of the pixel values of the target picture pic_cur for super resolution in accordance with a specified scale factor. At step S006, the initial picture 513 is updated through repetition processing to output a reconstructed picture.
The motion estimation at step S003 and the repetition processing at step S006 will be described in further detail.
FIG. 5 is a flowchart showing operation of the motion estimation at step S003. The motion estimation is performed on a block-by-block basis and any block size can be specified. First, at step S0031, a pair of index numbers (i, j) specifying a block is set to (0, 0). Then, determination is made at step S0032 as to whether motion estimation for all blocks in the target picture pic_cur for super resolution has been completed. If completed, the motion estimation will end; otherwise, the process proceeds to step S0033, where estimation of motion between the (i, j)-th block in the target picture pic_cur and the block in the k-th reference picture pic_ref (k) is performed. Then, (i, j) is updated at step S0034 and the process returns to step S0032. In this way, estimation of motion between all blocks in the target picture pic_cur for super resolution and the N reference pictures is performed in the conventional motion estimation.
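The per-block estimation of step S0033 can be illustrated with a simple full-search block matching sketch (the SAD criterion, the block size, and the search range are assumptions made for this example; the flowchart itself does not specify a particular matching method):

```python
import numpy as np

# Full-search block matching for one (i, j)-th block (step S0033 sketch).
# block : B x B block of the target picture pic_cur located at (top, left)
# ref   : the k-th reference picture pic_ref(k)
# Returns the motion vector (dy, dx) minimising the sum of absolute
# differences (SAD) within +/- 'search' pixels.
def estimate_motion(block, ref, top, left, search=4):
    b = block.shape[0]
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference picture.
            if y < 0 or x < 0 or y + b > ref.shape[0] or x + b > ref.shape[1]:
                continue
            sad = np.sum(np.abs(ref[y:y + b, x:x + b] - block))
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best
```

Repeating this search for every block (i, j) and every reference picture k is exactly what makes the positioning stage computationally expensive as N grows.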
FIG. 6 is a flowchart showing operation of the repetition processing at step S006. First, the number n of repetitions is set to 0 at step S0061. Determination is made at step S0062 as to whether the repetition processing has been completed. If the L2 norm of the gradient I′ of the evaluation function is smaller than a predetermined threshold value ε, the repetition processing will end and the process proceeds to step S0065, where the high-resolution picture HR (n+1) is output as the reconstructed picture. If the L2 norm is greater than or equal to the threshold value ε, the process proceeds to step S0063, where all pixel values of the high-resolution picture HR (n) are updated to generate an updated high-resolution picture HR (n+1). It should be noted that the high-resolution picture HR (0) agrees with the initial high-resolution picture 513 generated at step S005. Then, 1 is added to the repetition count n at step S0064 and the process returns to step S0062. In this way, all pixels in the high-resolution picture HR (n) are always updated in the conventional repetition processing.
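The repetition processing of FIG. 6 can be sketched as a steepest-descent loop (a minimal illustration: the step size, the default weight α, and the first-order difference used as the smoothness prior are assumptions made for this example):

```python
import numpy as np

# Sketch of steps S0061-S0065: repeat the update HR(n) -> HR(n+1)
# until the L2 norm of the gradient I' falls below the threshold eps.
# h0 : initial high-resolution picture HR(0) (flattened vector)
# B  : matrix whose i-th row is the imaging-model kernel b_vec(i)
# f  : positioned low-resolution pixel values fi
def reconstruct(h0, B, f, alpha=0.1, step=0.01, eps=1e-6, max_iter=5000):
    h = h0.copy()                                   # HR(0)
    for n in range(max_iter):
        residual = B @ h - f                        # b_vec(i) * h_vec - fi
        d = np.diff(h)                              # stand-in smoothness prior C
        grad_prior = np.zeros_like(h)
        grad_prior[:-1] -= 2.0 * d
        grad_prior[1:] += 2.0 * d
        g = 2.0 * (B.T @ residual) + alpha * grad_prior  # gradient I' (equation 2)
        if np.linalg.norm(g) < eps:                 # step S0062: converged?
            break                                   # -> step S0065: output
        h = h - step * g                            # step S0063: HR(n) -> HR(n+1)
    return h                                        # reconstructed picture
```

With B the identity and α = 0, the loop simply converges to the positioned low-resolution values, which is a convenient sanity check. Note that every pixel of HR(n) is updated in every iteration, which is precisely why the conventional repetition processing is so costly.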
[Patent Document 1] Japanese Patent Application Publication No. 2000-339450