1. Field of the Invention
The present invention relates to a resolution conversion apparatus, method and program for computing, for example, a higher resolution image from input images.
2. Description of the Related Art
High-resolution TV sets and displays have become available. When an image is displayed on a TV panel or display panel, the number of pixels in the image data is converted to match the number of pixels of the panel. In particular, for conversion that enhances the resolution, i.e., increases the number of pixels, a method is known that provides sharper images than linear interpolation. In this method (hereinafter referred to as the “reconstruction-based super resolution”), information from a plurality of frames is used to restore a high-resolution image by considering the inverse of the image pickup process (degradation process) of the image.
In the reconstruction-based super resolution, attention is paid to the fact that both a reference frame and another frame contain images of the same subject. In consideration of this fact, the motion of the subject is detected with an accuracy finer than the pixel size (subpixel-order accuracy), and a plurality of sampling values are obtained at positions on the subject that are slightly displaced from the positions associated with the pixel values. These pieces of position information are combined to realize a high-resolution image. Assume here that the term “pixel value” indicates the level of an intensity signal output from each pixel of the image, and that the term “sampling value” indicates an intensity signal level corresponding to each subpixel position on an image of the subject. Namely, the term “pixel value” is associated with each integer pixel position, while the term “sampling value” is associated with an arbitrary subpixel position including the integer pixel position.
The reconstruction-based super resolution will be described in more detail. In this method, when low-resolution frames are supplied in temporal sequence, they are sequentially converted into high-resolution frames. For instance, consider the case where three consecutive frames of a moving picture obtained by photographing a moving vehicle are supplied as low-resolution frames. One of the three frames is used as a reference frame and subjected to a magnification process in which, for example, its resolution is doubled in both the horizontal and vertical directions (i.e., the magnification factor is 2×2). When only one frame is used, the number of pixels included in the low-resolution image (i.e., the number of known sampling values) is smaller than the number of pixels included in the unknown high-resolution image. Even in this state, the pixel values of the unknown high-resolution image can be estimated. However, if the number of known sampling values can be increased, the unknown high-resolution image can be estimated more accurately. To this end, in the reconstruction-based super resolution, it is detected which position on the reference frame corresponds to the part of the subject picked up at a certain pixel of another low-resolution frame, and the pixel value of that pixel of the other frame is used as a sampling value at the detected position on the reference frame.
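The relation between known sampling values and unknown high-resolution pixels described above can be illustrated with a small count. The following is only a sketch: the function name `sample_budget` and the 320×240 frame size are hypothetical, and it is assumed that every pixel of every supplied frame yields one usable sampling value, which need not hold in practice.

```python
def sample_budget(lr_width, lr_height, scale_x=2, scale_y=2, num_frames=1):
    """Return (unknown high-resolution pixels, known sampling values).

    Assumption for illustration: each pixel of each low-resolution frame
    contributes exactly one usable sampling value.
    """
    unknowns = (lr_width * scale_x) * (lr_height * scale_y)
    knowns = lr_width * lr_height * num_frames
    return unknowns, knowns


# One 320x240 frame magnified 2x2: four unknowns per known value.
single = sample_budget(320, 240, num_frames=1)   # (307200, 76800)
# Three frames: three times as many known values for the same unknowns.
triple = sample_budget(320, 240, num_frames=3)   # (307200, 230400)
```

Under these assumptions the three-frame case still has fewer known values than unknowns, so the estimation remains under-determined, but it is considerably better constrained than the single-frame case.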
Specifically, for example, a square block of several pixels around and including a target pixel is extracted from a certain frame of a low-resolution image, and a block that has the same size as the extracted block and contains pixel values close to those of the extracted block is searched for in the reference frame of the low-resolution image. The search is performed with subpixel estimation (see, for example, M. Shimizu et al., “Precise Subpixel Estimation on Area-based Matching,” in Proc. IEEE International Conference on Computer Vision, pp. 90-97, 2001). The center of the detected block is set as the point associated with the extracted block of the certain frame. Thus, point A in the certain frame is associated with point B in the reference frame; namely, these points are regarded as the same point on the subject. Such an association algorithm will hereinafter be referred to as “block matching.” The above association is expressed by a motion vector that uses the point A as an initial point and the point B as an end point. Since the search is performed with subpixel estimation, the initial point of each motion vector indicates the position of a pixel, whereas the end point generally indicates a position at which no pixel exists. Motion vectors are acquired for all pixels of the low-resolution frame. For each other low-resolution frame as well, motion vectors that use its pixels as initial points and the associated positions on the reference frame as end points are detected. After the motion vectors are acquired, the pixel value at the initial point of each motion vector is provided as a sampling value for the end point. From the sampling values thus arranged in a non-uniform grid, the values of the pixels of the high-resolution image, which are arranged in a uniform grid, are acquired. As such a conversion (degradation inverse conversion) method, non-uniform interpolation, POCS, an ML estimator, or a MAP estimator is known (see, for example, S. C. Park et al., “Super-Resolution Image Reconstruction: A Technical Overview,” IEEE Signal Processing Magazine, pp. 21-36, May 2003).
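The block matching described above may be sketched as follows. This is a minimal illustration and not the method of the cited references: the sum of squared differences is assumed as the similarity measure, and the subpixel estimation is replaced by a simple parabola fit through three neighboring costs along each axis. The names `ssd`, `parabola_offset`, `block_match`, and `samples_from_frame`, as well as the parameters `half` and `search`, are hypothetical. Frames are assumed to be lists of rows of intensity values.

```python
import math


def ssd(frame, ref, ay, ax, by, bx, half):
    """Sum of squared differences between the block centered at (ay, ax)
    in `frame` and the block centered at (by, bx) in `ref`."""
    total = 0.0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            d = frame[ay + j][ax + i] - ref[by + j][bx + i]
            total += d * d
    return total


def parabola_offset(c_minus, c0, c_plus):
    """Offset of the minimum of the parabola through the costs at -1, 0, +1.
    Returns 0.0 when no refinement is possible."""
    if math.isinf(c_minus) or math.isinf(c_plus):
        return 0.0
    denom = c_minus - 2.0 * c0 + c_plus
    if denom <= 0.0:
        return 0.0
    return 0.5 * (c_minus - c_plus) / denom


def block_match(frame, ref, ay, ax, half=2, search=3):
    """Return end point B = (by, bx), in subpixel reference-frame coordinates,
    of the motion vector whose initial point is pixel A = (ay, ax) of `frame`.
    The block around A is assumed to lie fully inside `frame`."""
    height, width = len(ref), len(ref[0])

    def cost(dy, dx):
        by, bx = ay + dy, ax + dx
        if (by - half < 0 or bx - half < 0
                or by + half >= height or bx + half >= width):
            return math.inf  # candidate block leaves the reference frame
        return ssd(frame, ref, ay, ax, by, bx, half)

    # Integer-pixel search over all candidate displacements.
    c0, dy, dx = min((cost(v, u), v, u)
                     for v in range(-search, search + 1)
                     for u in range(-search, search + 1))
    # Subpixel refinement, done independently along each axis.
    sub_y = parabola_offset(cost(dy - 1, dx), c0, cost(dy + 1, dx))
    sub_x = parabola_offset(cost(dy, dx - 1), c0, cost(dy, dx + 1))
    return ay + dy + sub_y, ax + dx + sub_x


def samples_from_frame(frame, ref, half=2, search=3):
    """Give every interior pixel value of `frame` to the end point of its
    motion vector, yielding (by, bx, value) samples on the reference frame."""
    samples = []
    for ay in range(half, len(frame) - half):
        for ax in range(half, len(frame[0]) - half):
            by, bx = block_match(frame, ref, ay, ax, half, search)
            samples.append((by, bx, frame[ay][ax]))
    return samples
```

In this sketch, `block_match` returns the end point B for one initial point A, and `samples_from_frame` collects the resulting non-uniformly placed sampling values. The conversion of these samples to a uniform high-resolution grid (non-uniform interpolation, POCS, ML, or MAP) is not shown. Parabola fitting is known to be biased for some cost functions, so it should be read only as a stand-in for a more precise subpixel estimator.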
In addition to the above-mentioned local block matching, global motion estimation, layer motion estimation, and the like are known as association methods (see, for example, J. Xiao et al., “Motion layer extraction in the presence of occlusion using graph cuts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1644-1659, October 2005). In global motion estimation, the motion of an image is assumed to be uniform over the entire screen, as in the case of camera panning, for example, and a single block similar to the above-mentioned one is set over the entire screen to detect a single motion vector. In layer motion estimation, the motion of each object on the screen is estimated.
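Global motion estimation under the uniform-motion assumption may be sketched as follows. This is an illustration only: it is restricted to integer-pixel translation, it uses the mean squared difference over the overlapping region as an assumed similarity measure, and the name `global_motion` and the parameter `search` are hypothetical. Both frames are assumed to be equally sized lists of rows, with `search` smaller than either frame dimension.

```python
def global_motion(frame, ref, search=2):
    """Return the single integer motion vector (dy, dx) such that pixel
    (y, x) of `frame` best matches pixel (y + dy, x + dx) of `ref`,
    judged over the whole overlapping region of the two frames."""
    height, width = len(frame), len(frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total, count = 0.0, 0
            # Visit only pixels whose displaced position stays inside `ref`.
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    d = frame[y][x] - ref[y + dy][x + dx]
                    total += d * d
                    count += 1
            mse = total / count  # normalize, since the overlap size varies
            if best is None or mse < best[0]:
                best = (mse, dy, dx)
    return best[1], best[2]
```

Because a single vector is applied to every pixel, this approach suits scenes such as camera panning but cannot represent objects that move independently, which is the case layer motion estimation addresses.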
As described above, there are many motion estimation methods, and which method is suitable differs from image to image. When the motion estimation method employed suits the image, a more accurate high-resolution image can be obtained, whereas when it does not, the quality of the reconstructed high-resolution image will be insufficient.