1. Field of the Invention
The present invention relates to a video image synthesis method and a video image synthesizer for synthesizing a plurality of contiguous frames sampled from a video image to acquire a synthesized frame whose resolution is higher than that of the sampled frames, and to a program for causing a computer to execute the synthesis method.
The present invention also relates to an image processing method and image processor for performing image processing on one frame sampled from a video image to acquire a processed frame, and a program for causing a computer to execute the processing method.
2. Description of the Related Art
With the recent spread of digital video cameras, it has become possible to handle a video image in units of single frames. When such a video image frame is printed, the resolution of the frame needs to be raised to enhance the picture quality. For this purpose, there has been disclosed a method of sampling a plurality of frames from a video image and acquiring one synthesized frame whose resolution is higher than that of the sampled frames (e.g., Japanese Unexamined Patent Publication No. 2000-354244). When acquiring a synthesized frame from a plurality of frames, this method obtains a motion vector among the frames and, based on the motion vector, computes a signal value that is interpolated between pixels. In particular, the method disclosed in the aforementioned publication No. 2000-354244 partitions each frame into a plurality of blocks, computes an orthogonal coordinate coefficient for blocks that correspond between frames, and synthesizes the high-frequency information in this orthogonal coordinate coefficient with the low-frequency information in another block to compute the pixel value that is interpolated. Therefore, a synthesized frame with high picture quality can be obtained without reducing the required information. Also, in this method, the motion vector is computed with a resolution finer than the distance between pixels, so a synthesized frame of high picture quality can be obtained by accurately compensating for the motion between frames.
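The benefit of a motion vector that is finer than the pixel pitch can be illustrated with a minimal one-dimensional sketch. It does not reproduce the block-coefficient synthesis of the cited publication; it substitutes a simpler shift-and-add placement, and the function name `shift_and_add_1d` and its parameters are hypothetical names introduced only for this illustration.

```python
def shift_and_add_1d(frames, shifts, scale):
    """Place samples of several 1-D low-resolution frames on a finer grid.

    frames: list of equal-length lists of sample values
    shifts: sub-pixel displacement of each frame, in low-resolution pixels
            (assumed to be known already, e.g. from motion estimation)
    scale:  integer up-sampling factor of the output grid
    """
    n = len(frames[0]) * scale
    total = [0.0] * n
    count = [0] * n
    for frame, shift in zip(frames, shifts):
        for i, value in enumerate(frame):
            # Position of this sample on the high-resolution grid.
            pos = round((i + shift) * scale)
            if 0 <= pos < n:
                total[pos] += value
                count[pos] += 1
    # Grid positions that received no sample are left as None.
    return [t / c if c else None for t, c in zip(total, count)]


# Two frames of the same ramp, the second displaced by half a pixel.
high_res = shift_and_add_1d([[0, 2, 4], [1, 3, 5]], [0.0, 0.5], scale=2)
```

In this example the half-pixel displacement lets the second frame fill exactly the grid positions that the first frame leaves empty, which is why an integer-precision motion vector would not be sufficient.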
When synthesizing a plurality of video image frames, it is also necessary to acquire correspondent relationships between pixels of the frames in a motion area. The correspondent relationship is generally obtained by employing block matching methods or differential (spatio-temporal gradient) methods. However, since block matching methods assume that all pixels within a block move in the same direction, they lack flexibility with respect to motions such as rotation, enlargement, reduction, and deformation. They also have the disadvantage of being time-consuming, which makes them impractical. The gradient methods, on the other hand, have the disadvantage that they cannot obtain solutions as stable as those of the block matching methods. There is a method for overcoming these disadvantages (see, for example, Yuji Nakazawa, Takashi Komatsu, and Takahiro Saito, “Acquisition of High-Definition Digital Images by Interframe Synthesis,” Television Society Journal, 1995, Vol. 49, No. 3, pp. 299-308). This method employs one sampled frame as a reference frame, places a reference patch consisting of one or a plurality of rectangular areas on the reference frame, and places patches identical to the reference patch on each of the other sampled frames. The patches are moved and/or deformed in the other frames so that the image within each patch coincides with the image within the reference patch. Based on the patches after the movement and/or deformation and on the reference patch, this method computes a correspondent relationship between a pixel within the patch of each of the other frames and a pixel within the reference patch, thereby synthesizing a plurality of frames accurately.
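The single-translation assumption of block matching can be seen in a minimal sketch of an exhaustive search. The sum of absolute differences is used as the matching cost here as one common choice, not as the cost used in any cited work, and the names `sad` and `match_block` are hypothetical.

```python
def sad(ref, tgt, by, bx, dy, dx, size):
    """Sum of absolute differences between a block in `ref` at (by, bx)
    and the block in `tgt` displaced by (dy, dx)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(ref[by + y][bx + x] - tgt[by + dy + y][bx + dx + x])
    return total


def match_block(ref, tgt, by, bx, size, radius):
    """Return the displacement (dy, dx) within +/- radius that minimizes SAD.

    Frames are lists of rows of numeric pixel values. One displacement is
    returned for the whole block, so rotation or deformation inside the
    block cannot be represented.
    """
    height, width = len(tgt), len(tgt[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y0, x0 = by + dy, bx + dx
            # Skip candidates that fall outside the target frame.
            if y0 < 0 or x0 < 0 or y0 + size > height or x0 + size > width:
                continue
            cost = sad(ref, tgt, by, bx, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]
```

The search evaluates (2 × radius + 1)² candidate displacements per block, each costing size² pixel comparisons, which is consistent with the time-consuming nature of the approach noted above.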
The above-described method is capable of obtaining a synthesized frame of high definition by estimating a correspondent relationship between the reference frame and the succeeding frame and then assigning the reference frame and the succeeding frame to a synthesized image that has the finally required resolution.
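One way to picture the correspondent relationship given by a moved and/or deformed patch is the following sketch. It assumes, purely for illustration, that a pixel at a normalized position inside a rectangular reference patch maps to the bilinear blend of the four corners of the deformed patch; the cited paper's actual mapping is not stated here, and the name `map_through_patch` is hypothetical.

```python
def map_through_patch(u, v, corners):
    """Map a normalized position (u, v) in the reference patch to the
    corresponding coordinates inside a deformed patch.

    u, v:    horizontal and vertical position in the reference patch,
             each in the range 0.0 to 1.0
    corners: (top_left, top_right, bottom_left, bottom_right) of the
             deformed patch, each an (x, y) pair in the other frame
    Assumption: the interior of the patch deforms bilinearly.
    """
    (x00, y00), (x10, y10), (x01, y01), (x11, y11) = corners
    # Bilinear weights of the four corners.
    w00 = (1 - u) * (1 - v)
    w10 = u * (1 - v)
    w01 = (1 - u) * v
    w11 = u * v
    x = w00 * x00 + w10 * x10 + w01 * x01 + w11 * x11
    y = w00 * y00 + w10 * y10 + w01 * y01 + w11 * y11
    return x, y


# Center of a patch that has been translated to the square (1, 1)-(3, 3).
center = map_through_patch(0.5, 0.5, ((1, 1), (3, 1), (1, 3), (3, 3)))
```

Because the mapped coordinates are generally not integers, the pixels of the succeeding frame land between the pixels of the reference frame, which is what allows them to be assigned to a grid of higher resolution.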
However, in the method disclosed by Nakazawa, et al., when the motion of a subject in the succeeding frame is extremely great, or when a subject locally included in the succeeding frame moves in a complicated manner or at an extremely high speed, there are cases where the motion of the subject cannot be followed by the movement and/or deformation of a patch. In such cases, either the synthesized frame as a whole or a subject with great motion included in the frame will become blurred. As a result, the above-described method cannot obtain a synthesized frame of high picture quality.
Also, in the method disclosed by Nakazawa, et al., when a plurality of frames are sampled from a video image, an operator manually sets the range of frames that includes the reference frame, that is, the number of frames that are used for acquiring a synthesized frame. Because of this, the operator needs to have expert knowledge of image processing, and setting the number of frames is time-consuming. Also, the manually set number of frames may vary according to each person's subjective point of view, so a suitable range of frames cannot always be obtained objectively. This has an adverse influence on the quality of synthesized frames.
Further, the method disclosed by Nakazawa, et al. selects one or a plurality of reference frames when sampling a plurality of frames from a video image, and samples, for each reference frame, a predetermined range of frames that includes the reference frame. The selection of reference frames is performed manually by an operator, so the operator must have expert knowledge of image processing and the selection is time-consuming. Also, the manual selection of reference frames may vary according to each person's subjective point of view, so proper reference frames cannot always be determined objectively. This has an adverse influence on the quality of synthesized frames. In addition, since reference frames are set according to the operator's judgement, the intention of the photographer cannot always be reflected, and a synthesized frame with the scenes desired by the photographer cannot always be obtained.
Also, with the spread of digital video cameras, video images taken by digital video cameras can be stored in a personal computer (PC) and freely edited or processed. Video image data representing a video image can also be downloaded into a PC by archiving the video image data in a database and accessing the database from the PC through a network. However, video image data is large in volume and its contents cannot be recognized until it is played back, so it is more difficult to handle than still images.
To easily understand the contents of video images archived in a PC or database, there has been proposed a method of detecting a frame that represents a scene contained in a video image, and attaching this frame to the video image data (e.g., Japanese Unexamined Patent Publication No. 9 (1997)-233422). According to this method, the contents of a video image can be grasped by referring to a frame attached to video image data, so it becomes possible to handle the video image data easily.
However, unlike a still image, each frame on the temporal axis of a video image includes a blur unique to video images. For instance, a subject in motion that is included in a video image has a blur proportional to the moved quantity in the moving direction. Also, video images are low in resolution compared to still images taken by digital still cameras, etc. Therefore, the picture quality of frames sampled from a video image by the method disclosed in the above-described Japanese Unexamined Patent Publication No. 9 (1997)-233422 is not so high.
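The relationship between the moved quantity and the extent of the blur can be sketched in one dimension. The sketch assumes a simplified model in which the subject moves at constant speed during the exposure, so that the blur is a uniform average along the moving direction; the name `motion_blur_1d` and this box-filter model are assumptions made for illustration only.

```python
def motion_blur_1d(row, moved):
    """Blur a row of pixel values along the moving direction.

    row:   list of pixel values along the direction of motion
    moved: moved quantity during the exposure, in pixels (>= 1)
    Each output pixel is the mean of the pixel and up to `moved - 1`
    pixels before it, so the blur width grows with the moved quantity.
    """
    if moved < 1:
        raise ValueError("moved must be at least 1 pixel")
    blurred = []
    for i in range(len(row)):
        # Window is shortened at the start of the row.
        window = row[max(0, i - moved + 1): i + 1]
        blurred.append(sum(window) / len(window))
    return blurred


# A single bright pixel smeared by a motion of 3 pixels.
smeared = motion_blur_1d([0, 0, 12, 0, 0, 0], moved=3)
```

Under this model a single bright pixel is spread over as many pixels as the subject has moved, with its intensity divided among them.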