When a paper surface, such as a page of a magazine, newspaper, or book, is imaged by an imaging device such as a camera, the number of pixels in the image pickup element may be insufficient to capture the desired range of the paper surface at the desired resolution. For this case, it has been proposed that the camera image the object while sweeping closely across it, and that the plurality of frame images thus taken be joined to generate a high-resolution wide-field image.
Japanese Patent Laid-Open Application No. 11-298837 (JP, 11-298837A) proposes an image input device for use when images (partial images) of adjacent scenes are joined to generate a single image. The image input device detects an overlapped region based on a motion vector between the captured partial images, and when the overlapped region has an area sufficient to calculate the amount of geometrical correction needed to join the partial images without visible discontinuity, an indicator displays that fact. From the indication, the user can determine whether a joined image of sufficient quality, with inconspicuous boundaries between partial images, can be obtained. Also, Japanese Patent Laid-Open Application No. 2004-96156 (JP, 2004-096156A) proposes an image input system that shows the resulting joined image to the user at the imaging site, so that the user can immediately confirm there whether the images can be combined easily, i.e., whether imaging has succeeded or failed. From the presented image, the user can determine whether a joined image of sufficient quality, with inconspicuous boundaries between partial images, can be obtained.
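The overlap check described for JP, 11-298837A can be illustrated with a minimal sketch. It assumes, for simplicity, that the motion vector between two equally sized partial images is a pure translation in pixels. The function names and the threshold `min_fraction` are hypothetical and are not taken from the cited application.

```python
def overlap_fraction(width, height, dx, dy):
    """Fraction of the frame area shared by two equal-size frames.

    Assumption: the second frame is displaced from the first by a pure
    translation (dx, dy) in pixels, with no rotation or scaling.
    """
    overlap_w = max(0, width - abs(dx))
    overlap_h = max(0, height - abs(dy))
    return (overlap_w * overlap_h) / (width * height)


def overlap_is_sufficient(width, height, dx, dy, min_fraction=0.3):
    """Return True when the overlapped region is at least min_fraction
    of the frame area. The threshold value is illustrative only."""
    return overlap_fraction(width, height, dx, dy) >= min_fraction
```

For a 640 x 480 frame displaced horizontally by 320 pixels, half of the frame overlaps, so the check passes with the illustrative threshold. A displacement of 600 pixels leaves only a narrow strip and the check fails.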
Now, when a plurality of partial images is joined to generate a wide-field image, the number of samples per unit length, i.e., the resolution, of each partial image varies in accordance with the distance between the camera and the object. For that reason, when that distance varies during imaging, the wide-field image obtained by joining has different resolutions in different portions.
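The dependence of resolution on distance can be made concrete with a minimal sketch. It assumes an ideal pinhole camera facing the paper surface squarely, with the focal length expressed in pixels. The function names and the numeric values in the usage note are hypothetical.

```python
MM_PER_INCH = 25.4


def samples_per_mm(focal_px, distance_mm):
    """Pixels that cover one millimetre of a fronto-parallel surface.

    Assumption: ideal pinhole model, so a length L on the surface at
    distance Z projects to focal_px * L / Z pixels.
    """
    if distance_mm <= 0:
        raise ValueError("distance must be positive")
    return focal_px / distance_mm


def samples_per_inch(focal_px, distance_mm):
    """Same quantity expressed in pixels per inch."""
    return samples_per_mm(focal_px, distance_mm) * MM_PER_INCH
```

Under this model, doubling the distance halves the resolution. With a focal length of 1000 pixels, for example, moving the camera from 250 mm to 500 mm reduces the sampling from 4 to 2 pixels per millimetre, so partial images taken at the two distances differ in resolution by a factor of two.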
Although a method has been proposed in which the geometrical distortion between partial images, caused by camera tilt due to hand shake and by variations in distance, is corrected before the partial images are joined, the image joined in this way includes partially blurry portions. Further, when the camera is tilted, even a single partial image is captured with a resolution that differs from place to place, and resolution variations occur in the joined image. This problem is more pronounced when a relatively close object, for example the paper surface of a newspaper or magazine, is imaged by manual camera scan, namely, when the object is close to the camera and is imaged at a wide angle. In this description, manual camera scan means that the object is scanned while the camera is held in the hand and moved.
Specifically, the techniques disclosed in JP, 11-298837A and JP, 2004-096156A described above are intended for panoramic imaging of a distant view. In this use, although wobble such as slight tilts caused by hand shake occurs while the camera is manually panned, the object merely shifts roughly in parallel on the image. Therefore, hand shake or the like has little effect on the quality of the captured partial images and the generated mosaic images. However, when an object relatively close to the camera is imaged, hand shake or the like has a profound effect: a wide-field image that is partially low in resolution and blurred is generated. Further, when the camera is excessively tilted, the paper surface is captured with different resolutions at different points even within one partial image.
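The effect of tilt within a single partial image can be illustrated with a minimal one-dimensional sketch. It assumes an ideal pinhole camera whose optical axis is tilted by an angle from the normal of a flat paper surface. The formula follows from differentiating the pinhole projection under that assumption and is not taken from the cited documents. All names are hypothetical.

```python
import math


def samples_per_mm_tilted(focal_px, distance_mm, tilt_deg, x_mm):
    """Local pixels per millimetre along the tilt direction.

    Assumptions: ideal pinhole camera at perpendicular distance
    distance_mm from a flat surface, optical axis tilted by tilt_deg
    from the surface normal, and x_mm measured on the surface from the
    foot of the perpendicular, positive toward the side the axis leans to.
    """
    theta = math.radians(tilt_deg)
    # Depth of the surface point along the optical axis.
    depth = x_mm * math.sin(theta) + distance_mm * math.cos(theta)
    if depth <= 0:
        raise ValueError("surface point is not in front of the camera")
    # Derivative of the image coordinate with respect to x.
    return focal_px * distance_mm / depth ** 2
```

With no tilt the expression reduces to the focal length divided by the distance at every point, so the resolution is uniform. With a tilt of 20 degrees at a distance of 250 mm, points 100 mm to either side of the perpendicular foot are sampled at clearly different densities, which is the within-frame resolution variation described above.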
A method has been proposed in which, even if shake occurs in the camera motion, a distortion parameter between partial images is estimated and the partial images are accurately aligned, thereby generating a mosaic image with inconspicuous seams. However, no method has been provided that guides the user's camera scan so that resolution variations do not arise in the mosaic image. Here, the mosaic image is an image in which characters or the like on printed paper, such as newspapers and magazines, are shown in fine detail as a mosaic.
Further, even if one attempts to solve these problems by guiding the user in how to scan the camera so that a wide-field image of the desired quality can be generated, it is difficult to show the user in an intuitive way how the position and orientation of the camera should be corrected. The reason is as follows. In conventional panoramic imaging of a distant scene, as described above, the image variations caused by shake, e.g., rotation and positional change of the camera, amount to no more than a slight rotation and parallel shift of the object. In a close scene, however, tilting the camera distorts the object in the image, and even a slight change in the distance between the camera and the object changes the size at which the object is imaged. The tilt and positional change of the camera thus have a large effect on the resolution of the partial image. It is difficult for the user to grasp immediately whether the camera has to be rotated or its position has to be moved when correcting the camera scan.
FIG. 1 shows two images taken while scanning the same object. Assume that image 21 shown on the left side is a reference image and image 22 shown on the right side is the image currently being taken. Consider a case where the position and posture of the camera must be corrected so that these images can be accurately aligned. In this case, it is difficult for the user to grasp immediately whether the camera has to be rotated or its position has to be moved. One method of directing the user to make the alignment is to superimpose the already captured reference image translucently on the image about to be taken, and to have the user move the camera to the position and posture at which the two align, as in the two examples shown in FIG. 2. In FIG. 2, image 23 on the left side is an image in which a previous frame image is superimposed on the image currently being taken, and image 24 on the right side is an image in which the previous frame image is slightly shifted and superimposed on the image currently being taken. However, it is difficult for the user to grasp immediately in which direction and by how much the position and posture of the camera should be corrected merely by referring to the superimposed images shown in FIG. 2.
[Patent Document 1] JP, 11-298837A
[Patent Document 2] JP, 2004-096156A
[Non-Patent Document 1] Zelnik-Manor and Irani, “Multi-Frame Estimation of Planar Motion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, (2000)