Field of the Invention
The present invention relates to an image processing apparatus, an image synthesizing apparatus, an image processing system, an image processing method, and a storage medium.
Description of the Related Art
In recent years, a technique called “visual simultaneous localization and mapping (SLAM)”, which estimates the three-dimensional position and orientation of a moving camera from a video captured by the camera, has been put into practical use. The visual SLAM technique can be applied to mixed reality (MR) technology and augmented reality (AR) technology, which render a virtual three-dimensional computer graphics (CG) object on the video based on the position and orientation of the camera. Techniques for estimating the position and orientation of the camera from a video can be divided into methods that use a marker and methods that do not. Both kinds of methods estimate the position and orientation of the camera in a three-dimensional space by identifying a marker or a natural object between frames and following its motion (hereinafter, this following operation will be referred to as “tracking”). A position/orientation estimation method that uses a marker is disclosed in “Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System”, by Hirokazu Kato and Mark Billinghurst, in Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, 1999 (hereinafter referred to as Non-Patent Document 1). A position/orientation estimation method that does not use a marker is disclosed in “Parallel Tracking and Mapping on a Camera Phone”, by Georg Klein and David Murray, in Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR 2009, Orlando) (commonly known as PTAM) (hereinafter referred to as Non-Patent Document 2). In the MR technology, a map called an “environmental map”, which indicates the three-dimensional position of a marker or an object, is generated from the estimated position and orientation of the camera.
Then, by using the environmental map, the position and orientation of a CG object are determined, and the CG object is superimposed on an input video. Through the above steps, it is possible to obtain a video that shows the CG object as if it were present in the reality space. Whether or not the CG object can be superimposed at the correct position depends on the tracking accuracy, and the tracking accuracy in turn depends greatly on the characteristics of each frame image in the video.
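The dependence of the superimposed position on the estimated camera pose can be illustrated with a minimal sketch. It assumes a simple pinhole camera that rotates only about its vertical axis; the function name `project`, the focal length, the image center, and the one-degree orientation error are all hypothetical values chosen for illustration and do not come from the cited documents.

```python
import math


def project(point_w, yaw_rad, cam_pos, focal_px, cx, cy):
    """Project a world point into the image of a pinhole camera.

    Assumption: the camera rotates only about its vertical (y) axis.
    A full pose would use a general 3x3 rotation matrix instead.
    """
    # Offset of the point from the camera position, in world coordinates.
    dx = point_w[0] - cam_pos[0]
    dy = point_w[1] - cam_pos[1]
    dz = point_w[2] - cam_pos[2]
    # Rotate the offset into the camera frame (inverse of the camera yaw).
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    xc = c * dx - s * dz
    yc = dy
    zc = s * dx + c * dz
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the shift to the image center.
    return (focal_px * xc / zc + cx, focal_px * yc / zc + cy)


# A CG anchor point 2 m in front of the camera (hypothetical values).
anchor = (0.0, 0.0, 2.0)
origin = (0.0, 0.0, 0.0)

true_px = project(anchor, 0.0, origin, 1000.0, 640.0, 360.0)
# The same point projected with a 1-degree error in the estimated yaw.
est_px = project(anchor, math.radians(1.0), origin, 1000.0, 640.0, 360.0)

overlay_shift_px = est_px[0] - true_px[0]
print(f"horizontal overlay shift: {overlay_shift_px:.2f} px")
```

Under these assumed values, a one-degree orientation error moves the superimposed object by roughly 17 pixels horizontally, which is why small tracking errors are visible in the synthesized video.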
The characteristics of a frame image depend on the sensor and the conditions under which the sensor is driven. For example, in the case of using a rolling shutter sensor, the type commonly used for CMOS sensors, a distortion called “rolling shutter distortion” occurs in situations where there is a moving object in the scene or where the camera is panned. This distortion reduces the accuracy of identifying the marker or the object between frames, as a result of which the tracking accuracy and the accuracy of position/orientation estimation are reduced. On the other hand, in the case of using a global shutter sensor, as typified by a CCD, such rolling shutter distortion does not occur. However, it is generally recognized that the global shutter sensor requires a high driving voltage, and it is therefore difficult to achieve a high resolution and a high frame rate. Even with a rolling shutter sensor, the rolling shutter distortion can be reduced significantly by increasing the driving speed of the sensor.
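The relationship between driving speed and rolling shutter distortion can be sketched as follows. The sketch assumes a simplified model in which rows are read out one after another at a constant rate and the scene moves horizontally across the image at a constant speed during a pan; the function name and all numeric values are hypothetical and serve only as an illustration.

```python
def rolling_shutter_skew_px(image_speed_px_per_s, rows, readout_time_s):
    """Return the horizontal shift of each row relative to the first row.

    Assumed model: row r is read out r * line_time after row 0, so content
    moving at a constant image speed is displaced in proportion to the
    row index.
    """
    line_time_s = readout_time_s / rows
    return [image_speed_px_per_s * r * line_time_s for r in range(rows)]


# Hypothetical pan that moves the scene at 1000 px/s across a 1080-row sensor.
slow = rolling_shutter_skew_px(1000.0, 1080, 1.0 / 30.0)   # 1/30 s readout
fast = rolling_shutter_skew_px(1000.0, 1080, 1.0 / 240.0)  # 1/240 s readout

print(f"skew at last row, slow readout: {slow[-1]:.1f} px")
print(f"skew at last row, fast readout: {fast[-1]:.1f} px")
```

In this model the skew at the bottom row scales linearly with the readout time, so reading the sensor out eight times faster reduces the skew by a factor of eight.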