An omnidirectional camera is known as an apparatus for providing a landscape image around a user. An omnidirectional video image system of this type is typically formed from a plurality of cameras disposed so as to pick up images of the surroundings from a single point in space taken as the visual point. Such a system performs image processing that suitably pastes together the boundaries of the images picked up by neighboring cameras, producing an image that covers a region much wider than the visual field of any individual camera and that looks as if it had been picked up by a single wide-angle camera.
Although a camera can pick up an image over a wide range if a lens having a wide view angle is used, the resolution decreases correspondingly and details of the picked up image become less distinguishable. In contrast, an omnidirectional video image system can provide a picked up image over a wide range while maintaining a high resolution.
Where such an omnidirectional video image is used, a video image of the free visual point type can be enjoyed. For example, in a television game wherein a character (cast member) can move about freely in a space, a background screen from an arbitrary visual point can be displayed. Consequently, the game can be enjoyed through a more realistic video image, and its entertainment value is enhanced.
Further, while an omnidirectional video image has a large data volume when compared with an ordinary video image, it is superior in interactivity and is therefore promising as new content for the broadband network age.
Incidentally, most existing cameras ideally employ central projection based on a pinhole camera model. In central projection, the color concentration value at each point on the surface of a three-dimensional object is placed at the point where a straight line (also called the “line of sight”) interconnecting the center of projection and that point on the object surface intersects the projection screen of the camera, whereby a projection image is formed. Central projection has the characteristic that, for an object of a fixed size, the projected image becomes larger as the object moves toward the center of projection of the camera and smaller as the object moves away from it.
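The size behavior of central projection described above can be illustrated with a minimal sketch. It assumes a camera coordinate system whose origin is the center of projection and whose projection screen is the plane Z = focal_length; the function names `project_pinhole` and `projected_height` and the numeric values are hypothetical and chosen only for illustration.

```python
def project_pinhole(point, focal_length=1.0):
    """Central projection of a 3-D point (X, Y, Z), given in camera
    coordinates, onto the projection screen Z = focal_length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the center of projection")
    return (focal_length * x / z, focal_length * y / z)


def projected_height(object_height, distance, focal_length=1.0):
    """Height on the projection screen of a vertical object centered on the
    optical axis at the given distance from the center of projection."""
    top = project_pinhole((0.0, object_height / 2, distance), focal_length)
    bottom = project_pinhole((0.0, -object_height / 2, distance), focal_length)
    return top[1] - bottom[1]


# The same object (height 1.0) placed at two distances (assumed values).
near_height = projected_height(1.0, 2.0)
far_height = projected_height(1.0, 4.0)
print(near_height, far_height)  # the nearer object yields the larger image
```

In this sketch the projected height is inversely proportional to the distance, so doubling the distance halves the size of the projected image.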
Meanwhile, an ideal pinhole camera model has the characteristic that points on the same line of sight are projected at the same position on the projection screen (that is, the picked up image plane) irrespective of their distance from the center of projection of the camera. Accordingly, neighboring cameras are disposed such that their centers of projection coincide with each other so that the cameras have common lines of sight. As a result, although different cameras are used, an image equivalent to that picked up by a single camera from the same place is obtained. In other words, even if an arbitrary place in the overlapping image pickup regions of neighboring cameras is designated as the boundary between the images, the picked up images are pasted together smoothly.
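The independence from distance can be checked numerically with a short sketch. It assumes two ideal pinhole cameras that share one center of projection and differ only by a pan about the vertical axis; the helper names `project` and `rotate_y`, the pan angle, and the sample points are hypothetical.

```python
import math


def project(point, focal_length=1.0):
    """Central projection of a point given in camera coordinates."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def rotate_y(point, angle):
    """Express a point in the coordinates of a camera panned by `angle`
    (radians) about the vertical axis through the shared center."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


# Two points on one line of sight, at different distances from the center.
direction = (0.2, 0.1, 1.0)
near = tuple(2.0 * d for d in direction)
far = tuple(10.0 * d for d in direction)

# Camera A looks along the Z axis; camera B shares the center but is panned.
pan = math.radians(20.0)
a_near, a_far = project(near), project(far)
b_near, b_far = project(rotate_y(near, pan)), project(rotate_y(far, pan))

print(a_near, a_far)  # same position in camera A
print(b_near, b_far)  # same position in camera B as well
```

Because both points land on one position in each camera, the correspondence between the two images does not depend on scene depth, which is why any boundary within the overlapping region can be used under these assumptions.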
For example, a paper by Yalin Xiong and Ken Turkowski, “Registration, Calibration and Blending in Creating High Quality panoramas”, Fourth IEEE Workshop on Applications of Computer Vision, pp. 69-74, 1998, another paper by Heung-Yeung Shum and Richard Szeliski, “Construction of Panoramic mosaics with global and local alignment”, International Journal of Computer Vision, 36(2), pp. 101-130, 2000 and U.S. Pat. No. 6,157,747 propose techniques of pasting together neighboring images picked up using ideal pinhole cameras. Further, a paper by Satyan Coorg and Seth Teller, “Spherical Mosaics with Quaternions and Dense Correlation”, International Journal of Computer Vision, 37(3), pp. 259-273, 2000 proposes a technique of pasting a large number of picked up images together while ignoring the lens distortion.
In practice, ideal pinhole cameras are seldom available, and a lens usually has distortion which cannot be ignored. Further, since a camera has a physical volume, it is physically impossible to dispose a plurality of cameras so that their image pickup centers are concentrated upon a single point. Moreover, assembling three or more cameras so that their image pickup centers coincide with one another in a three-dimensional space requires very difficult operations.
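The consequence of image pickup centers that do not coincide can be sketched as follows. The example assumes two distortion-free pinhole cameras with parallel optical axes whose centers are separated by a small baseline along the X axis; the function name `project`, the baseline of 0.1, and the sample points are hypothetical.

```python
def project(point, center=(0.0, 0.0, 0.0), focal_length=1.0):
    """Central projection of a world point by a pinhole camera located at
    `center` and looking along the Z axis."""
    x, y, z = (p - c for p, c in zip(point, center))
    return (focal_length * x / z, focal_length * y / z)


# Two points on one line of sight of camera A, at depths 2 and 10.
direction = (0.2, 0.1, 1.0)
near = tuple(2.0 * d for d in direction)
far = tuple(10.0 * d for d in direction)

center_a = (0.0, 0.0, 0.0)
center_b = (0.1, 0.0, 0.0)  # assumed offset between the two pickup centers

a_near, a_far = project(near, center_a), project(far, center_a)
b_near, b_far = project(near, center_b), project(far, center_b)

# Horizontal displacement of each point between the two images.
shift_near = a_near[0] - b_near[0]
shift_far = a_far[0] - b_far[0]
print(shift_near, shift_far)  # the nearer point is displaced more
```

In this sketch the displacement equals the baseline multiplied by the focal length and divided by the depth, so no single boundary alignment is correct for objects at all distances.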
For example, a paper by R. Swaminathan and S. Nayar, “Non-Metric Calibration of Wide Angle Lenses and Polycameras”, IEEE Journal on Pattern Analysis and Machine Intelligence, pp. 1171-1178, 2000 addresses the problem of image pasting described above for a camera model which has lens distortion in a radial direction and a tangential direction, and proposes to transform the picked up images of the cameras once into pinhole images and then paste the pinhole images together. In this instance, however, in order to complete an omnidirectional video image formed from a plurality of image frames pasted together, pixel interpolation must be performed twice in total, that is, upon transformation into pinhole images and upon pasting of the images. This gives rise to significant deterioration of the images.
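A camera model having both radial and tangential lens distortion can be sketched as below. The sketch assumes the widely used Brown-Conrady polynomial form with two radial coefficients (k1, k2) and two tangential coefficients (p1, p2) acting on normalized image coordinates; this form, the coefficient values, and the function names `distort` and `undistort` are assumptions for illustration and are not taken from the cited paper.

```python
def distort(point, k1, k2, p1, p2):
    """Map an ideal (pinhole) normalized image point to its distorted
    position using radial (k1, k2) and tangential (p1, p2) terms."""
    x, y = point
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return (x_d, y_d)


def undistort(point, k1, k2, p1, p2, iterations=20):
    """Recover the ideal point from a distorted one by fixed-point iteration.
    This converges only for moderate distortion, as assumed here."""
    x_d, y_d = point
    x, y = x_d, y_d  # initial guess: the distorted position itself
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (x_d - dx) / radial
        y = (y_d - dy) / radial
    return (x, y)


# Hypothetical coefficients and a sample point in normalized coordinates.
coeffs = (-0.2, 0.05, 0.001, -0.0005)
ideal = (0.3, -0.2)
observed = distort(ideal, *coeffs)
recovered = undistort(observed, *coeffs)
print(observed, recovered)
```

Transformation into a pinhole image amounts to evaluating such a mapping for every output pixel and interpolating the source image at the resulting non-integer position, which is the first of the two interpolation passes mentioned above.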
It is to be noted that a paper by Y. Xiong and K. Turkowski, “Creating image-based VR using a self-calibrating fisheye lens”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 237-243, 1997 proposes a technique for pasting images picked up by a fisheye lens as a lens other than a pinhole lens.
Meanwhile, a paper by R. Y. Tsai, “A versatile Camera Calibration Technique for High Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses”, IEEE Journal of Robotics and Automation, Vol. RA-3, No. 4, pp. 323-344, 1987 discloses a technique of calculating a distortion parameter of lenses of various camera models at a high speed and with a high degree of accuracy.
In order to paste together a plurality of picked up images from different cameras to construct an omnidirectional image, the cameras are preferably ideal pinhole cameras. Therefore, input images from the cameras must be pasted together only after lens distortion has been removed and the images have been transformed into pinhole images. However, this repeats pixel transformation two or more times, resulting in significant deterioration of the picture quality. Further, the angular field of view of a pinhole camera is at most 180 degrees.
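The loss caused by repeating pixel transformation can be illustrated with a one-dimensional sketch. It assumes linear interpolation and uses a simple sub-pixel shift as a stand-in for each image transformation; the test signal, the shift amounts, and the function names are hypothetical. Resampling in two passes (shifts of 0.3 and 0.4) is compared with a single pass using the combined shift of 0.7.

```python
import math


def interp_linear(samples, x):
    """Linearly interpolate a list of samples at fractional position x,
    clamping x to the valid index range."""
    x = min(max(x, 0.0), len(samples) - 1.0)
    i = min(int(math.floor(x)), len(samples) - 2)
    t = x - i
    return (1.0 - t) * samples[i] + t * samples[i + 1]


def resample(samples, shift):
    """Resample the signal so that output[i] approximates input at i + shift."""
    return [interp_linear(samples, i + shift) for i in range(len(samples))]


def rms_error(values, reference, margin=2):
    """Root-mean-square error, ignoring `margin` samples at each border."""
    pairs = list(zip(values, reference))[margin:-margin]
    return math.sqrt(sum((v - r) ** 2 for v, r in pairs) / len(pairs))


def signal(x):
    """Assumed test signal: a sinusoid with a period of 8 samples."""
    return math.sin(2.0 * math.pi * x / 8.0)


n = 64
source = [signal(i) for i in range(n)]
truth = [signal(i + 0.7) for i in range(n)]  # exact values at shifted positions

one_pass = resample(source, 0.7)                 # combined mapping, one pass
two_pass = resample(resample(source, 0.3), 0.4)  # two successive passes

err_one = rms_error(one_pass, truth)
err_two = rms_error(two_pass, truth)
print(err_one, err_two)  # the two-pass result deviates more from the truth
```

Each interpolation pass acts as a low-pass filter, so the attenuation compounds when the passes are chained; composing the mappings first and interpolating once avoids the additional loss in this sketch.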
Further, since camera images having different internal parameters or different distortion parameters cannot be pasted together well, the original images to be pasted together must all be picked up by cameras of the same lens model. In other words, the image pickup system offers little flexibility in its design configuration.
It is also possible, in view of the ease of pasting camera images together, to form an omnidirectional video image from a plurality of video images picked up from different places using only a single camera. For example, a paper by S. E. Chen, “QuickTime VR—an image-based approach to virtual environment navigation”, Computer Graphics (SIGGRAPH '95), pp. 29-38, August 1995 describes a technique of forming an omnidirectional video image from a plurality of video images picked up at several places using only a single camera. The technique, however, cannot cope with a case in which video images in different image pickup directions are picked up and supplied simultaneously in real time.