Vision implies a reduction of a scene, which is actually three-dimensional, to a view providing a certain perspective of the scene. The view is substantially a projection of the three-dimensional scene onto a two-dimensional surface, such as the human retina or the photosensitive surface of a chip. The view thus reduces the full three-dimensional scene to a two-dimensional image of objects in the scene. The view depends on the projection center, i.e., the point of view, and the optical axis, i.e., the direction of view.
In many applications including camera surveillance and video post-production, the image taken by a camera in hardware may not necessarily coincide with a desired view. For example, the desired view may be a function of time, may depend on the motion of an object captured by the camera, or may even be unpredictable at the time the video is recorded. Conventionally, an image is recorded for a larger field of view with sufficient resolution, so that the image can be cropped to objects of interest. However, the point of view and the direction of view are fixed at the time of recording.
Existing devices using one or a few cameras for recording an omnidirectional view can be classified into three groups according to their construction. A recording device according to the first group is configured to freely change the direction of the optical axis. The camera is supported by a mechanism that enables a rotation of the camera, e.g., an aerial camera system using a gimbal assembly as the rotating mechanism. The camera is mounted onto an electromechanical drive that actuates the two degrees of rotational freedom of the gimbal. As a result, the device is able to capture images in any desired direction. Some approaches also rotate the camera continuously in order to record views in all directions.
However, a conceptual limitation of the first group of devices is the inability to capture images in all directions simultaneously. Furthermore, the point of view cannot be changed in post-processing.
A second group of recording devices is static and uses at least one so-called fisheye lens. The fisheye lens provides an extremely large field of view, and can be mounted to replace a linear lens or objective, or can be additionally placed as an adaptor in front of a linear imaging system. Some fisheye lenses are able to refract the rays onto the photosensitive surface up to an angle of incidence of 90 degrees. Thus, fisheye lenses allow covering a hemispherical view with a single camera. A recording device 10 including two cameras 12 aligned at opposite directions and each equipped with a fisheye lens 14 is shown in FIG. 1. The device 10 non-linearly combines a continuum of omnidirectional views over a complete sphere.
A major disadvantage of fisheye lenses is the non-linear distortion 16 of the image in the radial direction causing a decreasing resolution towards the image periphery. As a consequence of the non-linear imaging of the fisheye lens, the resulting image is not a planar projection but blends a continuum of viewpoints. It is not possible to map each pixel in the image resulting from a fisheye lens onto a projection plane of a defined view.
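The radial non-linearity can be illustrated with a minimal sketch. The text does not name a particular fisheye model, so the equidistant mapping r = f·θ used here is an assumption chosen for illustration, as are the function names. The sketch compares this mapping with the linear (perspective) mapping r = f·tan(θ) and evaluates how densely the fisheye image samples the linear projection plane as the angle of incidence grows.

```python
import math

def radius_linear(theta, f):
    """Image radius of a ray at incidence angle theta (rad) for a linear lens."""
    return f * math.tan(theta)

def radius_equidistant(theta, f):
    """Image radius for an ASSUMED equidistant fisheye model, r = f * theta."""
    return f * theta

def radial_sampling_ratio(theta):
    """Ratio d(r_equidistant) / d(r_linear) at incidence angle theta.

    d(r_equidistant)/d(theta) = f and d(r_linear)/d(theta) = f / cos^2(theta),
    so the ratio is cos^2(theta): 1 on the optical axis, falling toward 0
    as theta approaches 90 degrees.
    """
    return math.cos(theta) ** 2

if __name__ == "__main__":
    for deg in (0, 30, 60, 85):
        theta = math.radians(deg)
        print(deg, round(radial_sampling_ratio(theta), 4))
```

Under this assumed model, the fisheye image radius stays finite at 90 degrees of incidence, where the linear mapping diverges, at the cost of a radial sampling density that drops with the square of the cosine of the angle relative to a planar projection.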
The third group of devices uses catadioptric imaging, which includes a reflecting surface to enhance the field of view. An exemplary configuration includes a conical mirror and a camera that is directed towards the mirror so that the optical axis of the camera is aligned with the rotational symmetry axis of the mirror. By reflecting different views having different directions onto the camera, a larger field of view is achievable. A device for catadioptric imaging is published in “Omnidirectional Video Camera”, by S. K. Nayar, Proceedings of the DARPA Image Understanding Workshop, May 1997. Another device for catadioptric imaging is published in “Multiview Radial Catadioptric Imaging for Scene Capture”, by S. Kuthirummal and S. K. Nayar, ACM Trans. on Graphics, July 2006.
The disadvantages of using such convex mirrors are similar to those of the fisheye lenses. The requirement of a single viewpoint is not fulfilled by most mirror geometries, and spherical aberration due to the usage of non-planar mirrors becomes noticeable as blur in the image, especially as the aperture size is increased in order to improve image brightness.
A further conventional approach takes multiple images of different portions of an essentially static or changing scene. The multiple images are stitched together, e.g., based on overlapping border regions of the images. However, stitching a plurality of images combines different views into one image, leading to an optically inconsistent image. Furthermore, image stitching does not provide a view that is different from any one of the camera views. An example of a spherical arrangement of multiple cameras is the “Throwable Panoramic Ball Camera” by J. Pfeil, Technical University Berlin. The device is tossed and takes the multiple images to be stitched at the apex of the flight trajectory, in the instant of zero velocity.
A conventional way of stitching multiple images 22 having different optical axes 23 uses a non-linear transformation 20 to wrap the multiple images 22 onto one spherical surface 24, as is shown in FIG. 2a. The desired image is cropped from a corresponding portion of a spherical surface 24, as is shown in FIG. 2b. The input images 22 are wrapped onto the spherical surface according to the non-linear transformation 20, p=(x,y)′→ps=(θ,φ)′, wherein
$$\theta = \arctan\left(\frac{x}{f}\right), \qquad \phi = \arctan\left(\frac{y}{\sqrt{x^{2}+f^{2}}}\right), \qquad f:\ \text{focal length}.$$
The transformed images are stitched on the spherical surface using conventional two-dimensional stitching techniques.
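As a minimal sketch, the non-linear transformation 20 can be written for a single pixel as follows. The function names are hypothetical, and the sketch assumes that x and y are measured from the principal point in the same units as the focal length f. The inverse mapping is included only to show that the forward mapping is invertible for points in front of the camera; it is not taken from the text.

```python
import math

def warp_to_sphere(x, y, f):
    """Map image-plane coordinates (x, y) to spherical angles (theta, phi).

    theta = arctan(x / f), phi = arctan(y / sqrt(x^2 + f^2)).
    atan2 with a positive second argument equals arctan of the quotient.
    """
    theta = math.atan2(x, f)
    phi = math.atan2(y, math.hypot(x, f))
    return theta, phi

def unwarp_from_sphere(theta, phi, f):
    """Inverse mapping, valid for |theta| < 90 degrees and |phi| < 90 degrees."""
    x = f * math.tan(theta)
    y = math.hypot(x, f) * math.tan(phi)
    return x, y

if __name__ == "__main__":
    theta, phi = warp_to_sphere(120.0, -45.0, 800.0)
    print(theta, phi)
    print(unwarp_from_sphere(theta, phi, 800.0))
```

Because both angles depend on arctangents of the pixel coordinates, the mapping cannot be expressed as a single matrix multiplication on the pixel coordinates.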
In a real device, however, it is not feasible that all the camera images are projected onto the same spherical surface 24. The transformation 20 takes the different optical axes 23 of the images 22 into account. But the transformation requires that all images be taken from the same point of view at the center of the sphere 24. In other words, the cameras would have to be precisely aligned so that their projection centers are at the same position. Otherwise, any slight misalignment causes a significant mismatch between the projected images, which in turn affects the cropped image.
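The size of this mismatch can be estimated with a short sketch. It assumes an ideal pinhole camera whose projection center is displaced sideways by a small baseline from the center of the sphere, and a scene point located straight ahead at a given depth; the function names and the numeric values are illustrative and do not come from the text.

```python
import math

def angular_mismatch(baseline, depth):
    """Angle (rad) between the viewing directions toward a point at `depth`
    straight ahead of the sphere center, as seen from the sphere center and
    from a projection center displaced sideways by `baseline`."""
    return math.atan2(baseline, depth)

def pixel_mismatch(baseline, depth, focal_px):
    """Magnitude of the image-plane shift in pixels for a pinhole camera with
    focal length `focal_px` (in pixels): |x| = focal_px * baseline / depth."""
    return focal_px * baseline / depth

if __name__ == "__main__":
    # Illustrative values: 1 cm offset, focal length of 1000 pixels.
    for depth in (0.5, 1.0, 5.0, 50.0):
        print(depth, pixel_mismatch(0.01, depth, 1000.0))
```

Under these assumptions the shift is inversely proportional to the depth of the scene point, so no single correction aligns near and far objects at the same time.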
Furthermore, the non-linear projective transformation 20 of pixel coordinates does not allow directly synthesizing the multiple images by means of a linear mapping operation. Therefore, computationally expensive image stitching algorithms have to be used, which is undesirable for real-time operation, particularly in the context of mobile devices. Conventional stitching algorithms are reviewed in Technical Report MSR-TR-2004-92, “Image Alignment and Stitching: A Tutorial”, by R. Szeliski, December 2006.