Sensor and bandwidth constraints limit the spatial resolution and field-of-view (FOV) achievable in any visual system. In many applications (e.g. surveillance, teleconferencing), it is desirable to have both a large field of view (e.g. to survey a panoramic area) and high resolution (e.g. to identify certain types of activity or to recognize faces).
Over the last ten years there has been increasing interest in the application of panoramic sensing to computer vision (Danilidis & Geyer, 2000; Hicks & Bajcsy, 2000; Ishiguro, Yamamoto, & Tsuji, 1992; Nayar, 1997; Svoboda, Pajdla, & Hlavac, 1998; Yagi & Kawato, 1990; Yin & Boult, 2000). Potential applications include surveillance, object tracking, and telepresence (Haritaoglu, Harwood, & Davis, 1998; Kanade, Collins, Lipton, Burt, & Wixson, 1998). Most existing panoramic sensors are catadioptric, i.e. the sensor is composed of a camera and a curved mirror arranged so that the resulting system has a single viewpoint. It has been shown (Danilidis & Geyer, 2000) that the projection obtained with a catadioptric sensor with a single viewpoint is equivalent to projection onto a sphere followed by a perspective projection. Catadioptric sensors allow panoramic images to be captured without any camera motion. However, since a single sensor covers the entire panorama, the resolution of such images may be inadequate for many applications. Switching from the 14 deg FOV of a typical lens to the 360 deg FOV of a panoramic camera results in a 26-fold reduction in linear resolution. For a standard 768×494 NTSC camera, horizontal resolution is reduced to roughly 0.5 deg/pixel, a factor of 60 below human foveal resolution.
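The resolution figures above follow from simple ratios; a back-of-envelope check can be sketched as follows. The lens FOV, panoramic FOV, and sensor width are taken from the text; the 1/120 deg/pixel foveal figure is an assumed value corresponding to roughly 0.5 arcmin acuity.

```python
# Back-of-envelope angular-resolution figures from the text.
# Assumed: 14 deg lens FOV, 360 deg panorama, 768-pixel horizontal
# sensor, and ~1/120 deg/pixel human foveal resolution (assumption).

LENS_FOV_DEG = 14.0
PANO_FOV_DEG = 360.0
H_PIXELS = 768                        # NTSC horizontal pixel count
FOVEAL_DEG_PER_PIXEL = 1.0 / 120.0    # ~0.5 arcmin per sample

# Linear-resolution loss when the same sensor must cover 360 deg
reduction = PANO_FOV_DEG / LENS_FOV_DEG       # ~26x

# Panoramic horizontal resolution in deg/pixel
pano_res = PANO_FOV_DEG / H_PIXELS            # ~0.47 deg/pixel

# Shortfall relative to the assumed human foveal resolution
shortfall = pano_res / FOVEAL_DEG_PER_PIXEL   # roughly a factor of 60

print(round(reduction, 1), round(pano_res, 2), round(shortfall))
```

With the text's rounding of 0.47 deg/pixel up to 0.5, the shortfall comes out at the quoted factor of 60.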
There has been considerable work on space-variant (foveated) sensor chips (Ferrari, Nielsen, Questa, & Sandini, 1995; Pardo, Dierickx, & Scheffer, 1997). However, since the number of photoreceptive elements on these sensors is limited, they do not provide a resolution or field-of-view advantage over traditional chips. Moreover, it is not clear how such sensors could be used to achieve a panoramic field of view over which the fovea can be rapidly deployed. A more common solution to the FOV/resolution tradeoff is to compose mosaics from individual overlapping high-resolution images that form a covering of the viewing sphere (Irani, Anandan, & Hsu, 1995; Kumar, Anandan, Irani, Bergen, & Hanna, 1995; Szeliski, 1994; Szeliski & Shum, 1997).
These images can be obtained by a single camera rotating about its optical centre. Such a system is useful for recording high-resolution “still life” panoramas, but is of limited use for dynamic scenes, since the instantaneous field of view is typically small. An alternative is to compose the mosaic from images simultaneously recorded by multiple cameras with overlapping fields of view. The primary disadvantage of this approach is the multiplicity of hardware and independent data channels that must be integrated and maintained. For example, a standard 25 mm lens provides a field of view of roughly 14×10 degrees. Allowing for 25% overlap between adjacent images to support accurate mosaicking, covering a hemispheric field of view at this resolution would require roughly 260 cameras.
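The camera count can be reproduced with a small-angle approximation: each camera contributes a non-redundant patch shrunk by the 25% linear overlap, and the hemisphere is measured in square degrees. This is a rough sketch of the estimate, not an exact tiling of the sphere.

```python
import math

# Rough camera-count estimate for a mosaicked hemisphere, as in the
# text: 14 x 10 deg per camera, 25% linear overlap with neighbours.

cam_fov_w, cam_fov_h = 14.0, 10.0   # per-camera FOV in degrees
overlap = 0.25                      # linear overlap fraction

# Effective (non-redundant) angular area per camera (small-angle approx.)
eff_area = (cam_fov_w * (1 - overlap)) * (cam_fov_h * (1 - overlap))

# Hemisphere: 2*pi steradians, converted to square degrees
hemisphere = 2 * math.pi * (180.0 / math.pi) ** 2   # ~20626 deg^2

n_cameras = hemisphere / eff_area
print(round(n_cameras))   # ~262, i.e. "roughly 260" cameras
```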
The human visual system has evolved a bipartite solution to the FOV/resolution tradeoff. The field of view of the human eye is roughly 160×175 deg—nearly hemispheric. Central vision is served by roughly five million photoreceptive cones that provide high-resolution, chromatic sensation over a five-degree field of view, while roughly one hundred million rods provide relatively low-resolution achromatic vision over the remainder of the visual field (Wandell, 1995). The effective resolution is extended by fast gaze-shifting mechanisms and a memory system that allows a form of integration over multiple fixations (Irwin & Gordon, 1998).
Variations on this architecture are found in other species. Many insects, for example, have panoramic visual systems (Moller, Lambrinos, Pfeifer, & Wehner, 1998). The jumping spider, for instance, has four eyes that capture movement over the entire viewing sphere and two small-field-of-view, high-resolution eyes used in predation and mating.
There have been some recent attempts to integrate high- and low-resolution imaging in artificial sensors. In June 1996, Hughes Electronics filed a patent (U.S. Pat. No. 5,710,661), Integrated panoramic and high resolution sensor optics, which describes an optical apparatus that monitors an entire panorama at low resolution while simultaneously monitoring a selected portion of the panorama at high resolution. A drawback of this system is that both high- and low-resolution data are recorded on the same sensor, limiting both foveal and panoramic resolution.
In April 1998, OmniView Inc. filed a patent (WO9846014A1), Method and apparatus for inserting a high resolution image into a low resolution interactive image to produce a realistic immersive experience. The patent, awarded in October 1998, describes a process of inserting a high-resolution image into a low-resolution display to produce a more convincing virtual reality experience. There are other related patents on the blending of high- and low-resolution imagery in visual displays (e.g. US1984000576432, filed February 1984, granted January 1987).
Geisler and Perry (1998) have demonstrated a wavelet-based video encoding system that progressively subsamples the video stream at image points distant from the viewer-defined region of interest. Recent work with saccade-contingent displays (Loschky & McConkie, 1999) has shown that video data viewed in the periphery of the human visual system can be substantially subsampled with negligible subjective or objective impact. While our attentive panoramic sensor is not eye-slaved, these prior results do suggest that attention-contingent sampling for human-in-the-loop video is feasible and potentially useful.
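The idea behind attention-contingent sampling can be sketched with a toy example. This is not Geisler and Perry's wavelet codec: it is a minimal, assumed scheme that simply halves resolution with each step of eccentricity from the region of interest, on a 1-D "image" of pixel values.

```python
# Minimal sketch of attention-contingent subsampling, in the spirit of
# foveated video coding. Assumed scheme (NOT Geisler & Perry's codec):
# keep full resolution near the region of interest, then keep every
# 2nd, 4th, 8th, ... pixel as eccentricity grows.

def foveated_subsample(pixels, roi_index, falloff=8):
    """Return (index, value) pairs kept by a crude resolution pyramid
    centred on roi_index; stride doubles every `falloff` pixels."""
    kept = []
    for i, p in enumerate(pixels):
        level = abs(i - roi_index) // falloff   # 0, 1, 2, ...
        stride = 2 ** level                     # 1, 2, 4, ...
        if (i - roi_index) % stride == 0:
            kept.append((i, p))
    return kept

row = list(range(32))
samples = foveated_subsample(row, roi_index=16, falloff=8)
# Full resolution within 8 pixels of the ROI, coarser beyond.
print(len(samples), len(row))   # keeps 24 of 32 samples
```

Even this crude scheme cuts the toy stream by a quarter; with realistic image sizes and peripheral falloff the savings are far larger, which is what makes human-in-the-loop foveated video attractive.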
Yin and Boult (2000) have developed a multiresolution panoramic image sensor based on stacking multiple parabolic mirrors of different sizes. Since the entire pyramid is sensed by a single sensor, this technique provides efficient access to very low resolution data, but does not solve the problem of obtaining and integrating high-resolution data with data at panoramic resolution.
Mann and Picard (1997) have investigated correlation-based methods for computing homographies to fuse images of different resolutions taken by the same camera at different focal lengths, but do not address the real-time fusion of images from different cameras separated by much larger resolution differences.
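The role a homography plays in such fusion can be illustrated with a minimal sketch. Estimating the 3×3 matrix H (by correlation in Mann and Picard's case) is the hard part and is not shown; here H is simply assumed known, and chosen as an illustrative scale-plus-translation mapping from a hypothetical 4× zoom image into a wide view.

```python
# Illustrative only: mapping a pixel from a high-resolution (zoomed)
# image into a low-resolution (wide) image through a known 3x3
# homography H. The values of H below are assumed for illustration.

def apply_homography(H, x, y):
    """Map (x, y) through H using homogeneous coordinates."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    xp = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    yp = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return xp, yp

# Assumed: a 4x zoom image whose origin sits at (100, 60) in the wide
# view, with no rotation or perspective distortion.
H = [[0.25, 0.0, 100.0],
     [0.0, 0.25, 60.0],
     [0.0, 0.0, 1.0]]

print(apply_homography(H, 0, 0))      # -> (100.0, 60.0)
print(apply_homography(H, 400, 200))  # -> (200.0, 110.0)
```

For pure zoom-and-shift the bottom row of H is (0, 0, 1) and the mapping is affine; fusing views from physically distinct cameras introduces the perspective terms and the large resolution gap that make the general problem harder.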