A new class of wide-angle lens camera system, which can replace a set of mechanical narrow-angle PTZ (“Pan/Tilt/Zoom”) camera systems, has been described in U.S. patent application Ser. Nos. 10/837,325 and 10/837,326, hereby incorporated by reference. This type of camera emulates a PTZ camera by modifying a distorted captured image to electronically correct the distortion and scale the image. It achieves this by first using an image sensor to capture a high-resolution image, and by then projecting regions from that captured image to emulate the views which would have been captured by a set of lower-resolution PTZ cameras.
Most current image sensors are not intrinsically color-sensitive, and so typically have an array of color filters placed over the sensor, one filter per pixel, such that the image captured through the color filter array resembles a mosaic, usually composed of red, green and blue pixels. A Bayer filter mosaic is a color filter array (CFA) that arranges red, green, and blue (RGB) color filters over a square grid of image photosensors. This arrangement of color filters is used in most single-chip digital image sensors found in digital cameras, camcorders, and scanners.
The raw colorized output is referred to as a Bayer pattern image. In this method, however, two thirds of the color data is missing for each pixel, and this missing data must be interpolated or predicted from the adjacent pixels. This preparatory process, known as “demosaicing”, “demosaicking”, or “debayering”, is covered by many patents and has an extensive academic literature (see for example http://www.visionbib.com/bibliography/motion-i770.html as of Sep. 18, 2008). The processing algorithms interpolate a complete set of red, green, and blue values for each pixel.
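As a minimal sketch of the interpolation step, the following assumes an RGGB filter layout and a simple neighbour-averaging scheme (one of many possible demosaicing methods, not necessarily the one used in any particular camera). The names `bayer_color` and `demosaic` are hypothetical, and the raw image is assumed to be at least 2x2 samples.

```python
def bayer_color(row, col):
    """Filter colour at (row, col), assuming an RGGB Bayer layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic(raw):
    """Return rows of (R, G, B) tuples from a 2-D list of raw samples.

    Each pixel keeps the one channel it measured; the two missing
    channels are the mean of same-colour samples in its 3x3 window.
    """
    height, width = len(raw), len(raw[0])
    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            pixel = []
            for channel in "RGB":
                if bayer_color(r, c) == channel:
                    # This channel was measured directly at this site.
                    pixel.append(float(raw[r][c]))
                    continue
                # Interpolate the missing channel from adjacent pixels.
                samples = [
                    raw[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, height))
                    for cc in range(max(c - 1, 0), min(c + 2, width))
                    if bayer_color(rr, cc) == channel
                ]
                pixel.append(sum(samples) / len(samples))
            out_row.append(tuple(pixel))
        out.append(out_row)
    return out
```

For a uniformly colored scene, every output pixel recovers the same (R, G, B) triple, which is a convenient sanity check for an interpolation scheme of this kind.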
There are many other processes that may be carried out to prepare captured images for image processing, such as ‘dead pixel removal’ (finding flawed pixels in the sensor image and altering the captured image to compensate for them) and ‘denoising’ (applying statistical models to images to detect and reduce noisy elements within the image). The end product of the initial image preparation phase is usually a clean, but rather dull and flat-looking image. In still and video cameras, the camera is normally programmed to process the sensor image such that the image is as close to being acceptable to the human eye as possible.
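Dead pixel removal can be illustrated with a short sketch. It assumes that the flawed pixel positions are already known (for example from a factory defect map) and that the input is a single color plane; the function name `repair_dead_pixels` and the choice of a neighbor median are illustrative assumptions, not a prescribed method.

```python
from statistics import median


def repair_dead_pixels(plane, dead):
    """Replace each known-dead pixel with the median of its live neighbours.

    plane: 2-D list of samples from a single colour plane (assumption).
    dead:  set of (row, col) positions known to be flawed (assumption).
    """
    height, width = len(plane), len(plane[0])
    fixed = [row[:] for row in plane]  # leave the input untouched
    for r, c in dead:
        neighbours = [
            plane[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, height))
            for cc in range(max(c - 1, 0), min(c + 2, width))
            if (rr, cc) != (r, c) and (rr, cc) not in dead
        ]
        if neighbours:  # skip if every neighbour is also dead
            fixed[r][c] = median(neighbours)
    return fixed
```

A median is used here rather than a mean so that a single bright or dark neighbor has less influence on the repaired value.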
The subsequent image processing stage, typically referred to as the “image color processing pipeline”, is designed to make the prepared image seem more “true to life” or “natural”. These goals are conventionally achieved by deploying various combinations of well-known processes, including brightness enhancement, contrast enhancement, white balancing, gamma correction, saturation enhancement, and color balancing. This list of processes is not necessarily complete or sufficient in every case. The nature and scope of the components used for such image color processing pipelines are well known to those skilled in the art, with two of the most frequently cited textbooks in this area being:
    Fundamentals of Digital Image Processing (1989), Anil K. Jain, Prentice-Hall
    Digital Image Processing, 3rd Edition (2007), Rafael C. Gonzalez & Richard E. Woods, Prentice Hall
Both are hereby incorporated by reference.
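Two of the processes named above, white balancing and gamma correction, can be sketched as follows. This is a simplified illustration only: it assumes pixel values normalized to the range 0 to 1, a "gray-world" white balance (which assumes the scene averages to neutral gray), non-zero channel means, and a display gamma of 2.2. The function names are hypothetical.

```python
def gray_world_gains(pixels):
    """Per-channel gains that equalise the mean of R, G and B.

    Assumes every channel has a non-zero mean (illustrative only).
    """
    n = len(pixels)
    means = [sum(p[ch] for p in pixels) / n for ch in range(3)]
    gray = sum(means) / 3
    return tuple(gray / m for m in means)


def apply_pipeline(pixels, gamma=2.2):
    """White-balance then gamma-correct a list of (R, G, B) tuples in [0, 1]."""
    gains = gray_world_gains(pixels)
    out = []
    for p in pixels:
        # White balance: scale each channel, clamping at the ceiling of 1.0.
        balanced = [min(v * g, 1.0) for v, g in zip(p, gains)]
        # Gamma correction: brighten mid-tones for display.
        out.append(tuple(v ** (1.0 / gamma) for v in balanced))
    return out
```

Because the gains are computed from the whole set of pixels passed in, the result depends on which region of the image is supplied; a bluish daylight region and a reddish artificially lit region would each yield different gains.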
There is a growing trend in modern image sensor design to integrate both the preparation stage and the image color processing pipeline stage into a single overall design, such that what is read from the image sensor is a color-corrected, full-color image. For camera systems (such as digital still cameras) where the desired final output is precisely a single large view, this kind of integration (where, for example, the sensor exposure can be chosen to give a good overall quality image) makes perfectly good sense.
However, because the new class of camera discussed here typically uses an image sensor to capture a single high-resolution image from which multiple smaller views are simultaneously generated, it is very often impossible to program the camera to capture an image that will yield an optimal result for every one of the selected regions to be extracted from that image. This is particularly true for wide-angle cameras.
Examples of applications assigned to the assignee where this technology may be applied include U.S. non-provisional patent application Ser. Nos. 10/837,325 entitled “Multiple View Processing in Wide-Angle Video Camera” and 10/837,326 entitled “Multiple Object Processing in Wide-Angle Video Camera”, both of which were filed Apr. 30, 2004, and are hereby incorporated by reference. These applications claim priority to U.S. provisional patent applications 60/467,588 entitled “Multiple View Processing in Wide-Angle Video Camera” and 60/467,643 entitled “Multiple Object Processing in Wide-Angle Video Camera”, both of which were filed on May 2, 2003, and are hereby also incorporated by reference.
As an example of the utility of this processing, outdoor scenes often have a bimodal or multimodal distribution of luminance, with areas of sky being much brighter than areas at ground level. At any particular sensor setting, the sky might be over-exposed, with many tones represented as white, while dark areas are reduced to indistinguishable dark tones. As a further example, an indoor scene may include views illuminated by daylight from windows and regions illuminated by artificial lights with very different color temperatures. Without processing, the former will most likely appear too blue and the latter too red.
A conventional narrow-field mechanical PTZ camera copes with the change between brighter and darker areas by, for example, decreasing the exposure time of the sensor or using an auto iris mechanism in the camera lens. A multi-view, wide-angle camera system is unable to use these approaches, because one of its emulated camera views may be looking at a strongly lit region at exactly the same time that another view is looking at a heavily shaded region.
The technical problem addressed is how best to build a multi-view camera system such that all of the multiple views derived from a single high-resolution captured image can be of sufficient quality. Simplifying the physics slightly, there are broadly three kinds of problems that might be encountered.
First, too much light (‘over-exposure’) can cause ‘clipping’, which is when an individual sensor pixel reaches its maximum capturable value (i.e. its ‘ceiling’). This is often noticeable as completely white areas in the image, where all the sensor pixels have reached the ceiling value in all channels.
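Clipping can be illustrated with a small sketch. It assumes an idealized sensor whose output is a linear function of scene intensity up to a ceiling set by the bit depth; the function names and the sample values are hypothetical.

```python
def expose(scene, gain, bits=8):
    """Simulate capture: scale scene intensities and clip at the ceiling."""
    ceiling = (1 << bits) - 1  # e.g. 255 for an 8-bit sample
    return [min(int(v * gain), ceiling) for v in scene]


def clipped_fraction(samples, bits=8):
    """Fraction of samples that have reached the ceiling value."""
    ceiling = (1 << bits) - 1
    return sum(1 for s in samples if s >= ceiling) / len(samples)


scene = [50, 100, 200, 300]        # hypothetical scene intensities
captured = expose(scene, gain=1)   # [50, 100, 200, 255]
print(clipped_fraction(captured))  # 0.25
```

Once a sample has been clipped, the original intensity cannot be recovered by later processing, since every value above the ceiling maps to the same number.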
Second, too little light (‘under-exposure’) can make the captured signal prone to ‘sensor noise’. This is because conventional image sensors are effectively light-energy-accumulating statistical devices that rely on the idea that each image sensor pixel will receive a sufficient amount of light-energy to make a statistically reliable assessment of the overall light intensity, so reducing the amount of light incident on each image pixel too much makes the final accumulated result statistically unreliable. This is particularly true for image sensors with many millions of image pixels, where each image pixel can be physically very small.
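The statistical argument can be made concrete under a commonly used model, stated here as an assumption rather than taken from the text: if photon arrivals follow a Poisson distribution with mean N, the standard deviation is the square root of N, so the signal-to-noise ratio is also the square root of N. The helper names below are hypothetical, and other noise sources (read noise, dark current) are ignored.

```python
import math


def shot_noise_snr(mean_photons):
    """SNR under a Poisson (shot-noise-only) model: N / sqrt(N) = sqrt(N)."""
    return math.sqrt(mean_photons)


def snr_db(mean_photons):
    """The same ratio expressed in decibels."""
    return 20 * math.log10(shot_noise_snr(mean_photons))


# A pixel collecting 10,000 photons has ten times the SNR of one
# collecting 100 photons, under this simplified model.
print(shot_noise_snr(10_000))  # 100.0
print(shot_noise_snr(100))     # 10.0
```

Under this model, halving the light reaching a pixel reduces its SNR by a factor of about 1.41, which is why small pixels in shaded regions tend to look noisy.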
Third, too few bits of accuracy (i.e. how many different levels of intensity a captured signal can be represented with) inside the image processing pipeline can cause ‘quantization noise’, a truncation of the signal due to the image processing pipeline's inability to represent that signal.
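The effect of bit depth can be sketched as follows, assuming intensities normalized to the range 0 to 1 and quantization by rounding to the nearest representable level (a pipeline that truncates instead would behave similarly). The function names and sample tones are hypothetical.

```python
def quantize(value, bits):
    """Round a value in [0, 1] to the nearest of 2**bits levels."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def distinct_levels(values, bits):
    """How many different tones survive quantisation."""
    return len({quantize(v, bits) for v in values})


shadow_tones = [0.010, 0.012, 0.014, 0.016]  # four close, dark tones
print(distinct_levels(shadow_tones, 12))  # 4
print(distinct_levels(shadow_tones, 8))   # 2
print(distinct_levels(shadow_tones, 4))   # 1
```

With fewer bits, neighboring dark tones collapse onto the same level, and any later brightening of that region can only reveal the coarse steps rather than the original detail.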
It should also be noted that using too many bits of accuracy would have the effect of increasing the amount of memory needed by the camera system, as well as increasing the amount of memory needed to be read from and written to by the device (i.e. its ‘memory bandwidth’).
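The memory cost can be estimated with simple arithmetic. The sketch below uses purely illustrative figures (a 2048 x 1536 image, three channels, 30 frames per second, one read and one write per frame) that are assumptions and not taken from any particular camera; the function names are hypothetical.

```python
def frame_bytes(width, height, channels, bits_per_sample):
    """Bytes needed to store one frame with tightly packed samples."""
    total_bits = width * height * channels * bits_per_sample
    return (total_bits + 7) // 8  # round up to whole bytes


def bandwidth_bytes_per_second(frame_size, fps, accesses_per_frame=2):
    """Memory traffic if each frame is read and written a fixed number of times."""
    return frame_size * fps * accesses_per_frame


small = frame_bytes(2048, 1536, 3, 8)    # 9,437,184 bytes per frame
large = frame_bytes(2048, 1536, 3, 16)   # 18,874,368 bytes per frame
print(bandwidth_bytes_per_second(small, fps=30))  # 566231040
print(bandwidth_bytes_per_second(large, fps=30))  # 1132462080
```

In this simplified model, both storage and memory bandwidth scale linearly with bit depth, so doubling the bits per sample doubles both.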
It is common practice for a multi-view camera system to have a single image pipeline for the whole image, and to then project multiple regions from that image as an entirely secondary stage. However, this typically leads to final projected views that are subject to all three kinds of problem listed above (clipping, sensor noise, and quantization noise), and the results are often unsatisfactory.