1. Technical Field
This disclosure relates to digital image processing and more particularly to a compound camera sensor for taking pictures of a scene to be captured with a desired depth of field and a method of processing images taken by the compound camera sensor.
2. Description of the Related Art
A compound camera consists of a set of component cameras, a data processor and image processing software that runs on the data processor. The component cameras may be synchronized through wired or wireless electronic signals. Individual images from the component cameras are transmitted to the data processor through wired or wireless connections. The image processing software takes images from the component cameras as input and synthesizes an output image following the specifications of a virtual camera.
A conventional compound camera may be implemented in different ways. In a first conventional embodiment, a compound camera may comprise a number of synchronized regular video cameras and a separate microprocessor connected to the video component cameras. In a second conventional embodiment, a plurality of component image sensors and a microprocessor may be integrated on a substrate, such as a printed circuit board (PCB) or a hybrid substrate. Synchronization and communication are accomplished through connections on the substrate. In a third conventional embodiment, the component image sensors and the microprocessor are very small and are integrated on a single silicon chip.
The physical model of a camera consists of a shutter, a lens and an image plane. The shutter has an aperture that lets light enter into the camera. A bundle of light rays coming out of a point on an object surface enters through the aperture, is refracted by the lens, and is gathered and focused on the image plane, where the color of the object point is recorded.
For a certain aperture size, there is a range of field depths within which the image is sharp. This is called the “depth-of-field” and it is inversely proportional to the aperture size. The image plane slides back and forth to search for the best overall image within the range of the depth-of-field. Normally, a large depth-of-field coverage is desired. This, in turn, requires high sensitivity from the sensor because the aperture size is proportionally small.
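By way of a non-limiting illustration, the dependence of the depth-of-field on the aperture size may be sketched with the standard thin-lens formulas. The function name, the parameter names and the circle-of-confusion value of 0.005 mm are assumptions introduced here for illustration only and are not taken from this disclosure.

```python
def depth_of_field(focal_mm, aperture_diameter_mm, focus_mm, coc_mm=0.005):
    """Return the (near, far) limits of acceptable sharpness, in mm.

    Thin-lens model; coc_mm is an assumed circle of confusion.
    """
    f_number = focal_mm / aperture_diameter_mm
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (focus_mm * (hyperfocal - focal_mm)
            / (hyperfocal + focus_mm - 2 * focal_mm))
    if focus_mm >= hyperfocal:
        # Everything beyond the near limit is acceptably sharp.
        far = float("inf")
    else:
        far = focus_mm * (hyperfocal - focal_mm) / (hyperfocal - focus_mm)
    return near, far


if __name__ == "__main__":
    # Same 5 mm lens focused at 0.5 m, with two aperture diameters.
    for diameter in (2.5, 1.25):
        near, far = depth_of_field(5.0, diameter, 500.0)
        print(f"aperture {diameter} mm: sharp from {near:.0f} mm to {far:.0f} mm")
```

Under these assumptions, reducing the aperture diameter widens the interval between the near and far limits, which is the behavior described above.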
Traditional cameras rely on complex optical and mechanical components to modify focus and aperture. Dimensional constraints limit the maximum resolution a camera can achieve. An enhancement of the camera performance may be implemented digitally by running image processing software on the embedded microprocessor.
A brief survey on the main approaches for obtaining digital color pictures is presented hereinbelow.
Multi-Sensor Compound Camera Approach
As many separate sensors, each with M*N pixels, are used as the number n of required color channels (e.g., 3 sensors for RGB or 4 sensors for special high-quality applications). Light from the scene is opto-mechanically separated into color components and then directed to the sensors through a properly designed set of beam-splitting prisms. Such a solution maximizes image quality because all M*N*n pixel color components are directly available and no color interpolation is needed. Moreover, the system can achieve outstanding detail with highly accurate color reproduction suitable for the demands of high-end video production: wide dynamic range, low color noise, high-contrast detail, natural color resolution and low aliasing.
The high cost due to the number of sensors and the complexity of the opto-mechanical part is a drawback of this solution.
A sub-pixel shift among the n images of the n sensors can be realized mechanically and exploited to enhance image resolution beyond M*N and to improve quality.
This technique is frequently used in the high-end market of top-quality video cameras. Present video camera technologies are listed below, wherein the pixel shifting technique coupled with the use of three sensors makes it possible to achieve a high-quality picture zoomed by ×2 or even more:
JVC Pixel Shift technology;
3CCDs (RGB) 1.3 MP;
resolution enhancement to 5 MP images;
beam-splitting prism;
CANON Super Pixel Shift technology;
3CCDs (RGB);
beam-splitting prism (10^-1 pixel precision); and
SONY 3CMOS Videocamera DCR-PC1000.
Classical One-Sensor Approach
Only one sensor is used, with a pattern-shaped filter (e.g., a Bayer pattern) placed between the lens and the sensor. In such an approach not all M*N*n pixel color components are available, but only M*N, with a distribution defined by the filter pattern. Compared to the multi-sensor compound camera approach, this classical solution has a much lower cost, but image quality is inferior because color interpolation algorithms are used for reconstructing the missing M*N*(n−1) pixel color components.
Nevertheless this is a “standard” solution for consumer market cameras because it represents a good compromise among cost, image quality and size of the image acquisition device (sensor+optics).
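The component counts discussed for the multi-sensor approach and the classical one-sensor approach may be tallied as in the following minimal sketch. The function name and the dictionary keys are hypothetical and serve only to illustrate the arithmetic stated above.

```python
def color_component_counts(M, N, n):
    """Count pixel color components for an M*N image with n color channels."""
    total = M * N * n                # components in the full-color image
    one_sensor_measured = M * N      # one filtered component per pixel
    return {
        # Multi-sensor approach: every component is measured directly.
        "multi_sensor_measured": total,
        # Classical one-sensor approach: the remainder is interpolated.
        "one_sensor_measured": one_sensor_measured,
        "one_sensor_interpolated": total - one_sensor_measured,
    }


if __name__ == "__main__":
    # Example values chosen for illustration: a 640*480 image, RGB (n = 3).
    print(color_component_counts(640, 480, 3))
```

For any M, N and n, the interpolated count equals M*N*(n−1), in agreement with the expression given for the classical one-sensor approach.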
One-Sensor Compound Camera Approach
One sensor, divided into n regions (4 regions for RGB), is used. The light coming from the scene to be captured is directed identically over all n regions. The filter pattern as depicted in FIG. 1 is much simpler because the color is uniform over each region. Moreover, for optical reasons that will become clear hereinafter, the “thickness” (dimension along the optical axis) of such a system can be significantly reduced. Compared to the classical one-sensor approach, the size of each of the n images corresponding to different areas and different colors is a fraction of the entire sensor size (say M/2*N/2 in the case of 4 regions), and appropriate image processing algorithms (zooming, super resolution) must be used to obtain a full (M*N) resolution image.
In analogy to the compound camera multi-sensor approach, the light directed towards the n sub-regions can be shifted by a pixel fraction both in x and y directions. Information obtained by sub-pixel shifting can be exploited for enhancing resolution of the final M*N image and improving image quality.
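How sub-pixel shifting can carry additional resolution may be illustrated with an idealized sketch. It is assumed, purely for illustration, that four sub-images sample the same channel with exact half-pixel offsets and without parallax or noise, which a real compound camera sensor would not guarantee; the function name and argument names are hypothetical.

```python
def interleave_half_pixel(sub00, sub01, sub10, sub11):
    """Interleave four h*w sub-images into one 2h*2w image.

    subYX is assumed to be shifted by Y half-pixels vertically and
    X half-pixels horizontally with respect to sub00.
    """
    h, w = len(sub00), len(sub00[0])
    out = [[0] * (2 * w) for _ in range(2 * h)]
    for y in range(h):
        for x in range(w):
            out[2 * y][2 * x] = sub00[y][x]
            out[2 * y][2 * x + 1] = sub01[y][x]
            out[2 * y + 1][2 * x] = sub10[y][x]
            out[2 * y + 1][2 * x + 1] = sub11[y][x]
    return out


if __name__ == "__main__":
    # A small synthetic scene sampled on four shifted coarse grids.
    scene = [[4 * r + c for c in range(4)] for r in range(4)]
    subs = [[[scene[2 * y + dy][2 * x + dx] for x in range(2)]
             for y in range(2)]
            for dy in (0, 1) for dx in (0, 1)]
    print(interleave_half_pixel(*subs) == scene)
```

In practice the shifts are not exact and the sub-images carry different colors, which is why the publications reviewed below resort to alignment, interpolation and super resolution algorithms rather than to plain interleaving.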
One degree of freedom in designing the sensor for a compound camera is related to the fourth color channel. In general, the following options exist:
in case of RGB color approach, the fourth channel is used for a second green channel;
in case of a four-color system (e.g. CyYe G Mg), each channel is used for different color;
the fourth channel is used without any color filter.
In the ensuing description, the first case of an RGB color camera will be considered; however, the considerations that will be made in describing this disclosure hold, mutatis mutandis, for the other two options as well.
A review of known architectures and related algorithms for one-sensor compound camera with sub-pixel shifting follows.
Together with the image processing pipeline for treating the signals output by a compound camera sensor, a brief description of the distinguishing features of the compound camera sensor is given because the algorithms executed in the processing pipeline are strictly related to sensor configuration and to its physical properties.
Relevant publications are identified below and briefly commented on to outline the main aspects of each technology.
Nokia US 2005/0128335
Four QVGA areas corresponding to red R, green G1, blue B (or CyMgYe) and green G2 channel, respectively, are contemplated. The G2 channel may be substituted by a filter-less area sensitive only to light intensity.
It is compatible with both CCD and CMOS sensor technologies.
First, the sub-images are aligned through parallax correction; then one of the following steps is performed:
interpolation to VGA of the missing color component+merging of color components; or
merging of color components into an artificial Bayer pattern+interpolation to VGA of missing color values.
The solution is specifically developed for VGA images.
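The first of the two alternatives above (interpolation of each color component followed by merging) may be sketched as follows. This is not the method of the cited publication: the sub-images are assumed to be already aligned, pixel replication stands in for a real interpolator, and averaging G1 and G2 is an assumption made here; all function names are hypothetical.

```python
def upsample2x(plane):
    """Double both dimensions by pixel replication (stand-in interpolator)."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def interpolate_then_merge(r, g1, g2, b):
    """Upsample four aligned sub-images and merge them into RGB triplets."""
    R, G1, G2, B = (upsample2x(plane) for plane in (r, g1, g2, b))
    height, width = len(R), len(R[0])
    # Assumption: the two green sub-images are simply averaged.
    return [[(R[y][x], (G1[y][x] + G2[y][x]) / 2, B[y][x])
             for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    rgb = interpolate_then_merge([[10, 20]], [[4, 8]], [[6, 12]], [[1, 2]])
    print(len(rgb), "rows x", len(rgb[0]), "columns of (R, G, B) triplets")
```

With QVGA (320*240) inputs the same functions would yield a VGA (640*480) output; the second alternative would instead place the four sub-images on a Bayer-like grid first and interpolate the missing color values afterwards.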
Nokia US 2005/0128509
Several alternatives are disclosed:
two sensor areas;
four sensor areas of different sub-array configuration in color distribution or in the pixel size.
The image processing pipeline performs:
alignment with parallax correction+interpolation to VGA+merging; or
alignment with parallax correction+merging+interpolation to VGA.
The solution is specifically developed for VGA images.
Different device structures or sensor area assignments are proposed; in some cases the pixel size for the green channel is smaller than that for red and blue pixels.
Agilent U.S. Pat. No. 6,983,080
Multiple shots of the same scene are taken with a common camera;
resolution of each image is enhanced;
motion vectors are estimated;
a high resolution image is reconstructed exploiting estimated motion vectors;
sub-pixel shift is considered in some cases but as a motion vector;
a super resolution algorithm is used for enhancement of the green channels through the estimation of a motion vector between the G1 and G2 sub-arrays as if they were two successive frames of a movie;
reconstruction of an artificial Bayer Pattern is contemplated.
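The idea of treating the G1 and G2 sub-arrays as two successive frames and estimating a motion vector between them may be illustrated with a brute-force block-matching sketch. The cited patent's actual estimator is not reproduced here; the exhaustive search, the mean-absolute-difference cost and the restriction to integer shifts are assumptions, and the function name is hypothetical.

```python
def estimate_shift(ref, moved, max_shift=2):
    """Return (dy, dx) such that moved[y + dy][x + dx] best matches ref[y][x].

    Exhaustive integer search minimizing the mean absolute difference
    over the overlapping area. max_shift must be smaller than the image size.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(ref[y][x] - moved[y + dy][x + dx])
                    count += 1
            cost = total / count
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic test pattern and a copy displaced by one row and two columns.
    g1 = [[10 * y * y + x * x for x in range(8)] for y in range(8)]
    g2 = [[g1[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
          for y in range(8)]
    print(estimate_shift(g1, g2))
```

A sub-pixel estimate, as needed for super resolution, would typically refine such an integer result, for example by interpolating the cost around its minimum.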
Canon U.S. Pat. No. 6,753,906
Orthogonal mirror plates moved by electromagnets create 9 different optical paths for a 3×3 shift matrix with ⅓ pixel pitch;
a CyYeGMg color filter is used;
6 images are stored in the memory corresponding to different positions of the mirror plates and therefore having a different shift of a fraction of a pixel pitch;
an artificial Bayer Pattern is used for enhanced resolution image reconstruction;
multiple images of the same scene are taken through multiple shots corresponding to multiple configurations of the opto-mechanical shifting device.
Canon U.S. Pat. No. 6,803,949
The sensing procedure senses if the image is black and white or a color image;
4 shots of the same scene are taken with the same CCD and four images are stored in memory;
an artificial BP is reconstructed and interpolated for enhancing resolution;
½ pixel shifting is performed four times;
artificial BP reconstruction;
channel shifts (½, 0) (½,1) (3/2,0) (3/2,1).
Canon U.S. Pat. No. 6,678,000
A ±⅔ pixel shifting optical device performs 9 shifts;
9 frames of the same scene are stored;
each shot is taken with an RGB CFA sensor;
an image enhanced by a factor 3 is obtained.
ST US 2004/0196379
Several shots of the same scene are taken with different video-cameras;
images are combined for increasing depth of field;
low quality images are combined for obtaining an enhanced one;
video processing of images is disclosed.
Kodak U.S. Pat. No. 5,812,322
Each lenslet has a different field of view;
all the sub-images are stitched together;
three levels of lenslets to reduce “banding” effects;
numerous lenslets to reduce thickness;
final FOV is obtained as a stitched artwork of all the lenslets sub-images.
Kodak U.S. Pat. No. 6,137,535
Each lenslet of the array is oriented so as to reduce parallax errors;
each lenslet has a different field of view;
all sub-images are stitched together;
numerous lenslets (up to 150×100) are employed to reduce thickness;
each sub area of the sensor is provided with a Bayer pattern color filter.
Telxon Corporation U.S. Pat. No. 6,053,408
A dataform reader with different lens systems, each focusing at a fixed distance, is used, and the sharpest image is selected from the plurality of images taken of the same scene to be captured;
the output image is reconstructed using only the selected image;
an array of separated common optical systems is used.
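Selecting the sharpest image out of a plurality of images of the same scene may be sketched as follows. The sharpness criterion used by the cited patent is not stated here; the gradient-energy measure below is an assumption chosen for simplicity, and the function names are hypothetical.

```python
def sharpness(image):
    """Sum of squared differences between neighboring pixels."""
    h, w = len(image), len(image[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += (image[y][x + 1] - image[y][x]) ** 2
            if y + 1 < h:
                total += (image[y + 1][x] - image[y][x]) ** 2
    return total


def select_sharpest(images):
    """Return the index of the image with the highest sharpness score."""
    return max(range(len(images)), key=lambda i: sharpness(images[i]))


if __name__ == "__main__":
    flat = [[128] * 4 for _ in range(2)]
    blurred = [[100, 150, 100, 150], [150, 100, 150, 100]]
    sharp = [[0, 255, 0, 255], [255, 0, 255, 0]]
    print(select_sharpest([flat, blurred, sharp]))
```

Because only the selected image is used for the output, the depth of field of the result is that of a single lens system rather than a combination of all of them.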
Tangen et al. U.S. Pat. No. 6,765,617
A set of images is obtained through a multi-lenslet system;
sub-images of the same scene but at lower resolution than the full size image are obtained;
each sub-image can sample a single color and final RGB data can be obtained by combining sub-images;
each pixel in the sub-areas is positioned in such a way that there is no overlap in any of the four sub-images;
the lenslet system is used to reduce local density of photodetectors: i.e. k lenslets may reduce density to 1/k of the original needs;
lenslets are used to reduce sensor resolution.
Foveon U.S. Pat. No. 6,958,862
Four sub-images shifted from each other are obtained;
each pixel is acquired in RGB according to a special technique;
due to shifting, the center of each sub image does not correspond to the center of the other images but they can be considered as pixels in a neighborhood of a given centre improving final resolution.
ST US 2004/0196379
A plurality of cameras arranged in a matrix disposition generates a plurality of images; however, the images are not identical because the cameras are relatively close to objects of the scene, thus parallax effects are relevant;
each camera has a different representation of the scene depending on the angle between the ray from a point of an object to the image plane of each camera sensor;
a new synthetic image is generated by a video processor using the taken images;
different images of the same scene allow point-to-point differences to be calculated and used for enhancing resolution.
For small pocketable cameras destined for the consumer market (e.g., the camera incorporated in a cellular phone), the thinness (f1 in FIG. 2) of the camera device comprising the light sensor, filters, optics and mechanics is a key factor of appeal. In traditional arrangements (typically a one-sensor camera with a single lens and a Bayer pattern filter) the minimum overall thickness that can be reached is determined by the focal length of the lens, the sensor size, the focus distance and the final optical image quality that is required. Therefore, a substantial reduction of the overall thickness of the device can only be obtained by reducing the focal length of the lens. This is commonly obtained by reducing the sensor size and, as a consequence, reducing image quality, because for the same pixel size the number of pixels must be reduced.
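The link between sensor size and focal length may be illustrated with elementary pinhole geometry. The sketch assumes focus at infinity and a fixed diagonal field of view; the function name and the numeric values are illustrative assumptions, not taken from this disclosure.

```python
import math


def focal_length_for_fov(sensor_diagonal_mm, fov_deg):
    """Focal length (mm) that maps a diagonal field of view onto the sensor."""
    return (sensor_diagonal_mm / 2) / math.tan(math.radians(fov_deg) / 2)


if __name__ == "__main__":
    # Same 60 degree field of view on a full sensor and on a half-size region.
    for diagonal in (6.0, 3.0):
        focal = focal_length_for_fov(diagonal, 60.0)
        print(f"diagonal {diagonal} mm -> focal length {focal:.2f} mm")
```

Under these assumptions the focal length scales linearly with the sensor diagonal, so a sensor region of half the linear size calls for a lens of half the focal length for the same field of view.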
There is a persistent need in the art for more compact light sensors capable of generating color images of good quality of a scene to be captured and of methods for efficiently processing images generated by these light sensors.
There is also an attendant need for related processing means capable of generating images having an extended depth of field.