There are several ways to sense a full color image. For example, red, green, and blue images may be captured sequentially using different filters, as is done in many scanners. Alternatively, white light may be focused by a lens and split into three color images, each of which is then sensed by a different sensor, as in "3-chip" cameras. The method that relates to the present invention uses a single sensor array of multiple pixels with a repetitive mosaic of colored filters placed so as to shadow each pixel with a single color. This is the method used in the great majority of video and digital cameras. One reason is that it provides for simultaneous capture of three colors for moving objects at the lowest cost by utilizing only a single sensor. This method is in fact the method used by the human eye to detect color. An example of such a sensor array is manufactured by Sony Corporation.
To describe this single sensor array method in more detail, FIG. 1 depicts an array of pixels 102 comprising a sensor. A single row of these pixels 104 forms a scan line "Y". When an image is read from the array, the pixels in such a scan line are read sequentially along the scan line. The mosaic consists of columns of colors arranged in stripes of red 202, green 204, and blue 206. Such a mosaic is called a striped color array. When pixels are read sequentially from the scan line 104, a repetitive sequence 105 of red, green, and blue pixels is read.
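The readout just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name read_scan_line, the representation of the scene as (r, g, b) tuples, and the choice that the first pixel lies under a red stripe are hypothetical and are not taken from the figures.

```python
def read_scan_line(scene_row):
    """Return the value each pixel senses through its stripe filter.

    scene_row is a list of (r, g, b) tuples, one per pixel column.
    Assumption: column 0 is under a red stripe, column 1 under green,
    column 2 under blue, and the pattern then repeats.
    """
    sensed = []
    for x, rgb in enumerate(scene_row):
        channel = x % 3  # 0 = red stripe, 1 = green stripe, 2 = blue stripe
        sensed.append(rgb[channel])
    return sensed


# A uniformly green scene, nine pixels wide.
green_scene = [(0.0, 1.0, 0.0)] * 9
print(read_scan_line(green_scene))
# prints [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```

Only every third pixel responds, which is the repetitive sequence read from the scan line.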
With any mosaic of colored filters over a single sensor, color is in effect coded into a pattern set by the mosaic. For example, if in the case of the striped color array every third column were bright, it would be a good guess that the scene had a single bright color. Another guess, less likely but not impossible, is that the scene contained closely spaced vertical lines. Particularly in scenes with both detail and color, patterns arise in the sensed image that can be interpreted as either color or image detail. A color decoding algorithm must attribute each pattern to either color or image detail. If the wrong choice is made, artifacts arise, such as the shimmering colors in a referee's shirt as seen on an NTSC television reception.
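The ambiguity can be made concrete with a small sketch. The helper name sense and the two test scenes are hypothetical, and the stripe order (red first) is an assumption. A uniformly red scene and a colorless scene of bright vertical lines on every third column produce exactly the same sensed values:

```python
def sense(scene_row):
    # Each pixel sees only the channel of its stripe (assumed order R, G, B).
    return [rgb[x % 3] for x, rgb in enumerate(scene_row)]


# Scene 1: uniform bright red, no detail.
red_scene = [(1.0, 0.0, 0.0)] * 9

# Scene 2: no color at all, but a white vertical line on every third column.
lines_scene = [
    (1.0, 1.0, 1.0) if x % 3 == 0 else (0.0, 0.0, 0.0)
    for x in range(9)
]

indistinguishable = sense(red_scene) == sense(lines_scene)
print(indistinguishable)  # prints True
```

Because the sensed data are identical, no decoder can tell the two scenes apart from this scan line alone; it can only choose the more probable interpretation.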
To better understand the decoding of color from a striped color array, the problem is now presented in the frequency domain. FIG. 3 again depicts a striped color array 302 placed over a pixel array. As a row of pixels is read from the array, a pattern is read corresponding to the color of the image the array is viewing. For example, if the scene is bright green, then the sensors under the green stripes 304 will cause the output 306 from a row of pixels to have repetitive peaks 308 at a frequency corresponding to every third pixel. If the scene were bright blue instead, then the blue stripes 310 would cause the output 312 to have similarly spaced but differently placed peaks 314. So the presence of a bright color is sensed by the presence of a particular frequency, and the hue of that color is sensed by the phase of that frequency.
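As a rough illustration of sensing hue by phase, the following sketch correlates a scan line with a complex exponential at one cycle per three pixels. The function names are hypothetical, and the phase values depend on the assumption that pixel 0 lies under a red stripe; a different origin would rotate all phases by the same amount.

```python
import cmath
import math


def carrier_component(scan_line):
    """Complex amplitude of the scan line at one cycle per three pixels."""
    n = len(scan_line)
    total = sum(
        value * cmath.exp(-2j * math.pi * x / 3)
        for x, value in enumerate(scan_line)
    )
    return total / n


def describe(scan_line):
    """Return (magnitude, phase in degrees) at the color frequency."""
    c = carrier_component(scan_line)
    return abs(c), math.degrees(cmath.phase(c))


green_line = [0.0, 1.0, 0.0] * 4   # peaks under the green stripes
blue_line = [0.0, 0.0, 1.0] * 4    # same spacing, shifted by one pixel
white_line = [1.0] * 12            # no color, so no peaks at all

print(describe(green_line))  # magnitude near 1/3, phase near -120 degrees
print(describe(blue_line))   # magnitude near 1/3, phase near +120 degrees
print(describe(white_line)[0])  # magnitude near 0: no color present
```

The magnitude indicates how much color is present and the phase indicates which hue, while a colorless scene produces essentially no component at this frequency.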
For this invention, let the frequency of a pure color (as represented for example, by signals 306 and 312 of FIG. 3, and hereinafter referred to as the color carrier) arbitrarily be assigned a frequency of 1.0. Then it follows that the pixel array itself has three pixels for each color cycle, and therefore samples at a frequency of 3.0. The Nyquist frequency of the pixel array, which is the maximum sensed frequency at which alternate pixels are light and dark, is half of 3.0, or 1.5, and the Nyquist frequency of a pure color is half of 1.0, or 0.5.
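The arithmetic above can be restated as a short sketch. The constant names and the conversion helper to_cycles_per_pixel are hypothetical and serve only to tie the color carrier units to the more familiar cycles per pixel.

```python
CARRIER = 1.0                 # frequency of a pure color, by definition
PIXELS_PER_COLOR_CYCLE = 3    # one red, one green, one blue pixel per cycle

pixel_sampling = CARRIER * PIXELS_PER_COLOR_CYCLE   # 3.0
pixel_nyquist = pixel_sampling / 2                  # 1.5
single_color_sampling = CARRIER                     # one sample per cycle
single_color_nyquist = single_color_sampling / 2    # 0.5


def to_cycles_per_pixel(freq_in_carrier_units):
    """Convert a frequency in color carrier units to cycles per pixel."""
    return freq_in_carrier_units / PIXELS_PER_COLOR_CYCLE


print(pixel_nyquist, single_color_nyquist)   # prints 1.5 0.5
print(to_cycles_per_pixel(pixel_nyquist))    # prints 0.5
```

The last value confirms the definition in the text: at the pixel Nyquist frequency of 1.5, one full light and dark cycle spans two pixels.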
FIG. 4 depicts a sequence 402 of red, green, and blue stripes 408 (which upon repeating connects again with red 410). Although these colors lie along a row 404, the repetitive nature lets them be thought of as representing a color circle 406. FIG. 5 expands this circle and shows how the colors can be represented by vectors around this circle. A green scene, for example, would stimulate peak response as the circle passed over the head of the green vector 502. Any hue can be represented as a vector direction around this circle. Two hues of particular interest are the "I" vector 504, which represents the "Inphase" component of NTSC television, and the "Q" vector 506, which represents the "Quadrature" phase of NTSC. The I vector was selected to match the most common hue direction of colors in the real world, which is the orange-blue hue axis, and the Q vector is the least common direction, which is the green-magenta hue axis.
With reference to FIG. 6, the effect of color striping is now portrayed in the frequency domain. In order to take advantage of this portrayal, colors are represented as consisting of a luminance, commonly called a "Y" component, 602, and two color components, the "I" component, 604, and the "Q" component, 610, presented earlier.
A purely luminance, or black and white scene, will pass all the color filters equally, and so will stimulate an effect from a scan line equivalent to having no filters in place at all. This is represented in FIG. 6 as the Y curve 602 having spatial frequency content determined by the scene, attenuated by blurring in the sensor and associated optics, and limited by the raw Nyquist of the sensor array to a frequency of 1.5 in units of the color carrier, as defined earlier.
The color components arise from the same image edges as the luminance component, and so typically exhibit a spatial frequency shape very similar to that of the luminance component, although with reduced magnitude. In particular, all color, including the I component, is typically much lower than the Y component, and hence the I component curve 604 has the same shape as the Y component curve 602, but is much lower. In addition, because color is effectively multiplied by, or modulated by, the color stripes, it appears to peak at the color carrier frequency of 1.0, and has an upper sideband 606 extending above 1.0 and a mirror image lower sideband 608 extending down. The Q signal is lower than the I signal in most cases because the I and Q vector directions were chosen in order to maximize this difference in magnitude for an average of scenes. The difference is usually quite large, as illustrated in FIG. 6, wherein the Q curve 610 is much lower than the I curve 604. It is noted that the I and Q curves occupy the same frequency space centered at the color carrier 612. However, because both have two sidebands, they may be distinguished by phase. The two color components I and Q also overlap in frequency with the luminance Y component 602. This overlap is the origin of artifacts in a single sensor color method. The prior art has attempted imperfect separation, and it is the intent of this invention to better separate the components.
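The modulation and its sidebands can be observed numerically. The following is a minimal sketch under assumed values: a scan line of 90 pixels, a red-only scene whose intensity varies slowly at a frequency of 0.1 in color carrier units, and a stripe order beginning with red. None of these values come from FIG. 6.

```python
import cmath
import math

N = 90   # pixels in the scan line (30 color cycles); arbitrary choice
K = 3    # cycles of red intensity variation across the line; arbitrary


def red_intensity(x):
    # Slowly varying red brightness between 0 and 1.
    return 0.5 + 0.5 * math.cos(2 * math.pi * K * x / N)


# Only pixels under red stripes respond to a purely red scene.
scan_line = [red_intensity(x) if x % 3 == 0 else 0.0 for x in range(N)]


def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, normalized by the signal length."""
    n = len(signal)
    total = sum(
        v * cmath.exp(-2j * math.pi * k * x / n)
        for x, v in enumerate(signal)
    )
    return abs(total) / n


def to_carrier_units(k):
    # Three pixels per color cycle, so bin N/3 is the carrier at 1.0.
    return round(3 * k / N, 3)


peaks = [
    to_carrier_units(k)
    for k in range(N // 2 + 1)
    if dft_magnitude(scan_line, k) > 0.01
]
print(peaks)  # prints [0.0, 0.1, 0.9, 1.0, 1.1]
```

The components at 0.9 and 1.1 are the lower and upper sidebands of the color carrier at 1.0. The components at 0.0 and 0.1 appear because a red scene also carries luminance, which illustrates how color and luminance share one spectrum.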
The signal derived from a single sensor array under a color matrix has in effect a color signal coded into it that must be decoded to provide a useful color image. A basic technique in the art to provide this decoding into separate color components is to divide each scan line 104 of FIG. 2 into three scan lines, represented by scan lines YR 208, YG 210, and YB 212. Because each of these scan lines consists of only pixels of like color, and so has only one-third as many pixels as the original scan line, each is limited to a Nyquist frequency of 0.5, as described above.
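This division into three scan lines amounts to taking every third pixel. A minimal sketch follows; the function name split_by_color is hypothetical and the stripe order (red first) is assumed.

```python
def split_by_color(scan_line):
    """Divide one striped scan line into YR, YG, and YB scan lines."""
    yr = scan_line[0::3]  # pixels under red stripes
    yg = scan_line[1::3]  # pixels under green stripes
    yb = scan_line[2::3]  # pixels under blue stripes
    return yr, yg, yb


# Pixel indices stand in for pixel values to show where each sample goes.
yr, yg, yb = split_by_color(list(range(9)))
print(yr, yg, yb)  # prints [0, 3, 6] [1, 4, 7] [2, 5, 8]
```

Each output has one-third as many samples as the input, which is why each color is limited to a Nyquist frequency of 0.5.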
There were several drawbacks to this approach. The most obvious is that for black and white detail, the effective resolution of the array was limited to one-third as many pixels as physically contained in the array. In addition, any optical detail that passed beyond the low Nyquist frequency of 0.5 aliased into artifacts, giving the common effect in early digital cameras of one red eye and one blue eye.
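The aliasing artifact can be reproduced with a purely black and white pattern. In this sketch the test pattern (alternate pixels light and dark, the pixel Nyquist frequency of 1.5) and the variable names are assumptions chosen for illustration.

```python
# Black and white detail at the pixel Nyquist frequency: light, dark, light...
bw_line = [1.0 if x % 2 == 0 else 0.0 for x in range(12)]

# Decode by dividing into three scan lines (assumed stripe order R, G, B).
yr = bw_line[0::3]
yg = bw_line[1::3]
yb = bw_line[2::3]

# Recombine into one (r, g, b) value per color cycle.
decoded = list(zip(yr, yg, yb))
print(decoded)
# prints [(1.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
```

Although the scene contains no color, the decoded result alternates between magenta and green, a false color artifact of the kind described above.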
Yet another possible approach in the prior art, opposite to that just described, assumed that images were essentially purely black and white. This assumption is valid, for example, when text printed on white paper with black ink is scanned. Under this assumption, the colored stripes have no effect. Accordingly, the full bandwidth of 1.5 (i.e., half of the array sampling frequency of 3.0) is filled. This results in a bandwidth of three times that of the previously described first technique, but in its pure form requires that the scene have no color detail at all because all frequencies are decoded under the assumption they arise from luminance detail.
The two prior art techniques just described suggested that there was a limited amount of information that could be allocated between three colors with 0.5 bandwidth each, or on a monochrome signal with 1.5 bandwidth, or any combination. A good compromise in the prior art was to allocate 0.75 bandwidth to the luminance, and 0.25 bandwidth to the color. It may be noted in FIG. 6 that at a frequency of 0.75 (which is 0.25 down from the color carrier at 1.0, 612) the luminance and color components are of about equal magnitude. Hence, statistically, this is the optimum watershed frequency below which signal is interpreted as luminance and above which it is interpreted as color.
It is noted that by allocating less bandwidth to color, the highest frequencies from the array in the vicinity of 1.5 are not allocated to any channel, but are lost. In addition, if at the separating frequency of 0.75 half the information is from color and half from luminance, then half will cross over that "watershed" frequency into the wrong interpretation and produce artifacts.
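The allocation described in the two preceding paragraphs can be sketched as an ideal band split. This is an assumption-laden illustration and not a description of any actual prior art decoder: the sharp cutoffs, the upper color edge of 1.25 (the carrier at 1.0 plus the 0.25 color bandwidth), the names band_of and split_bands, and the test scan lines are all hypothetical.

```python
import cmath
import math

LUMA_EDGE = 0.75    # watershed frequency from the text, in carrier units
COLOR_EDGE = 1.25   # assumed: carrier 1.0 plus 0.25 of color bandwidth


def band_of(freq):
    """Name the channel to which a frequency is allocated."""
    if freq < LUMA_EDGE:
        return "luminance"
    if freq <= COLOR_EDGE:
        return "color"
    return "lost"


def split_bands(scan_line):
    """Split a scan line into three signals using an ideal DFT band split."""
    n = len(scan_line)
    bands = {name: [0.0] * n for name in ("luminance", "color", "lost")}
    for k in range(n):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * x / n)
            for x, v in enumerate(scan_line)
        ) / n
        # Fold the bin to a positive frequency; three pixels per color cycle.
        freq = 3.0 * min(k, n - k) / n
        out = bands[band_of(freq)]
        for x in range(n):
            out[x] += (coeff * cmath.exp(2j * math.pi * k * x / n)).real
    return bands


green = split_bands([0.0, 1.0, 0.0] * 8)   # uniform green scene
fine_bw = split_bands([1.0, 0.0] * 12)     # black and white detail at 1.5

print(round(green["luminance"][0], 3))  # prints 0.333
print(round(fine_bw["luminance"][0], 3), round(fine_bw["lost"][0], 3))
# prints 0.5 0.5
```

For the green scene, the constant level lands in the luminance channel and the every-third-pixel pattern lands in the color channel. For the fine black and white pattern, the detail at 1.5 falls in no channel and only a uniform gray of 0.5 survives as luminance.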
Yet a third technique employed a median filter to estimate edge positions and achieve wider bandwidth for edges. A commercial realization of this system produced a bandwidth of exactly 0.75. However, because this was a non-linear technique, significant undesirable artifacts typically were present in the resulting image.