(1) Field of the Invention
The invention relates generally to the field of color sensors and more particularly to color sensors having neural networks with a plurality of hidden layers, or multi-layer neural networks, and further to a new neural network processor for sensing color in optical image data.
(2) Description of the Prior Art
Electronic neural networks have been developed to rapidly identify patterns in certain types of input data, or accurately to classify the input patterns into one of a plurality of predetermined classifications. For example, neural networks have been developed which can recognize and identify patterns, such as the identification of hand-written alphanumeric characters, in response to input data constituting the pattern of on and off picture elements, or "pixels", representing the images of the characters to be identified. In such a neural network, the pixel pattern is represented by, for example, electrical signals coupled to a plurality of input terminals, which, in turn, are connected to a number of processing nodes, each of which is associated with one of the alphanumeric characters which the neural network can identify. The input signals from the input terminals are coupled to the processing nodes through certain weighting functions, and each processing node generates an output signal which represents a value that is a non-linear function of the pattern of weighted input signals applied thereto. Based on the values of the weighted pattern of input signals from the input terminals, if the input signals represent a character that can be identified by the neural network, the one of the processing nodes associated with that character will generate a positive output signal, and the others will not. On the other hand, if the input signals do not represent a character that can be identified by the neural network, none of the processing nodes will generate a positive output signal. Neural networks have been developed which can perform similar pattern recognition in a number of diverse areas.
The particular patterns that the neural network can identify depend on the weighting functions and the particular connections of the input terminals to the processing nodes. The weighting functions in, for example, the above-described character recognition neural network, essentially will represent the pixel patterns that define each particular character. Typically, each processing node will perform a summation operation in connection with values representing the weighted input signals provided thereto, to generate a sum that represents the likelihood that the character to be identified is the character associated with that processing node. The processing node then applies the non-linear function to that sum to generate a positive output signal if the sum is, for example, above a predetermined threshold value. The conventional non-linear function which a processing node may use in connection with the sum of weighted input signals is generally a step function, a threshold function, or a sigmoid; in all cases the output signal from the processing node will approach the same positive output signal asymptotically.
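The summation and non-linear function of a single processing node can be illustrated with a short sketch. It is a minimal illustration only; the function names, the four-pixel input, the weights and the threshold are hypothetical values chosen for the example and do not come from any particular prior art network.

```python
import math

def step(s, threshold=0.0):
    """Threshold (step) non-linearity: positive output only above the threshold."""
    return 1.0 if s > threshold else 0.0

def sigmoid(s):
    """Sigmoid non-linearity: approaches 1.0 asymptotically as s grows."""
    return 1.0 / (1.0 + math.exp(-s))

def node_output(inputs, weights, activation):
    """Sum the weighted input signals, then apply the non-linear function."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)

# Hypothetical on/off pixel pattern and weights for one character node.
pixels = [1, 0, 1, 1]
weights = [0.5, -0.5, 0.5, 0.5]

fired = node_output(pixels, weights, lambda s: step(s, threshold=1.0))
graded = node_output(pixels, weights, sigmoid)
```

Here the weighted sum is 1.5, so the step node generates a positive output, while the sigmoid node gives a graded value below 1.0.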
Before a neural network can be useful, the weighting functions for each of the respective input signals must be established. In some cases, the weighting functions can be established a priori. Normally, however, a neural network goes through a training phase, in which input signals representing a number of training patterns for the types of items to be classified, for example, the pixel patterns of the various hand-written characters in the character-recognition example, are applied to the input terminals, and the output signals from the processing nodes are tested. Based on the pattern of output signals from the processing nodes for each training example, the weighting functions are adjusted over a number of trials. After the neural network has been trained, during an operational phase it can generally accurately recognize patterns, with the degree of success based in part on the number of training patterns applied to the neural network during the training stage, and the degree of dissimilarity between patterns to be identified. Such a neural network can also typically identify patterns that are similar, but not necessarily identical, to the training patterns.
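The adjustment of weighting functions over a number of trials can be sketched as follows. The sketch uses the simple single-node perceptron update rule as a stand-in, not the multi-layer training of any specific network; the function names, the learning rate, the number of trials and the logical-AND training set are assumptions made for illustration.

```python
def train_perceptron(samples, n_inputs, rate=1.0, epochs=20):
    """Adjust weights after each training example (perceptron rule, an assumption)."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            s = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1 if s > 0 else 0
            error = target - output  # zero when the node already answers correctly
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

def classify(inputs, weights, bias):
    """Operational phase: apply the trained weights to an input pattern."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > 0 else 0

# Hypothetical training patterns: the node should fire only for the pattern [1, 1].
samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(samples, n_inputs=2)
predictions = [classify(x, weights, bias) for x, _ in samples]
```

Even in this small case several passes over the training patterns are needed before the weights stop changing, which illustrates why training is treated as a separate phase from operation.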
One of the problems with conventional neural network architectures as described above is that the training methodology, generally known as the "back-propagation" method, is often extremely slow in a number of important applications. In addition, under the back-propagation method, the neural network may produce erroneous results that may require restarting of training. Even after a neural network has been through a training phase, confidence that the best training has been accomplished may sometimes be poor. If a new classification is to be added to a trained neural network, the complete neural network must be retrained. In addition, the weighting functions generated during the training phase often cannot be interpreted in ways that readily provide understanding of what they particularly represent.
Edwin H. Land's Retinex theory of color vision is based upon "three color" experiments performed before 1959. A simple "mishap" showed that three colors were not always required to see accurate color. Land used a short and a long record of brightness data (black and white transparencies) to produce color perceived by human eyes and not by photographic means. He demonstrated a perception of a full range of pastel colors using two light sources very similar in color, such as yellow at 579 nm and yellow-orange at 599 nm ("Experiments in Color Vision", Edwin H. Land, Scientific American, Vol. 200, No. 5, May 1959). Land found that in some two-record experiments not all colors present were perceived. Although Land demonstrated that two records provided color perceptions, he constructed his Retinex theory upon three records such as his long, medium and short records ("An Alternative Technique for the Computation of the Designator in the Retinex Theory of Color Vision", Edwin H. Land, Proceedings of the National Academy of Sciences, Vol. 83, 1986). The invention herein is related to human color perception discovered during Land's color vision experiments as reported in 1959.
The "trichromatic" theory of human color vision has been accepted on and off since the time of Thomas Young in 1802 ("A Vision in the Brain", S. Zeki, Blackwell Scientific Publishing, 1993). Still and video electronic camera designs are correctly based upon the trichromatic theory, but the current designs are highly subject to color reproduction errors caused by changes in ambient light color temperature and color filtration. The device in this invention senses color using a new "bichromatic" theory, which includes a mechanism that ensures color constancy over a large range of ambient color temperatures. The use of two lightness records, as used by Land in 1959, is one key to this invention.
The bichromatic theory is based upon an interpretation of a biological color process that occurs in the eyes and brain of humans and in some animals. The bichromatic theory is defined as a system that functions together under the following assumptions, accepted principles and rules of procedure, for which FIGS. 4A and 4B are provided for support:
(1) The system is a color sensing retina. There are at least two photo transducers in each pixel space in the retina, shown in FIG. 4B as TR(HI) and TR(LO).
(2) The two photo transducers sense the color of the light at each pixel's position in a scene of color focused on the retina. Each of the at least two photo transducers has a different spectral response, and the wavelength difference between the peaks of a pair of these responses is called the waveband or the spectral bandwidth of the two photo transducers.
(3) The two photo transducers have overlapping spectral logarithmic responses where their slopes are opposing each other as indicated in FIG. 4A.
(4) The photo transducers have at least two controlled gain amplifiers (CGA) and at least two common controlling circuits. There is one controlled gain amplifier for each photo transducer where each of the at least two common controlling circuits controls the controlled gain amplifiers for all the photo transducers of the same spectral response.
(5) The highest energy value in the retina, or the peak energy from a photo transducer of a specific spectral response, controls the output of the common controlling circuits that normalize the logarithmic response of all photo transducers with the same spectral response. Thus, it is always the peak energy photo transducer no matter its position in the retina that controls the common mode gain. The peak response of a photo transducer is relative to the best matched wavelength of energy for all wavelengths of light impinging on the color retina. Therefore, each photo transducer will be continuously normalized to the peak photo transducer signal in response to changes in ambient lighting.
(6) In the general discussion herein, a normalized photo transducer or a normalized pixel includes the controlled gain amplifier as part of its response. A photo transducer sensing the peak energy, or a peak energy sensing photo transducer, will always be referred to as such; thus the term normalized photo transducer will not specifically include a peak energy sensing photo transducer.
(7) There are three color coordinates called hue, lightness and saturation. Three degrees of freedom are required to categorize all combinations of color attributes. Two points in a two dimensional space can be connected by a line. Combinations of positions of these two points in space can produce at least three families of lines in the two dimensional space. The line families are horizontal, vertical and sloped. FIG. 4A shows a two dimensional graph of the responses of two normalized photo transducers. A straight line on the graph may represent the two output values of the normalized photo transducers for a specific input light condition. The graph coordinates are light wavelength for the horizontal axis and signal in a natural log scale for the vertical axis. Output values of the two normalized photo transducers can be represented by three families of lines.
(8) The response "curve" of a normalized photo transducer output signal for a normalized light energy input is shown as a straight line, from the maximum response at its wavelength, down to the bottom at the opposite side of the graph. The response curves of the two normalized photo transducers have opposing slopes and cross each other. A normalized photo transducer response over the waveband is given as TR(b) = c e^{-kx}, where: x equals the wavelength position in the normalized waveband relative to the maximum response of the photo transducer, i.e., (0 to 1); c, the conversion constant, equals one for a normalized light energy, or, alternately, an integrated CGA value; k equals approximately 10; and b is the high or low transducer. The output signal level is symbolized by E1 for the low wavelength normalized photo transducer and E2 for the other.
(9) A broad constant energy spectrum of visible light relative to its color temperature "flattens" its spectral energy curve as the color temperature increases from a deep red at 1000° K. to a "slightly bluish" white at 10,000° K. Thus, when the peak energy photo transducers normalize the retina's response, the results are equivalent to "whitening" the pixel's responses in the waveband of sensible colors. In other words, possibly different energies near the wavelength of the maximum sensitivities of the peak energy transducers contain approximately equal spectral energies at the output of the respective controlled gain amplifiers. This process develops a color constancy in ambient lights of different color temperatures.
(10) A family of horizontal lines can represent the normalized photo transducer responses to a broadband family of white light from bright through gray to dark. Example 1 on the graph is a representation of this family. A family of vertical lines can represent a family of wavelengths in the waveband. Example 2 on the graph is a representation of the wavelength of a monochromatic light source. Families of sloped lines, from a horizontal position to a vertical position, closely represent a morphing from "white" to a monochromatic light. A change from white light to a light of a pure color is along the axis for the color attribute of saturation. Example 3 on the graph is a representation of a pastel color. The three families of lines are closely mapped to the three color coordinates of hue, lightness and saturation, but not with an exact one to one correlation. A combination of either set of three dimensions of color attributes can be mapped into the other. The two response values of a normalized pixel can represent a line that can move in combinations of the three coordinate ways to represent exact changes in lightness, hue and saturation of colors.
(11) The output values of a normalized pixel, in response to a monochromatic light, shall exhibit proportional photo transducer output values of E1 and E2 that are relative to the wavelength of the light in the waveband between the two photo transducers. In the case where there is a broad spectrum of light illuminating an object, the different reflective bands of light relative to the wavelength responses of the normalized pixel will produce photo transducer output values proportional to the values that would be generated by a colored light of the perceived color.
(12) Changing the pixel's response from straight lines to curved lines on the logarithmic scale does not change the two point families of lines but it will change the form of the mapping between the two different color attributes.
(13) There is another control mode that increases the dynamic range of the sensibility to light of all photo transducers in the retina. This control sums the energy of all spectral responses to adjust an iris to maintain a constant energy to the retina under varying environmental lighting intensities.
(14) This bichromatic theory suggests that human color vision may not operate as commonly believed. The human retina contains three color cones to sense three different wavelengths of light, which may be used as two color pairs such as a blue-green pair and a red-green pair. Each color pair is processed in the visual cortex to map colors that can be associated to the visual space of an object in a scene. The two color pairs and processing will produce a wide range of colors sensed and a wide range of color constancy. Edwin H. Land's pre-1959 experiments using two black and white transparencies and two color filters produced a perception of color. The color perception and constancy occur because the brightest area of one of the projected transparencies normalizes the response of the appropriate set of human color cones to the specific color projected and the same occurs for the other transparency. The normalized human retina now sees varying ratios of brightness (energy) over the visual scene, which produces the perception of colors of light for the specific color temperatures of natural or artificial light. The bichromatic theory of color is an integration of the above fourteen theorems that together define the workings of color perception and color constancy.
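The response model of item (8) and the peak normalization of item (5) can be illustrated with a short numerical sketch. It is one possible reading under stated assumptions, not a definitive implementation: it assumes a monochromatic light at normalized waveband position p, with the low wavelength transducer peaking at p = 0 and the high wavelength transducer peaking at p = 1, so that E1 = c e^{-kp} and E2 = c e^{-k(1-p)}. Under that assumption ln(E1/E2) = k(1 - 2p), so the position p follows from the ratio of the two outputs independently of the brightness c. All function names are hypothetical.

```python
import math

K = 10.0  # item (8): k equals approximately 10

def response(x, c=1.0, k=K):
    """TR(b) = c * e^(-k x), with x the normalized distance (0 to 1) from the peak."""
    return c * math.exp(-k * x)

def pixel_outputs(p, c=1.0, k=K):
    """E1 and E2 for monochromatic light at waveband position p (assumed geometry)."""
    e1 = response(p, c, k)        # low wavelength transducer, peak at p = 0
    e2 = response(1.0 - p, c, k)  # high wavelength transducer, peak at p = 1
    return e1, e2

def position_from_outputs(e1, e2, k=K):
    """Invert ln(E1/E2) = k (1 - 2p) to recover the waveband position p."""
    return 0.5 * (1.0 - math.log(e1 / e2) / k)

def normalize(raw):
    """Item (5): scale all transducers of one spectral response by the peak value."""
    peak = max(raw)
    return [v / peak for v in raw]

e1, e2 = pixel_outputs(0.3)
dim_e1, dim_e2 = pixel_outputs(0.3, c=0.25)  # same color, one quarter the energy
recovered = position_from_outputs(e1, e2)
recovered_dim = position_from_outputs(dim_e1, dim_e2)
normalized = normalize([0.2, 0.8, 0.4])
```

In this sketch the bright and the dim light yield the same recovered position, since the conversion constant c cancels in the ratio, and the normalized list always has a maximum of one regardless of the overall light level.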
It is therefore an object of the invention to provide a new and improved neural network color sensor.
It is a further object to provide a neural network color sensor in which the weighting functions may be determined a priori.
Another object of the present invention is to provide a neural network color sensor, which can be trained with a single application of an input data set.
In brief summary, the color sensor generates color information defining colors of an image, comparison of colors illuminated under two or more light sources and boundaries between different colors. The color sensor includes an input section, a color processing section, a color comparison section, a color boundary processing section and a memory processing section. The input section includes an array of transducer pairs, each transducer pair defining one of a plurality of pixels of the input section. Each transducer pair comprises at least two transducers, each generating an output having a peak at a selected color, the selected color differing as between the two transducers, and each transducer having an output profile comprising a selected function of color. The color processing section includes a plurality of color pixel processors, each receiving the outputs from the two transducers comprising the transducer pair associated with a pixel. In response, the color processing section generates a color feature vector representative of the brightness of the light incident on the pixel and a color value corresponding to the ratio of outputs from the transducers comprising the transducer pair associated with the pixel. The color boundary processing section generates a plurality of color boundary feature vectors, each associated with a pixel, each representing the difference between the color value generated by the pixel color processor for the respective pixel and color values generated by the pixel color processor for pixels neighboring the respective pixel.
The color boundary sensor produces object shape feature vectors from a function of the differences in color. This color boundary sensor can sense a colored object shape in a color background where a black and white sensing retina could not detect differences in lightness between the background and the object. The color comparator processor can measure and compare the reflective color of two objects, even when each object is illuminated by two lights of different color temperatures. The memory processor section provides a process to recognize a color, a boundary of color and a comparison of colors.
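The per-pixel color value and the color boundary feature vectors summarized above can be sketched as follows. The sketch makes several assumptions that the summary does not specify: brightness is taken as the sum of the two transducer outputs, the neighborhood is the four adjacent pixels, and the function names and sample values are hypothetical.

```python
def color_feature(e1, e2):
    """Return (brightness, color value) for one transducer pair."""
    brightness = e1 + e2   # assumption: brightness as the sum of both outputs
    color_value = e1 / e2  # ratio of the two transducer outputs
    return brightness, color_value

def boundary_features(color_values):
    """For each pixel, list the differences to its existing 4-connected neighbors."""
    rows, cols = len(color_values), len(color_values[0])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            diffs = []
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    diffs.append(color_values[r][c] - color_values[nr][nc])
            row.append(diffs)
        result.append(row)
    return result

# Hypothetical 2 x 3 array of (E1, E2) pairs: two columns of one color at
# different brightness levels, and a third column of a different color.
pairs = [[(1.0, 2.0), (1.0, 2.0), (3.0, 1.0)],
         [(2.0, 4.0), (0.5, 1.0), (6.0, 2.0)]]
features = [[color_feature(e1, e2) for e1, e2 in row] for row in pairs]
color_values = [[cv for _, cv in row] for row in features]
boundaries = boundary_features(color_values)
```

In this example the first two columns differ in brightness but share one color value, so their mutual differences are zero, while a non-zero difference appears only at the edge of the third column, which is where a color boundary would be reported.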