Both natural light (‘ambient’) photography and flash-assisted (read broadly: ‘human assisted light supplementation’) photography have been around since the Daguerreotype. The technology of this disclosure concerns primarily how the latter form of lighting, call it ‘flash’ for conciseness, can be designed and implemented so as to qualify it within the general art of ‘imaging spectrometry’ or ‘hyper-spectral imaging.’
In a nutshell, by illuminating a scene with several different brief (frame-synchronized) ‘spectrally structured’ light sources, even a common Bayer-pattern CMOS camera can effectively become an imaging spectrometer with ‘N bands.’ In these very early days N is practically on the order of 5 to 10 bands, with fine prospects of going higher, especially as the design principles behind Bayer patterns (and RGBW patterns, e.g., from Sony) are reconsidered in light of this technology.
An introduction of the technology must make note of multi-chip LEDs (see, e.g., Edison's 2012-era Federal FM series, depicted in FIG. 7) as being at least a seed for creating ‘spectrally structured light.’ A core approach, exploited in several embodiments, is to synchronize the pulsing of different LED light sources with individual frames of a CMOS sensor, thereby creating the informational basis for N-band imaging. Light sources other than LEDs can certainly be considered, but by 2012 standards, multi-chip and/or ‘dual’ LEDs are the leading candidates to realize this technology.
A particularly intriguing choice of ‘bands’ is the 3 very well-known 1931 CIE color matching functions and/or their orthogonally transformed functions. With such choices, the stage is set for taking color photography to its multiverse destiny: referred to as ‘direct chromaticity capture’ in this disclosure.
One part of this disclosure describes the design principles and physical realizations for turning virtually any electronic imaging sensor into an imaging spectrometer via specific coordination with a supplemental light source. With the core ‘how’ thus elucidated, applications are then presented, including A) the niche application of hyper-spectral imaging, B) the medical imaging potential of this technology, C) radically improved color photography for both ‘digital cameras’ and smart phones (as 2012 still draws pretty sharp lines between the two), and D) uses of N-band imaging within the mature technology of digital watermarking and ‘image fingerprinting.’
Subsequent to the initial disclosure, this disclosure has been expanded significantly in several areas, including:
- methods and systems for classifying and recognizing various types of objects;
- such systems employing various imaging configurations, with various options on spectral light sources, optical filters, polarimetric sensing, sensing of these spectral and polarimetric pixel samples in 3 spatial dimensions (including plenoptic sensing and structured-light 3D sensing), scanning techniques, and synchronized, controlled capture under various lighting and sensing states;
- training and applying classifiers for particular fields, including produce identification, produce ripening, etc.;
- advances in illumination, sensing and post-processing to address various environmental effects, including specular reflections and product packaging layers (e.g., plastic packaging or bags that hamper object identification); and
- advances in sensing and post-processing that, prior to training and applying a classifier, obtain per-pixel vectors combining spectral, polarimetric, and spatial relationships among pixel elements.
Many more system configurations, lighting and sensing devices, and pixel post-processing techniques and device configurations are detailed further below. A myriad of inventive combinations of these and other aspects of the disclosure are contemplated, and they are not limited to the particular example embodiments. We provide source code samples as examples. It is contemplated that the various signal processing described may be implemented as software instructions for execution on general-purpose computing devices or special-purpose processors, including devices with DSPs, GPUs, etc. These software instructions may be ported into processor-device-specific firmware versions, ASICs, FPGAs, etc., in various combinations, and may also leverage cloud computing services for execution (particularly for training, classification and recognition services).
The foregoing and other features and advantages of the present technology will be more readily apparent from the following Detailed Description, which proceeds with reference to the accompanying drawings.
Classifiers for Produce
Several research groups have investigated methods using digital color (Red, Green, and Blue) cameras to classify fruits, or fruits and vegetables. One early effort was made by IBM in the mid-1990s. See, Bolle, Connell, Haas, Mohan, Taubin. “VeggieVision: A Produce Recognition System”, Proceedings of the Third IEEE Workshop on Applications of Computer Vision, pp. 224-251, 1996. In this effort, the researchers tried to classify 48 different produce items using a combination of color and texture features. The color features were three concatenated histograms of the produce item, computed in the Hue-Saturation-Intensity (HSI) space. For the texture measure, they tried a couple of different gradient measures; the texture features were histograms of the gradient taken over the image, and both gradient measures performed similarly. They used a nearest-neighbor classifier. The correct class was among the top four predicted classes 90% of the time for color only (with hue being most important), 63% of the time for texture only, and 97% of the time for color and texture combined. This result indicates that good category separation should be possible with a fast, simple classifier operating on a single feature vector per image.
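A minimal sketch of this kind of pipeline follows. It assumes float RGB images with values in [0, 1]. It substitutes HSV (from Python's `colorsys` module) for the HSI space used by Bolle et al., uses only color features and an L1 distance, and returns a top-k list of distinct classes. The helper names are hypothetical. This illustrates top-k nearest-neighbor ranking; it is not the VeggieVision implementation.

```python
import colorsys

import numpy as np


def hsv_histogram_feature(rgb, bins=16):
    """Concatenated hue, saturation and value histograms of an image.

    rgb: (H, W, 3) float array with values in [0, 1]. HSV from colorsys is a
    stand-in for the HSI space used by Bolle et al. (an assumption).
    """
    pixels = rgb.reshape(-1, 3)
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])
    feats = []
    for ch in range(3):
        hist, _ = np.histogram(hsv[:, ch], bins=bins, range=(0.0, 1.0))
        feats.append(hist / len(pixels))  # normalize so image size does not matter
    return np.concatenate(feats)


def nearest_neighbor_topk(query, train_feats, train_labels, k=4):
    """Rank classes by the L1 distance of their nearest training example; return the top k."""
    dists = np.abs(train_feats - query).sum(axis=1)
    ranked = []
    for idx in np.argsort(dists):
        label = train_labels[idx]
        if label not in ranked:
            ranked.append(label)
        if len(ranked) == k:
            break
    return ranked


# Toy usage with constant-color 4x4 "images" (hypothetical data).
def flat(rgb):
    return np.tile(np.array(rgb, dtype=float), (4, 4, 1))


train = np.stack([hsv_histogram_feature(flat([0.8, 0.1, 0.1])),
                  hsv_histogram_feature(flat([0.2, 0.7, 0.2]))])
labels = ["red apple", "green apple"]
query = hsv_histogram_feature(flat([0.75, 0.15, 0.1]))
ranked = nearest_neighbor_topk(query, train, labels, k=2)   # red apple ranked first
```

A "top four" score in the sense used above counts a query as correct whenever its true class appears anywhere in the list returned with `k=4`.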
Several more recent publications by university researchers provide guidance on potential color and texture features for grouping produce into categories. A group in Brazil working with Cornell performed a study of a variety of features and classifier types using a set of 15 different produce items. See, Rocha, Hauagge, Wainer, Goldenstein. “Automatic fruit and vegetable classification from images”, Computers and Electronics in Agriculture, 70, 96-104, 2010. The images showed one or more examples of each item against a uniform white background. A digital RGB camera was used to capture the images. Their color and texture descriptors included:
1. General Color Histogram. A general color histogram is a three-dimensional joint histogram that estimates the probability of each quantized RGB vector, rather than three separate histograms, one per color channel. Typically, each channel is quantized to 4 levels, giving a 4×4×4 = 64-element feature vector.
2. Unser Features. Unser features are a texture measure that operates on the intensity channel. They are computed by taking the sum and the difference of pairs of pixels separated by a selected displacement (scale). Histograms are then formed for the sum and difference images.
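3. Color Coherence Vectors. Color coherence vectors split each color bin's pixel count into coherent pixels, which belong to a sizable connected region of that color, and incoherent pixels, which do not. They are frequently used in image searches of the type “find other pictures like this one” and are comparable to the color histogram in terms of classification power.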
4. Border/Interior Color Histogram. This method uses two color histograms: one for pixels in the interior of uniformly colored regions and one for pixels on region borders. This metric captures both color and texture information, and was the best of the features explored in this work.
5. Appearance descriptors. This feature matches small regions of the intensity image to a set of appearance (edge/texture) descriptors that are similar to the Haar features used for face detection. This feature set performed poorly and its evaluation was dropped early in the paper.
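Three of the descriptors above can be sketched in NumPy. This is an illustrative sketch, not the code used by Rocha et al. It assumes:
- 8-bit RGB or intensity images with 4 quantization levels per channel;
- a BIC rule in which a pixel is "interior" when all four of its 4-connected neighbors share its quantized color, with image-edge pixels counted as border (itself an assumption);
- a single non-negative displacement for the Unser sum and difference histograms.

All function names are hypothetical.

```python
import numpy as np


def quantize(rgb, levels=4):
    """Map 8-bit RGB pixels to one joint color index in [0, levels**3)."""
    q = (rgb.astype(np.int64) * levels) // 256          # per-channel level in [0, levels-1]
    return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]


def general_color_histogram(rgb, levels=4):
    """Joint RGB histogram with levels**3 bins (64 for 4 levels), normalized to sum to 1."""
    idx = quantize(rgb, levels)
    return np.bincount(idx.ravel(), minlength=levels ** 3) / idx.size


def border_interior_histograms(rgb, levels=4):
    """Border/Interior color histograms concatenated as [interior, border].

    A pixel is interior if its 4-connected neighbors all share its quantized color;
    pixels on the image edge are treated as border (an assumption).
    """
    idx = quantize(rgb, levels)
    interior = np.zeros(idx.shape, dtype=bool)
    c = idx[1:-1, 1:-1]
    interior[1:-1, 1:-1] = ((c == idx[:-2, 1:-1]) & (c == idx[2:, 1:-1]) &
                            (c == idx[1:-1, :-2]) & (c == idx[1:-1, 2:]))
    n = levels ** 3
    h_int = np.bincount(idx[interior], minlength=n)
    h_bor = np.bincount(idx[~interior], minlength=n)
    return np.concatenate([h_int, h_bor]) / idx.size


def unser_histograms(gray, dx=1, dy=0):
    """Unser sum and difference histograms of an 8-bit image at displacement (dx, dy) >= 0."""
    g = gray.astype(np.int64)
    h, w = g.shape
    a = g[0:h - dy, 0:w - dx]
    b = g[dy:h, dx:w]
    sums = np.bincount((a + b).ravel(), minlength=511)         # sums span 0..510
    diffs = np.bincount((a - b + 255).ravel(), minlength=511)  # differences shifted to 0..510
    return sums / a.size, diffs / a.size
```

For a uniformly colored image, the joint histogram has a single non-zero bin. BIC splits that bin's mass between the interior and border halves of its output.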
The researchers investigated a number of classifier methodologies, with one-versus-one Support Vector Machines (SVM) being the clear winner. Using the Border/Interior color histograms, the correct class was among the top two predictions 95.8% of the time, and using a combination of features they were able to raise top-two correct classification to 97%.
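The top-two metric can be computed with scikit-learn, whose `SVC` handles multiclass problems by training one-versus-one binary classifiers internally. The sketch below makes three assumptions:
- synthetic, hypothetical feature vectors stand in for real produce histograms;
- scikit-learn 0.24 or later is available (for `top_k_accuracy_score`);
- evaluation on the training data is acceptable for brevity. A real study would score held-out images.

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score
from sklearn.svm import SVC

# Hypothetical stand-in features: three well-separated clusters of 64-D vectors,
# playing the role of per-image color histograms for three produce classes.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(3, 64))
X = np.vstack([c + 0.05 * rng.standard_normal((30, 64)) for c in centers])
y = np.repeat([0, 1, 2], 30)

# SVC fits one binary SVM per pair of classes (one-versus-one). With
# decision_function_shape="ovr" the pairwise results are aggregated into one
# score per class, which can be used to rank the classes.
clf = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, y)
scores = clf.decision_function(X)            # shape (n_samples, n_classes)
top1 = clf.score(X, y)
top2 = top_k_accuracy_score(y, scores, k=2, labels=clf.classes_)
```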
An Indian university group using the same data set performed a different set of experiments, with less success. See, Arivazhagan, Shebiah, Nidhyanandhan, Ganesan. “Fruit Recognition using Color and Texture Features”, Journal of Emerging Trends in Computing and Information Sciences, 90-94, 2010. They used a co-occurrence histogram on low-pass-filtered intensity values to measure texture. Rather than use the histogram directly, they computed several statistics from it, including contrast, energy, and local homogeneity, and used these statistics as features. Similarly, they computed histograms on hue and saturation for color measurement and derived statistics from those histograms. Their final feature vector had 13 statistical features. Color statistics performed particularly poorly, with only 45% correct classification. The texture features did better, with 70% average correct classification. Combining the features worked best, giving 86% correct classification. This work suggests that while color histograms are effective at capturing important produce characteristics, reducing the histograms to summary statistics is less effective.
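Co-occurrence statistics of the kind described above can be sketched in NumPy. The sketch assumes an 8-bit intensity image quantized to 8 gray levels and a single horizontal displacement. It omits the low-pass filtering step, and `cooccurrence_stats` is a hypothetical helper name. Some authors define energy as the square root of the angular second moment computed here.

```python
import numpy as np


def cooccurrence_stats(gray, levels=8, dx=1, dy=0):
    """Contrast, energy and homogeneity of a normalized gray-level co-occurrence matrix.

    gray: 2-D uint8 image. Intensities are quantized to `levels` gray levels and
    pixel pairs are taken at displacement (dx, dy) with dx, dy >= 0.
    """
    q = (gray.astype(np.int64) * levels) // 256
    h, w = q.shape
    a = q[0:h - dy, 0:w - dx].ravel()
    b = q[dy:h, dx:w].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1)          # count each (level_a, level_b) pair
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = np.sum(glcm * (i - j) ** 2)
    energy = np.sum(glcm ** 2)          # angular second moment
    homogeneity = np.sum(glcm / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity
```

A flat image yields contrast 0 and energy and homogeneity of 1. Fine vertical stripes of black and white yield a large contrast.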
Most recently, a group in China performed an independent study similar to that of Rocha et al. on a set of 18 fruits (no vegetables). See, Zhang and Wu. “Classification of Fruits Using Computer Vision and a Multiclass Support Vector Machine”, Sensors, pp. 12489-12505, 2012. They used several variants of SVMs and a combination of color, texture, and shape features. The color feature was a color histogram. They used the Unser feature vector, but reduced the pair of histograms to seven features using statistical measures (mean, contrast, homogeneity, energy, variance, correlation, and entropy). They also made eight shape measurements, including area, perimeter, convex hull area, and the minor and major axes of a fitted ellipse. Unfortunately, they performed no analysis of the relative value of each feature type (color, texture, shape), so it is difficult to ascertain the effectiveness of their different features. It would have been particularly useful to understand which, if any, of the shape features provided discriminability. They performed PCA on the feature set, reducing it from dimension 79 to dimension 14. The researchers performed tests using one-versus-all and one-versus-one classifiers, with the one-versus-one approach the clear winner. Their classifiers achieved 53.5% correct classification using a linear SVM and 88.2% using a radial basis function (RBF) SVM. The PCA operation may be partially responsible for the relatively poor performance of the linear classifier. The reduction of the Unser features to statistics may also have had a negative effect on classification accuracy.
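A PCA reduction of this kind can be sketched with an SVD of the mean-centered feature matrix. The 79-to-14 dimensions mirror the numbers quoted above, but the data here are random placeholders and `pca_reduce` is a hypothetical helper.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto their leading principal components.

    Returns the reduced data, the component directions, the feature mean and the
    fraction of total variance captured by each kept component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)   # rows of vt are principal axes
    components = vt[:n_components]
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return Xc @ components.T, components, mean, explained


# Placeholder data: 100 random 79-dimensional feature vectors reduced to 14 dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 79))
Z, components, mean, explained = pca_reduce(X, 14)
```

PCA keeps the directions of largest variance, which need not be the directions that best separate classes. That is one plausible reason a linear classifier could suffer after the reduction.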
A quick clarification on what constitutes the minimum classification performance: with two equal-sized classes, you can get 50% correct by “flipping a coin” to select the class. However, when there are more than two classes, 50% is no longer the chance-level accuracy floor. For three classes the floor is 33%, for four classes 25%, for 20 classes 5%, and so on (1/K for K equal-sized classes).
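The same reasoning applies to the top-k figures quoted earlier. With K equally likely classes, a random ranking places the correct class among its first k choices with probability k/K. A small helper (hypothetical name `chance_top_k`, assuming equal-sized classes) makes the comparison explicit:

```python
from fractions import Fraction


def chance_top_k(num_classes, k=1):
    """Chance-level accuracy when any of k distinct random guesses may match,
    assuming num_classes equally likely classes."""
    return Fraction(min(k, num_classes), num_classes)
```

For example, VeggieVision's 48 classes give a top-four chance level of 4/48 ≈ 8.3%. The 15 classes of Rocha et al. give a top-two chance level of 2/15 ≈ 13.3%.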
The following provides additional descriptions of selected figures:
FIG. 1 shows, at 70, a classic “Bayer Pattern,” typifying the color filter arrangements of the individual pixels of a modern CMOS camera. Below is shown part of a 2012-era smartphone 40, with a CMOS camera aperture 50, and an LED flash aperture 60. Also shown are two apples, a red apple 20 and a green apple 30, respectively reflecting red and green light from the sun 10 (which produces “white light” ambient illumination).
FIG. 3 shows how the spectral reflectance profile, 90, of the green apple might nicely mimic the Bayer-pixel spectral profile of the “G” channel. In the lower left, the “G” channel pixels “light up” whilst imaging the green apple 110. Likewise, the spectral reflectance profile 100 of the red apple might nicely mimic the Bayer-pixel spectral profile of the “R” channel. In the lower right, the “R” channel pixels “light up” when imaging the red apple 120.
FIG. 4 concerns the fact that a scene is effectively never illuminated with strictly “white light.” There is always a “structure” to the light spectral curve—illustrated in very simple fashion in this figure. In particular, curve 130 shows the “actual” but largely “unknown” ambient lighting spectral profile of a scene (the apples).
FIG. 5 illustrates a hypothetical “slight green-ish, mainly blue-ish” light source, 140, giving rise to “lighting modified” effective spectral response curves B′ 140, G′ 160 and R′ 170.
FIG. 6 shows how the red apple will “look” yellowish, 180—a pretty even combination of green and red—under the lighting conditions of the previous figure, all because of the different lighting and nothing to do with the sensors. The “effective” profiles B′, G′ and R′ all get shaped by the knowable characteristics of the lighting.
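The lighting effect of FIGS. 5 and 6 amounts to a point-wise product. Each effective response is the light spectrum times the sensor channel's sensitivity, and each pixel value is the integral of that product against the object's reflectance. The sketch assumes Gaussian sensor curves, a flat idealized white light, a two-lobe "blue-ish" light, and a logistic red-apple reflectance. All spectra are hypothetical shapes chosen for illustration, not measured data.

```python
import numpy as np

WL = np.arange(400.0, 701.0, 5.0)   # 5 nm grid over an assumed 400-700 nm range
STEP = 5.0


def gaussian(center, width):
    return np.exp(-0.5 * ((WL - center) / width) ** 2)


# Hypothetical sensor sensitivities and light spectra (illustrative shapes only).
SENSOR = {"B": gaussian(460, 25), "G": gaussian(540, 30), "R": gaussian(610, 30)}
WHITE = np.ones_like(WL)                                     # idealized flat "white light"
BLUISH = 1.0 * gaussian(450, 30) + 0.4 * gaussian(530, 30)   # "slight green-ish, mainly blue-ish"


def effective_responses(light):
    """Lighting-modified responses B', G', R': point-wise light x sensor products."""
    return {ch: light * s for ch, s in SENSOR.items()}


def pixel_values(reflectance, light):
    """Riemann-sum integral of the reflectance against each effective response."""
    return {ch: float(np.sum(reflectance * r) * STEP)
            for ch, r in effective_responses(light).items()}


# Hypothetical red-apple reflectance: low at short wavelengths, high toward red.
red_apple = 0.1 + 0.8 / (1.0 + np.exp(-(WL - 590.0) / 10.0))
under_white = pixel_values(red_apple, WHITE)
under_bluish = pixel_values(red_apple, BLUISH)
```

With these assumed spectra, the apple's R-to-G ratio drops sharply under the blue-ish light, even though neither the sensor nor the apple has changed.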
FIG. 7 shows that the “standard white” LEDs found in existing camera phone flashes can be replaced with so-called “Multichip LEDs,” with the Edison Corporation Federal FM series model here depicted (190).
FIG. 8 shows how all of this, to the human eye, looks like a pseudo-strobe kind of white light illumination, since it cycles so quickly. In particular, starting at the top, coordinated with frame 4*n (n continuously increasing), one of the LEDs flashes for typically 1/30th of a second, 200, for example with a yellow-ish light whose spectrum is nevertheless well known. Below, sensor frame 4*n+1 then coordinates with another LED flashing for 1/30th of a second, 210, this time with a red-ish looking light, again with well-known spectral characteristics. Then below, frame 4*n+2 witnesses a purplish LED flash, 220, tending more toward the bluish and green side of the spectrum. Finally, at the bottom, frame 4*n+3 has a mauvish LED flash with its exposure time of 1/30th of a second, completing the flash cycle; “n” is then incremented to go back to the top for movies, or capture stops for a single “image” (i.e., n=1 and only 1 for a single image).
FIG. 11 illustrates how some small patch on the red apple, 320, corresponding to a Bayer cell, 330A-D, thus has effectively 12 different “spectral samplings” measured over four frames of image data, corresponding to B0, B1, B2, B3, G0, G1, G2, G3, R0, R1, R2 and R3. The Bayer cell is the same physical cell for all four frames, but with different lighting they have different effective spectral sampling profiles.
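The frame sequencing of FIG. 8 and the twelve samplings of FIG. 11 can be sketched together. Frame 4*n + j is exposed under LED j, and one Bayer cell's blue, green and red readings from the four frames of a cycle are stacked into a 12-element vector. The sketch assumes an RGGB cell layout (red at the cell's top-left pixel) and averages the two greens, as noted for FIG. 17. The layout and all function names are assumptions, not taken from the figures.

```python
import numpy as np

NUM_LEDS = 4   # four flash states per cycle, as in FIG. 8


def led_for_frame(frame_index):
    """Index of the LED fired during a frame: frame 4*n + j is exposed under LED j."""
    return frame_index % NUM_LEDS


def bayer_cell_samples(raw, row, col):
    """(B, G, R) readings of the 2x2 cell whose top-left pixel is (row, col).

    Assumes an RGGB layout (R top-left, B bottom-right); the two greens are averaged.
    """
    r = raw[row, col]
    g = 0.5 * (raw[row, col + 1] + raw[row + 1, col])
    b = raw[row + 1, col + 1]
    return b, g, r


def g_vector(cycle_frames, row, col):
    """Stack one cell's readings over a 4-frame cycle as [B0..B3, G0..G3, R0..R3]."""
    samples = np.array([bayer_cell_samples(f, row, col) for f in cycle_frames])  # (4, 3)
    return samples.T.ravel()


# Toy usage: four 2x2 raw frames, frame k holding values 10k + (1, 2, 3, 4).
frames = [np.array([[10.0 * k + 1, 10.0 * k + 2],
                    [10.0 * k + 3, 10.0 * k + 4]]) for k in range(4)]
g = g_vector(frames, 0, 0)
```

At 1/30th of a second per frame, one four-frame cycle spans 4/30 ≈ 0.13 s.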
FIG. 12 examines how this sequence of digitized pixel values lets us try to measure the “unknown” spectral reflection function of the patch of apple being imaged, including a hypothetical “actual” spectral reflectance function 340 of the patch of apple 320.
FIG. 13 concerns generic linear functional estimation. The left side shows typical examples of orthogonal discrete functions often used to parameterize (fit) unknown distributions (the apple's true reflectance spectrum 340 in our example). The lower right shows that “smooth” functions can similarly be used, à la Chebyshev polynomials.
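As an illustration of the "smooth" option, NumPy's `numpy.polynomial.Chebyshev.fit` can summarize a smooth reflectance curve with a handful of Chebyshev coefficients. The reflectance curve and the 400 to 700 nm range below are hypothetical. The point is only that a few coefficients (here 9) can represent such a spectrum closely.

```python
import numpy as np
from numpy.polynomial import Chebyshev

wl = np.linspace(400.0, 700.0, 61)                              # 5 nm samples (assumed range)
reflectance = 0.1 + 0.8 / (1.0 + np.exp(-(wl - 590.0) / 30.0))  # hypothetical smooth spectrum

# Chebyshev.fit maps 400..700 nm onto [-1, 1] internally and solves a least-squares fit.
fit = Chebyshev.fit(wl, reflectance, deg=8)
coefficients = fit.coef                                          # 9 numbers describe the curve
max_error = np.max(np.abs(fit(wl) - reflectance))
```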
FIG. 14 shows a decent “5-rectangular band” Bayer-tuned Solution Set, with 80 nm, 50 nm, 40 nm, 50 nm and 80 nm bandwidths, respectively.
FIG. 15 shows a 5-band “Orthonormal” set of imaging spectroscopy bands, weighted for direct multiplication with the lighting-modified effective spectral response curves associated with B0-B3, G0-G3 and R0-R3.
FIG. 16 shows the largely empirical coupling values between the effective spectral response G0 and all five chosen bands.
Referring to the left of FIG. 17, the “G0” row of the H matrix is calculated via simple area multiplications between an empirical light-source-modified sensor profile and the chosen solution bands (in this case V-Z). On the right, ‘g’ is the twelve-element pixel value vector (with the redundant green values averaged), H is the coupling matrix, and F is the sought solution. The G0 row vector is explicitly displayed, while the other 11 rows are implicitly filled in by multiplying their effective response curves by the five orthonormal bands, as per FIG. 16. (The subscript “p” indicates that we are solving for our small apple patch.)
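The FIG. 14 to FIG. 17 pipeline can be sketched numerically in four steps:
1. Build five contiguous rectangular bands with the FIG. 14 widths.
2. Scale each band to unit norm. Because their supports do not overlap, this makes them orthonormal.
3. Form the 12×5 coupling matrix H from the "area multiplications."
4. Recover F from g by least squares.

The sketch assumes the bands tile 400 to 700 nm (the widths sum to 300 nm) and uses a 1 nm grid. The Gaussian sensor and LED spectra, the F values and all helper names are hypothetical.

```python
import numpy as np

WL = np.arange(400.0, 700.0, 1.0)   # 1 nm wavelength grid; the 400-700 nm span is an assumption


def gaussian(center, width):
    return np.exp(-0.5 * ((WL - center) / width) ** 2)


def rect_bands(widths=(80, 50, 40, 50, 80), start=400.0):
    """Five contiguous rectangular bands (FIG. 14 widths), each scaled to unit L2 norm."""
    edges = start + np.concatenate([[0.0], np.cumsum(widths)])
    bands = np.zeros((len(widths), WL.size))
    for k in range(len(widths)):
        inside = (WL >= edges[k]) & (WL < edges[k + 1])
        bands[k, inside] = 1.0 / np.sqrt(inside.sum())
    return bands


def coupling_matrix(responses, bands):
    """H[i, k] = sum over wavelength of response_i * band_k (the 'area multiplication')."""
    return responses @ bands.T


def solve_bands(H, g):
    """Least-squares estimate of the band coefficients F in g = H F."""
    F, *_ = np.linalg.lstsq(H, g, rcond=None)
    return F


# Hypothetical sensor channels and LED spectra (illustrative shapes, not measured data).
sensor = [gaussian(460, 25), gaussian(540, 30), gaussian(610, 30)]   # B, G, R
leds = [gaussian(580, 40), gaussian(630, 30), gaussian(490, 40),
        gaussian(430, 25) + gaussian(640, 25)]                        # LEDs 0..3

# Rows ordered B0..B3, G0..G3, R0..R3 to match the 12-element g vector.
responses = np.array([s * led for s in sensor for led in leds])
bands = rect_bands()
gram = bands @ bands.T                        # identity: disjoint supports, unit norms
H = coupling_matrix(responses, bands)         # shape (12, 5)

F_true = np.array([0.5, 0.8, 1.2, 2.5, 3.0])  # hypothetical patch coefficients
reflectance = F_true @ bands                  # a reflectance lying in the band span
g = responses @ reflectance                   # simulated 12 pixel values (= H @ F_true)
F_est = solve_bands(H, g)
```

Here the simulated reflectance lies exactly in the span of the five bands and no noise is added, so least squares recovers `F_true` to within floating-point error. With real, noisy pixel values the same solve returns a best-fit estimate.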
FIG. 22 shows various examples of LED spectral characteristics as plotted on the 1931 CIE spectral diagram.
FIG. 24 illustrates that the solution basis functions can be chosen in many ways and need not be “orthogonal” or “orthonormal.” Flash-modified pixel sensitivity functions likewise need not be Bayer/RGB/etc. Depicted here is how explicit “CIE” solutions can be constructed from “arbitrary” flash-sensor profiles, where multiplication produces the row values of our H matrix. Curve 470 shows an arbitrary flash-sensor profile to be multiplied by any chosen solution functions, here the “classic” 1931 CIE functions. (The subscript “p” again indicates that we are solving for our small apple patch.)
FIG. 25 shows that “Direct Chromaticity Capture” becomes a natural consequence where (a) sensor profiles, (b) LED profiles, (c) “ambient light” treatment, and (d) the raw number of independent flashes . . . can all combine to approach near-full-gamut capture, and ever-tightening error bars on the capture.
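When the solution functions are the 1931 CIE color matching functions (FIG. 24), the three solved coefficients for a patch play the role of tristimulus values X, Y and Z. Chromaticity then follows from the standard normalization x = X/(X+Y+Z), y = Y/(X+Y+Z). A minimal sketch, assuming the solved coefficients need no further scaling:

```python
def chromaticity(X, Y, Z):
    """CIE 1931 (x, y) chromaticity coordinates from tristimulus values X, Y, Z."""
    total = X + Y + Z
    if total <= 0:
        raise ValueError("X + Y + Z must be positive")
    return X / total, Y / total
```

For example, the D65 white point tristimulus values (95.047, 100.0, 108.883) map to approximately (0.3127, 0.3290).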
FIG. 26 contemplates that there are many ways to deal with “generally unknown” but often very typical kinds of ambient light additions to the pure flash, e.g.:
1) add an estimated ambient profile to ALL weight values in the H matrix;
2) strobe the flash so quickly, with synchronized strobing of the pixel exposure time, that ambient becomes negligible;
3) EXPLOIT IT! Use a pure ambient capture as part of the frame sequencing, giving N=5 in our 4-LED scenario;
4) Use common photographic measuring instrumentation to gauge the color temperature of ambient, then use this in H matrix correction factors;
5) Use “Flash-Frame Intensity Modulation” to cycle the intensity of any/all flashes, measuring the digital number modulation of the resulting pixel values against a “known” lumen modulation applied to a scene;
6) Etc. . . .
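Options 3 and 5 can be sketched together. Suppose a pixel's digital number responds linearly to flash intensity. Cycling the flash through known relative levels (including 0, a pure ambient frame) and fitting a line then separates the flash contribution (the slope) from the ambient contribution (the intercept). The linear-sensor and static-scene assumptions, the relative flash levels and the helper name are hypothetical.

```python
import numpy as np


def separate_flash_and_ambient(flash_levels, pixel_values):
    """Fit pixel_value = ambient + slope * flash_level by ordinary least squares.

    flash_levels: known relative flash intensities (0 means flash off, i.e. ambient only).
    Assumes a linear sensor response and a static scene across the modulation cycle.
    Returns (slope, ambient): the flash-only response per unit level and the ambient term.
    """
    levels = np.asarray(flash_levels, dtype=float)
    values = np.asarray(pixel_values, dtype=float)
    A = np.column_stack([levels, np.ones_like(levels)])
    (slope, ambient), *_ = np.linalg.lstsq(A, values, rcond=None)
    return float(slope), float(ambient)


levels = [0.0, 0.25, 0.5, 0.75, 1.0]
readings = [40.0 + 120.0 * lv for lv in levels]     # hypothetical noise-free readings
slope, ambient = separate_flash_and_ambient(levels, readings)
```

With these noise-free readings, the fit returns (to floating-point precision) a slope of 120 and an ambient term of 40.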
FIG. 28 illustrates some of the commercial/consumer applications of the present technology, beyond “richest color” photography, e.g., quick checks on freshness and quality of produce, for both proprietors and consumers alike (281); building and materials inspection (282); and counterfeit products “quick checks” (283).
FIG. 31 illustrates how clip-on accessories are a viable shortcut to market, given the long process of designing and integrating new LEDs directly into smart phones. (Depicted is a commercially available optic supplementation, but making this unit primarily a flash unit with either a wired or wireless connection to the device is quite viable.)
FIG. 32 illustrates an approach to dealing with camera motion and motion photography (video; effectively motion deblurring in luminance, with the addition of chrominance “draping”). This involves dynamic linear luminance tracking (keying in explicitly to time intervals between ⅕th and 1/10th of a second). At 321, “common” luminance-signal correlation can determine motion between frames, with subsequent re-projection of individual frames onto a shared frame, typically the middle frame. At 322, the same operation can be done on frames of a video; each individual frame can become a reference frame to which the other four (in this example) are re-projected.
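One common way to estimate frame-to-frame motion from luminance is phase correlation. The text does not specify the correlation method, so this is a swapped-in technique. The sketch assumes a pure integer, circular translation between two luminance frames of equal size. Real camera motion would also need sub-pixel refinement and border handling. `estimate_shift` is a hypothetical helper.

```python
import numpy as np


def estimate_shift(reference, moving):
    """Integer (dy, dx) such that moving ~= np.roll(reference, (dy, dx), axis=(0, 1)),
    estimated by phase correlation of two luminance frames."""
    cross = np.fft.fft2(moving) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12           # keep only the phase difference
    corr = np.fft.ifft2(cross).real          # peaks at the translation
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    if dy > h // 2:                          # wrap to signed shifts
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(0)
reference = rng.random((32, 32))                        # stand-in luminance frame
moving = np.roll(reference, (3, -5), axis=(0, 1))       # simulated camera shake
dy, dx = estimate_shift(reference, moving)              # (3, -5)
aligned = np.roll(moving, (-dy, -dx), axis=(0, 1))      # re-projected onto the reference
```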
FIG. 35 posits that the LED units are off and a camera merely samples the ambient light, producing three data values per cell of a Bayer sensor.
FIG. 36 is similar to FIG. 35, but here LED 1 is tweaked on and a distance-squared modified L1 term shows up in the collected samples from the Bayer sensor (distance-squared term not explicitly in equations).
FIG. 37 shows that individual LED tweaks can thus be isolated from ambient contributions. Here we see just one LED, number 1, and how we get three “g vector” measurement values that can roll up into matrix equations intended to solve for the R coefficients (the unknowns). For surface “patches” involving thousands of pixels and allowing several LED tweak cycles, many otherwise noisy values can nevertheless produce superb spectral measurements of the patch.
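The isolation step of FIGS. 35 to 37 can be sketched as a paired subtraction followed by averaging. Subtracting an all-LEDs-off frame from the matching LED-1-on frame removes the ambient term. Averaging the differences over a patch's pixels and over several tweak cycles then shrinks independent noise roughly as one over the square root of the sample count. The sketch assumes per-cell (B, G, R) arrays with the greens already averaged and a static scene, and it ignores the distance-squared factor. The array layout and function name are hypothetical.

```python
import numpy as np


def led_only_patch_values(ambient_cells, led_cells, patch_mask):
    """Mean LED-only (B, G, R) response over a surface patch.

    ambient_cells, led_cells: arrays of shape (cycles, H, W, 3) of per-Bayer-cell
    (B, G, R) values captured with all LEDs off and with one LED on, respectively.
    patch_mask: boolean (H, W) array selecting the cells of the patch.
    """
    diff = led_cells.astype(float) - ambient_cells.astype(float)   # ambient term cancels
    # Average over tweak cycles and patch cells; independent noise shrinks ~1/sqrt(count).
    return diff[:, patch_mask, :].mean(axis=(0, 1))


# Hypothetical simulation: constant ambient, LED 1 adds (10, 20, 30), noise std 5.
rng = np.random.default_rng(1)
cycles, h, w = 8, 32, 32
ambient_level = np.array([50.0, 60.0, 70.0])
led_contribution = np.array([10.0, 20.0, 30.0])
ambient = ambient_level + 5.0 * rng.standard_normal((cycles, h, w, 3))
with_led = ambient_level + led_contribution + 5.0 * rng.standard_normal((cycles, h, w, 3))
mask = np.ones((h, w), dtype=bool)
estimate = led_only_patch_values(ambient, with_led, mask)   # close to (10, 20, 30)
```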