The present disclosure relates, in general, to methods for finding targets and objects in an image.
As described in currently pending U.S. patent application Ser. No. 12/107,092 filed Apr. 22, 2008, in which the present inventors are named co-inventors, the region of interest (ROI) is the image currently being examined or possibly a subset of the image that is currently being examined. Within the ROI, the aforementioned method divides the pixels into Rich Colors and non-Rich Colors. Rich Colors are further divided into Target Colors (these are limited to Rich Colors) and non-Target Colors. Target Colors are those Rich Colors that are of specific interest to the viewer at the time of examination. Rich, non-Rich, Target and non-Target Colors are reduced to binary representations, meaning that a type of color is represented as either a “1” or a “0”. As described in the previous patent application, the use of configurable thresholds allows the user to simply control whether certain pixels are evaluated as target or non-target. Pixels which have been evaluated as being target pixels and that are adjacent to each other in the image are referred to as Rich Target patches. Pixels which are evaluated as being non-Target pixels can be used as a mask to restrict the area under examination, reducing the amount of processing power and time required to process the ROI.
Typically, Blob analysis is used to find targets in a filtered image. Blob analysis is compute-intensive for complex images. The speed of the analysis is greatly reduced if the filtered image has image noise or a large number of irrelevant objects. Roughly speaking, the time to process a set of blobs increases geometrically with the number of blobs. This puts pressure on the user to filter the image better and to limit the size of the blobs to be tested, so that given characteristics have an upper and a lower limit. Further, there is a need to restrict the search area to where the target is most likely to be, i.e., near its last location. There are also limits on the out-of-plane angle at which the targets can be tilted.
Traditionally, developers of vision systems have sought to place limits on the targets and the operational environments to improve the odds of a successful search. However, these limits greatly reduce the usefulness of the Blob method in everyday consumer applications. Ordinary users cannot be counted on to limit the size or distance of a target. They often move a target in and out of the camera's field of view, ruining any chance of limiting the search region to a small fraction of the image. Consumer applications might have one target or ten or more. The number of targets could vary from frame to frame. Consumer applications are a nightmare. Our method does have a drawback also. This method can perform filtering in-line with Target detection. Consumer applications require inexpensive cameras and computing equipment that rapidly and reliably track in normal lighting or in poorly lit rooms with uneven lighting, and in the hands of novice operators with little patience.
Machine vision, commonly called automated inspection, has been used in manufacturing processes to improve productivity and quality. On a typical production line, a sensor detects a part and signals a video camera positioned above or to the side of the inspection point to capture an image and send it to a machine vision processor. Using a combination of machine vision software and hardware, the vision system analyzes the image and provides mathematical answers about the part. Traditional gray scale machine vision technology makes decisions based on 256 shades of gray (0 to 255). A typical vision algorithm separates the pixels that fall within an intensity band bounded by a lower and an upper threshold from the irrelevant pixels whose intensities lie outside this band. Alternatively, such algorithms look at the rate of change of the image pixels. Once the relevant pixels have been identified, adjacent pixels are clumped together to form blobs, and these are then characterized by geometric characteristics such as location, size, shape, etc. Inspecting colored parts or objects with gray scale machine vision systems is unreliable in many cases and impossible in others. For this reason, color machine vision technology is needed to inspect parts or objects in ways that could not be done using traditional gray scale machine vision systems.
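The threshold-and-clump procedure described above can be illustrated with a minimal Python sketch. The function names `band_threshold` and `find_blobs`, the choice of a 4-connected neighborhood, and the sample image are assumptions made for illustration only; they are not taken from any particular vision system.

```python
from collections import deque

def band_threshold(gray, lower, upper):
    """Return a binary mask: 1 where lower <= intensity <= upper, else 0."""
    return [[1 if lower <= v <= upper else 0 for v in row] for row in gray]

def find_blobs(mask):
    """Group 4-connected foreground pixels into blobs (breadth-first fill).

    Returns one dict per blob with its pixel count, bounding box
    (min_row, min_col, max_row, max_col), and centroid (row, col).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "size": len(pixels),
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
                "centroid": (sum(ys) / len(pixels), sum(xs) / len(pixels)),
            })
    return blobs

# Hypothetical 3x4 gray scale image with two bright regions.
gray = [
    [10, 200, 210, 10],
    [10, 205, 10, 10],
    [10, 10, 10, 180],
]
mask = band_threshold(gray, 150, 255)
blobs = find_blobs(mask)
```

Every foreground pixel is visited and every blob is then characterized separately, which is one reason the cost grows quickly on noisy or cluttered images.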
Thus far, color machine vision systems have been used for three primary vision applications:
Color Matching—verifying that a certain part's or object's color matches what the vision system is programmed to find.
Color Sorting—sorting parts or objects based on color.
Color Inspection—inspecting colored parts or objects for defects or imperfections that gray scale image processing tools can't detect.
Defined as the perceptual result of visible light reflected from an object to human eyes, color represents an interpretive concept. Depending on how light is reflected, all humans see colors a bit differently. The human visual system uses color to draw conclusions about surfaces, boundaries, location and relative location to other objects, orientation, movement, and changes in movement of objects in a scene. The human eye is usually capable of discerning both the color of objects under inspection and the Transition Curves of said objects. Both Transition Curves and colors are used in building the “scene” that the brain uses to identify what is being viewed and to then make an interpretation of the meaning of what is seen.
Machine vision systems have typically reduced color information to one of 256 shades of gray in order to simplify processing. An undesirable byproduct of this simplification is often the loss of important information, which reduces the utility of the inspection.
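The information loss can be shown with a short Python sketch. The weights below are the common ITU-R BT.601 luma coefficients, chosen here as an assumption; the text does not say which gray scale conversion a given system uses, and `to_gray` is a hypothetical helper name.

```python
def to_gray(r, g, b):
    """Collapse an 8-bit RGB triple to one gray level (BT.601 weights, assumed)."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)

pure_red = (255, 0, 0)
dark_green = (0, 130, 0)

# Two clearly different colors land on the same gray level,
# so a gray scale system cannot tell them apart.
gray_red = to_gray(*pure_red)
gray_green = to_gray(*dark_green)
```

With these weights both colors map to gray level 76, even though one is red and the other is green.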
Color derives from the spectrum of light (distribution of light energy versus wavelength) interacting in the eye with the spectral sensitivities of light receptors. Typically, light with wavelengths from roughly 380 nm to 740 nm is detectable by the human eye. This range is known as visible light. The pure “spectral colors” from a continuous spectrum can be divided into distinct colors: violet (380-440 nm), blue (440-485 nm), cyan (485-500 nm), green (500-565 nm), yellow (565-590 nm), orange (590-625 nm), and red (625-740 nm). However, these ranges are not fixed; the division is a matter of culture, taste, and language. For instance, Newton added a seventh color, indigo, as wavelengths of 420-440 nm between blue and violet, but most people are not able to distinguish it. Of course, there are many color perceptions that by definition cannot be pure spectral colors. Some examples of non-spectral colors are the “achromatic colors” (black, gray, and white) and colors such as pink, tan, and magenta.
An additive color system involves light “emitted” from a source or illuminant of some sort such as TV or computer monitor. The additive reproduction process usually uses red, green, and blue which are the “primary colors” to produce the other colors. Combining one of these primary colors with another in equal amounts produces the “secondary colors” cyan, magenta, and yellow. Combining all three primary lights (colors) in equal intensities produces white. Varying the luminosity of each light (color) eventually reveals the full gamut of those three lights (colors).
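As a minimal illustration of additive mixing, the following Python sketch sums 8-bit RGB light sources channel by channel and clamps at full intensity. The helper name `add_light` and the clamping rule are assumptions for illustration; real displays apply further non-linear processing.

```python
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)

def add_light(*colors):
    """Add emitted light sources per channel, clamping each channel at 255."""
    return tuple(min(255, sum(channel)) for channel in zip(*colors))

yellow = add_light(RED, GREEN)        # red + green
cyan = add_light(GREEN, BLUE)         # green + blue
magenta = add_light(RED, BLUE)        # red + blue
white = add_light(RED, GREEN, BLUE)   # all three primaries
```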
Results obtained when mixing additive colors are often counterintuitive for people accustomed to the more everyday subtractive color system of pigments, dyes, inks, and other substances which present color to the eye by “reflection” rather than emission. Anything that is not additive color is subtractive color.
Light arriving at an opaque surface is either “reflected”, “scattered”, or “absorbed” or some combination of these. Opaque objects that do not reflect specularly (that is, in a manner of a mirror) have their color determined by which wavelengths of light they scatter more and which they scatter less. The light that is not scattered is absorbed. If objects scatter all wavelengths, they appear white. If they absorb all wavelengths, they appear black. Objects that transmit light are either translucent (scattering the transmitted light) or transparent (not scattering the light).
The color of an object is a complex result of its surface properties, its transmission properties, and its emission properties, all of which factors contribute to the mix of wavelengths in the light leaving the surface of an object. The perceived color is then further conditioned by the nature of the ambient illumination, and by the color properties of other objects nearby, and finally, by the permanent and transient characteristics of the perceiving eye and brain.
Light, no matter how complex its composition of wavelengths, is reduced to three color-components by the eye. For each location in the visual field, the three types of color receptor cones in the retina yield three signals based on the extent to which each is stimulated. These values are sometimes called “tristimulus values”.
To analyze and process images in color, machine vision systems typically use data from color spaces such as RGB, HSI (or HSL), HSV (or HSB), CIELAB (or CIEXYZ), CMYK, etc. An individual color within each of these spaces is sometimes referred to as a color component. The original color components can be scaled individually in either a linear or non-linear fashion before proceeding with this method, in order to emphasize given target characteristics, to compensate for lighting or camera problems, or even to implement the Rich Color Filter in a more efficient fashion.
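Scaling a single color component can be sketched in Python as follows. A gain-and-offset map stands in for the linear case and a gamma curve for the non-linear case; both choices, and the names `scale_linear` and `scale_gamma`, are assumptions for illustration and are not the scaling prescribed by the method.

```python
def scale_linear(value, gain, offset=0.0):
    """Linear scaling of one 8-bit component, clamped to 0..255."""
    return max(0, min(255, round(gain * value + offset)))

def scale_gamma(value, gamma):
    """Non-linear (power-law) scaling of one 8-bit component."""
    return round(255 * (value / 255) ** gamma)

# Example: boost a weak red channel linearly, and lift dark blue values
# with a gamma below 1 to compensate for a dim scene.
boosted_red = scale_linear(100, 1.5)
lifted_blue = scale_gamma(64, 0.5)
```

Each component can be given its own gain or gamma, so a target's characteristic channel can be emphasized while the others are left unchanged.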
In the RGB color space, each color appears in its primary spectral components of red, green, and blue. When combined with a three-dimensional coordinate system, the RGB color space defines quantitatively any color on the spectrum. RGB uses “additive” color mixing. The X-axis specifies the amount of red, the Y-axis specifies the amount of green, and the Z-axis specifies the amount of blue. If the RGB color model is implemented with 256 (0 to 255) discrete levels of each color component (8 bits), then the color space defines a gamut of 256×256×256, or about 16.7 million, colors.
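The gamut arithmetic can be checked with a few lines of Python. Packing the three 8-bit components into one 24-bit integer is a common storage layout, shown here as an assumption; `pack_rgb` and `unpack_rgb` are hypothetical helper names.

```python
LEVELS = 256                 # discrete levels per component (8 bits)
gamut_size = LEVELS ** 3     # 256 x 256 x 256 distinct colors

def pack_rgb(r, g, b):
    """Pack three 8-bit components into a single 24-bit integer."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(value):
    """Recover the (r, g, b) components from a packed 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF
```

Here `gamut_size` evaluates to 16,777,216, and the largest packed value, for white, is one less than that.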
The HSI color space, also known as HSL, is broken down into hue, saturation, and intensity or lightness. Hue refers to pure color, saturation refers to the degree of color contrast, and intensity refers to color brightness.
HSV (hue, saturation, value), also known as HSB (hue, saturation, brightness), is quite similar to HSL, with “brightness” replacing “lightness”. Artists often use the HSV color space because it is more natural to think about a color in terms of hue and saturation.
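The relationship between RGB and the hue-based spaces can be explored with Python's standard `colorsys` module, which provides HLS and HSV conversions on values in the range 0 to 1. The wrapper `describe` below is a hypothetical helper; note that `colorsys.rgb_to_hls` returns its result in the order hue, lightness, saturation.

```python
import colorsys

def describe(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, lightness/value)."""
    rf, gf, bf = r / 255, g / 255, b / 255
    h_hls, lightness, s_hls = colorsys.rgb_to_hls(rf, gf, bf)
    h_hsv, s_hsv, value = colorsys.rgb_to_hsv(rf, gf, bf)
    return {
        "hsl": (h_hls * 360, s_hls, lightness),
        "hsv": (h_hsv * 360, s_hsv, value),
    }

bright_red = describe(255, 0, 0)
dark_red = describe(128, 0, 0)
```

Bright red and dark red share the same hue and saturation and differ only in lightness or value, which is why hue-based spaces are attractive when lighting varies.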
The CIE 1931 XYZ color space was the first attempt to produce a color space based on measurements of human color perception. It is the most complete color space used conventionally to describe all the colors visible to the human eye. It was developed by the “International Commission on Illumination” (CIE). CIE 1976 LAB is based directly on the CIE 1931 XYZ color space as an attempt to make the perceptibility of color differences linear. CIE is the most accurate color space but is too complex for everyday uses.
CMYK uses the subtractive color mixing used in the printing process. It is possible to achieve a large range of colors seen by humans by combining cyan, magenta, and yellow transparent dyes/inks on a white substrate. Often a fourth ink, black, is added to improve reproduction of some dark colors. CMYK stores ink values for cyan, magenta, yellow, and black. There are many CMYK color spaces for different sets of inks, substrates, and press characteristics.
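A naive, device-independent RGB-to-CMYK conversion can be sketched in Python as shown below. This textbook formula is an assumption for illustration only; as noted above, real CMYK values depend on the specific inks, substrate, and press, so production workflows rely on color profiles rather than a fixed formula.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of 8-bit RGB to CMYK fractions in the range 0..1."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1.0 - max(rf, gf, bf)          # black replaces the common dark part
    if k == 1.0:                       # pure black: avoid division by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1.0 - rf - k) / (1.0 - k)
    m = (1.0 - gf - k) / (1.0 - k)
    y = (1.0 - bf - k) / (1.0 - k)
    return (c, m, y, k)

red_ink = rgb_to_cmyk(255, 0, 0)       # magenta + yellow, no cyan
black_ink = rgb_to_cmyk(0, 0, 0)       # black ink only
paper = rgb_to_cmyk(255, 255, 255)     # no ink on a white substrate
```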
Although dozens of defined color spaces exist, color machine vision applications primarily have used RGB and HSI or HSV color spaces.
Prior art systems use various techniques to measure and match colors such as a color sorting method for wires by comparing the output signal of a camera to the intensity ratio of known colors until a substantial match is found.
Another technique provides a color sorting system and method used for sorting fruits and vegetables. The sorting process is handled with a look up table. The pixel value of the input image is sent to the look up table, and the output from the look up table is a series of either 0's (accept) or 1's (reject).
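A look up table of this general kind can be sketched in Python as follows. The 4-bit quantization per channel, the acceptance rule `is_ripe`, and all function names are hypothetical choices made for illustration; the prior-art system's actual table layout is not described here.

```python
BITS = 4                      # bits kept per channel (assumed quantization)
SHIFT = 8 - BITS
BINS = 1 << BITS              # 16 bins per channel, 4096 table entries

def lut_index(r, g, b):
    """Map an 8-bit RGB pixel to its look up table index."""
    return ((r >> SHIFT) << (2 * BITS)) | ((g >> SHIFT) << BITS) | (b >> SHIFT)

def build_lut(is_acceptable):
    """Precompute 0 (accept) or 1 (reject) for every quantized color."""
    step = 256 // BINS
    lut = bytearray(BINS ** 3)
    for ri in range(BINS):
        for gi in range(BINS):
            for bi in range(BINS):
                # Evaluate the rule at the center of each color bin.
                r, g, b = (i * step + step // 2 for i in (ri, gi, bi))
                lut[lut_index(r, g, b)] = 0 if is_acceptable(r, g, b) else 1
    return lut

def is_ripe(r, g, b):
    """Hypothetical acceptance rule: red clearly dominates green and blue."""
    return r > g + 40 and r > b + 40

lut = build_lut(is_ripe)
pixels = [(200, 30, 30), (40, 180, 40)]
decisions = [lut[lut_index(*p)] for p in pixels]
```

Once the table is built, classifying a pixel costs a few shifts and one memory read, which is what makes the approach attractive for high-speed sorting.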
Another method for automatically and quantitatively measuring color difference between a color distribution of an object and a reference color image uses “color distance” in a color system. A template representing the reference color image is stored in a memory of a machine vision system. The machine vision system generates a sample color image of the object and processes the template together with the sample color image to obtain a total color distance.
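One simple way to compute a total color distance is sketched below in Python. Euclidean distance in RGB summed over corresponding pixels is an assumption for illustration; the metric and color system used by the prior-art method are not specified here, and `total_color_distance` is a hypothetical name.

```python
import math

def total_color_distance(template, sample):
    """Sum the per-pixel Euclidean RGB distance between two equal-size images.

    Both images are flat lists of (r, g, b) tuples in the same pixel order.
    """
    if len(template) != len(sample):
        raise ValueError("template and sample must have the same pixel count")
    return sum(math.dist(t, s) for t, s in zip(template, sample))

template = [(0, 0, 0), (255, 0, 0)]
sample = [(3, 4, 0), (255, 0, 0)]
distance = total_color_distance(template, sample)
```

A total of zero means the sample matches the template exactly; larger totals indicate a greater overall color difference.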
An apparatus is known for sorting fragments of titanium-based sponge on the basis of color by comparing the color values of the image to a set of data values stored in a look up table for rejection or acceptance of each fragment.
Another system and method for locating regions in a target image matches a template image with respect to color and pattern information either by using a hill-climbing technique or fuzzy logic.
A different system and method of perceptual color identification can be used for the identification and tracking of objects, for example, in a surveillance video system. The described method includes a multilevel analysis for determining the perceptual color of an object based on observed colors. This multilevel analysis can include a pixel level, a frame level, and/or a sequence level. The determination makes use of color drift matrices and trained functions such as statistical probability functions. The color drift tables and function training are based on training data generated by observing objects of known perceptual color in a variety of circumstances.
It is clear from the prior art that traditional gray scale machine vision systems are being used successfully in a wide variety of inspection and process control applications for the electronic, automotive, food products, packaging, pharmaceutical, and recycling industries.
However, the use of color machine vision systems in these industries has only been applicable to well controlled immediate environments or surroundings. As machine vision is normally practiced, it is best to have the environment controlled in order to achieve predictable, high-quality results, for example:
Good lighting. The lighting should be consistent across the entire area being observed, without tints or shades (so that the true color of the objects under observation can be determined). When the case at hand requires that the ROI be observed multiple times, the lighting should be maintained consistently over periods of time which could vary from milliseconds to months or years.
A simple, controlled, predictable background. Most systems require that the background of the objects under inspection be controlled and known in advance. For instance, a system designed to identify defective parts worked, in part, because the system knew it was always examining the parts against the consistent background of a black conveyor belt.
Control of items that are in the immediate area where the images are being gathered. Items positioned adjacent to the imaging area, or that move into or out of the imaging area during the time the images are being captured may affect the data that is captured by the cameras. This could be due to the uncontrolled items changing the amount or quality of light that is being reflected into or being absorbed adjacent to the imaging area or the color of light being reflected into the imaging area. For instance, if a person wearing a red shirt were to walk next to an imaging area during an image capture they could introduce additional red to the data gathered and potentially change how the data is interpreted.
Cameras (or other data gathering devices) that are of high quality, good color resolution, and that will produce repeatable data under similar conditions.
The cameras should be configured correctly (as to focus, shutter speed, aperture, sensitivity etc.).
The image should be captured with minimal motion blurring. This implies either that the items of interest should be held nearly motionless or that the camera(s) must be “fast” enough to capture a “frozen” image.
The orientation of the objects being examined should be known or controlled or compensated for along six axes: X, Y and Z along with roll, pitch and yaw. Lack of control of any of these, or lack of an ability to somehow determine these after the image is taken, may render the image less useful. For instance, if a camera captures an image of a red box that is exactly four pixels high and four pixels wide, it can make different deductions about the item depending on what it “knows” about that box. If it is known that the box is one mile away, then one can reasonably estimate how tall and how wide the box is. Conversely, if it is known that the box is 10 feet tall and the image is four pixels by four pixels, then the system can make a reasonably accurate estimate regarding how far away the box is.
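The size and distance deductions in the red box example follow from the pinhole camera relation, sketched below in Python. The focal length of 800 pixels is a hypothetical value chosen for illustration; the text gives no camera parameters, and the function names are likewise assumptions.

```python
FEET_PER_MILE = 5280

def estimate_distance(real_size, size_px, focal_px):
    """Distance to an object of known real size (same unit as real_size)."""
    return focal_px * real_size / size_px

def estimate_size(distance, size_px, focal_px):
    """Real size of an object at a known distance (same unit as distance)."""
    return size_px * distance / focal_px

focal_px = 800  # assumed focal length expressed in pixels

# Box known to be 10 feet tall and imaged 4 pixels high.
distance_ft = estimate_distance(10, 4, focal_px)

# Box known to be one mile away and imaged 4 pixels high.
height_ft = estimate_size(FEET_PER_MILE, 4, focal_px)
```

Under this assumed focal length, the 10 foot box would be about 2000 feet away, and a box one mile away would be about 26.4 feet tall.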
Blob analysis is the most widely used tool to find and characterize objects and targets in an image. However, Blob analysis is very slow and requires large amounts of computer memory. The more complex the image or the more filter noise the slower the search. Finally, a successful blob analysis search requires that the size of the blob in the image does not vary widely.
These limitations make it difficult to build successful applications that use color cameras to control applications on common consumer devices such as cell phones or tablets where the user is a casual user, the equipment is inexpensive and the environment is often uncontrolled. Often the more unsophisticated the user, the more demanding they are that technical products be fast, reliable, and simple to use with few if any restrictions.
It is also clear that prior art relied on matching color to a reference color image or template. A color machine and computer vision system that can make robust identification of color under varying lighting and changing image shift, scale, and rotation conditions is desirable. Machine vision systems use specialized and expensive hardware and software and therefore their use has been limited to industrial applications. With the advance of inexpensive color webcams, it is also desirable to find use for computer vision systems in cost sensitive consumer applications.
It would be desirable to provide a method to replace blob finding methods with a faster, more reliable method with fewer limitations for locating and identifying targets and objects in the field of view of a color camera in ordinary lighting environments, to thereby obviate for tracking purposes the prior art use of providing powered light sources in the target or in an illuminating source with specific directional or color characteristics.
It would also be desirable to provide a method that was so efficient that it could be implemented inside standard inexpensive cameras and that would transmit small packets of target data to a control center computer only when a target was identified.
It would also be desirable to provide an improved method of tracking objects that are tilted at extreme angles relative to the camera plane.
It would also be desirable to provide an improved method of tracking targets or objects through time or three dimensions.
It would also be desirable to track targets with patterns of Rich Colored patches.
It would also be desirable to provide an improved method for triggering interaction or applications between a user and a computer by identifying and locating targets or objects with Rich Colored patches in an image in the field of view of a camera or cameras.
It would also be desirable to provide an improved method for filtering such that everything in an image except for patches of different Rich Colors that were adjacent to each other was ignored.
It would be desirable to provide a machine or computer vision system that could be used in camera driven applications using ordinary smart phones, tablet computers, etc. that were fast, inexpensive, and reliable for unsophisticated users. It would also be desirable to provide a new method that is capable of incrementally processing a stream of data rather than requiring the processing of an entire ROI, that reduces or eliminates noise with little or no additional overhead, that discovers, assembles and stores geometric information that allows the determination of location, size, orientation, speed and acceleration of a target, and that is capable of this performance with a single target or multiple targets, including tracking the position of multiple targets relative to each other within one ROI, without severe degradation.