1. Field of the Invention
The invention relates to automated detection and inspection of objects being manufactured on a production line, and more particularly to the related fields of industrial machine vision and automated image analysis.
2. Description of the Related Art
Industrial manufacturing relies on automatic inspection of objects being manufactured. One form of automatic inspection that has been in common use for decades is based on optoelectronic technologies that use electromagnetic energy, usually infrared or visible light, photoelectric sensors, and some form of electronic decision making.
One well-known form of optoelectronic automatic inspection uses a device that can capture a digital image of a two-dimensional field of view in which an object to be inspected is located, and then analyze the image and make decisions. Such a device is usually called a machine vision system, or simply a vision system. The image is captured by exposing a two-dimensional array of photosensitive elements for a brief period, called the integration or shutter time, to light that has been focused on the array by a lens. The array is called an imager and the individual elements are called pixels. Each pixel measures the intensity of light falling on it during the shutter time. The measured intensity values are then converted to digital numbers and stored in the memory of the vision system to form the image, which is analyzed by a digital processing element such as a computer, using methods well-known in the art to determine the status of the object being inspected.
In some cases the objects are brought to rest in the field of view, and in other cases the objects are in continuous motion through the field of view. An event external to the vision system, such as a signal from a photodetector, or a message from a PLC, computer, or other piece of automation equipment, is used to inform the vision system that an object is located in the field of view, and therefore an image should be captured and analyzed. Such an event is called a trigger.
Machine vision systems have limitations that arise because they make decisions based on a single image of each object, located in a single position in the field of view (each object may be located in a different and unpredictable position, but for each object there is only one such position on which a decision is based). This single position provides information from a single viewing perspective, and a single orientation relative to the illumination. The use of only a single perspective often leads to incorrect decisions. It has long been observed, for example, that a change in perspective of as little as a single pixel can in some cases change an incorrect decision to a correct one. By contrast, a human inspecting an object usually moves it around relative to his eyes and the lights to make a more reliable decision.
Machine vision systems have additional limitations arising from their use of a trigger signal. The need for a trigger signal makes the setup more complex—a photodetector must be mounted and adjusted, or software must be written for a PLC or computer to provide an appropriate message. When a photodetector is used, which is almost always the case when the objects are in continuous motion, a production line changeover may require it to be physically moved, which can offset some of the advantages of a vision system. Furthermore, photodetectors can only respond to a change in light intensity reflected from an object or transmitted along a path. In some cases, such a condition may not be sufficient to reliably detect when an object has entered the field of view.
Some prior art vision systems used with objects in continuous motion can operate without a trigger using a method often called self-triggering. These systems typically operate by monitoring one or more portions of captured images for a change in brightness or color that indicates the presence of an object. Self-triggering is rarely used in practice due to several limitations:
    The vision systems respond too slowly for self-triggering to work at common production speeds;
    The methods provided to detect when an object is present are not sufficient in many cases; and
    The vision systems do not provide useful output signals that are synchronized to a specific, repeatable position of the object along the production line, signals that are typically provided by the photodetector that acts as a trigger and needed by a PLC or handling mechanism to take action based on the vision system's decision.
Many of the limitations of machine vision systems arise in part because they operate too slowly to capture and analyze multiple perspectives of objects in motion, and too slowly to react to events happening in the field of view. Since most vision systems can capture a new image simultaneously with analysis of the current image, the maximum rate at which a vision system can operate is determined by the larger of the capture time and the analysis time. Overall, one of the most significant factors in determining this rate is the number of pixels comprising the imager.
The time needed to capture an image is determined primarily by the number of pixels in the imager, for two basic reasons. First, the shutter time is determined by the amount of light available and the sensitivity of each pixel. Since having more pixels generally means making them smaller and therefore less sensitive, it is generally the case that increasing the number of pixels increases the shutter time. Second, the conversion and storage time is proportional to the number of pixels. Thus the more pixels one has, the longer the capture time.
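By way of illustration only, the relationship described above between pixel count, capture time, and maximum operating rate can be sketched as follows. The function names and all numeric values are hypothetical and are chosen only to show the arithmetic; they are not measurements of any particular system. The sketch assumes that capture time is the sum of the shutter time and a conversion and storage time proportional to the number of pixels, and that capture and analysis overlap so that the slower of the two sets the rate.

```python
def capture_time(shutter_s, n_pixels, convert_s_per_pixel):
    """Capture time = shutter time + conversion/storage time (proportional to pixels)."""
    return shutter_s + n_pixels * convert_s_per_pixel


def max_frame_rate(capture_s, analysis_s):
    """With capture and analysis overlapped, the larger of the two times sets the rate."""
    return 1.0 / max(capture_s, analysis_s)


# Illustrative values only (assumptions, not taken from the text).
many_pixels = capture_time(shutter_s=0.004, n_pixels=300_000, convert_s_per_pixel=40e-9)
few_pixels = capture_time(shutter_s=0.0005, n_pixels=16_000, convert_s_per_pixel=40e-9)

print(max_frame_rate(many_pixels, analysis_s=0.010))  # limited by capture time
print(max_frame_rate(few_pixels, analysis_s=0.001))   # far higher rate with fewer pixels
```

In this sketch, reducing the pixel count shortens both terms of the capture time, which is the effect the preceding paragraphs describe.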
For at least the last 25 years, prior art vision systems generally have used about 300,000 pixels; more recently some systems have become available that use over 1,000,000, and over the years a small number of systems have used as few as 75,000. Just as with digital cameras, the recent trend is toward more pixels for improved image resolution. Over the same period of time, during which computer speeds have improved a million-fold and imagers have changed from vacuum tubes to solid state, machine vision image capture times generally have improved from about 1/30 second to about 1/60 second, only a factor of two. Faster computers have allowed more sophisticated analysis, but the maximum rate at which a vision system can operate has hardly changed.
The Vision Detector Method and Apparatus teaches novel methods and systems that can overcome the above-described limitations of prior art machine vision systems. These teachings also provide fertile ground for innovation leading to improvements beyond the scope of the original teachings. In the following section the Vision Detector Method and Apparatus is briefly summarized, and a subsequent section lays out the problems to be addressed by the present invention.
Vision Detector Method and Apparatus
The Vision Detector Method and Apparatus provides systems and methods for automatic optoelectronic detection and inspection of objects, based on capturing digital images of a two-dimensional field of view in which an object to be detected or inspected may be located, and then analyzing the images and making decisions. These systems and methods analyze patterns of brightness reflected from extended areas, handle many distinct features on the object, accommodate line changeovers through software means, and handle uncertain and variable object locations. They are less expensive and easier to set up than prior art machine vision systems, and operate at much higher speeds. These systems and methods furthermore make use of multiple perspectives of moving objects, operate without triggers, provide appropriately synchronized output signals, and provide other significant and useful capabilities that will be apparent to those skilled in the art.
One aspect of the Vision Detector Method and Apparatus is an apparatus, called a vision detector, that can capture and analyze a sequence of images at higher speeds than prior art vision systems. An image in such a sequence that is captured and analyzed is called a frame. The rate at which frames are captured and analyzed, called the frame rate, is sufficiently high that a moving object is seen in multiple consecutive frames as it passes through the field of view (FOV). Since the object moves somewhat between successive frames, it is located in multiple positions in the FOV, and therefore it is seen from multiple viewing perspectives and positions relative to the illumination.
Another aspect of the Vision Detector Method and Apparatus is a method, called dynamic image analysis, for inspecting objects by capturing and analyzing multiple frames for which the object is located in the field of view, and basing a result on a combination of evidence obtained from each of those frames. The method provides significant advantages over prior art machine vision systems that make decisions based on a single frame.
Yet another aspect of the Vision Detector Method and Apparatus is a method, called visual event detection, for detecting events that may occur in the field of view. An event can be an object passing through the field of view, and by using visual event detection the object can be detected without the need for a trigger signal.
Additional aspects of the Vision Detector Method and Apparatus will be apparent from a study of the figures and detailed descriptions given therein.
In order to obtain images from multiple perspectives, it is desirable that an object to be detected or inspected moves no more than a small fraction of the field of view between successive frames, often no more than a few pixels. According to the Vision Detector Method and Apparatus, it is generally desirable that the object motion be no more than about one-quarter of the FOV per frame, and in typical embodiments no more than 5% of the FOV. It is desirable that this be achieved not by slowing down a manufacturing process but by providing a sufficiently high frame rate. In an example system the frame rate is at least 200 frames/second, and in another example the frame rate is at least 40 times the average rate at which objects are presented to the vision detector.
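The frame rate needed to satisfy such a motion limit follows from simple arithmetic, sketched below for illustration only. The function name, the line speed, and the FOV length are hypothetical values chosen for the example; only the motion fractions (one-quarter and 5% of the FOV) come from the text above.

```python
def required_frame_rate(object_speed, fov_length, max_fraction_per_frame):
    """Frames/second needed so the object moves at most the given
    fraction of the FOV between successive frames.

    object_speed and fov_length must use the same length unit
    (e.g. mm/s and mm).
    """
    if not 0 < max_fraction_per_frame <= 1:
        raise ValueError("max_fraction_per_frame must be in (0, 1]")
    return object_speed / (fov_length * max_fraction_per_frame)


# Hypothetical line: objects moving at 500 mm/s through a 50 mm FOV.
print(required_frame_rate(500.0, 50.0, 0.25))  # one-quarter of the FOV per frame
print(required_frame_rate(500.0, 50.0, 0.05))  # 5% of the FOV per frame
```

Under these assumed values, tightening the limit from one-quarter to 5% of the FOV raises the required frame rate fivefold, which is why the limit is met by a fast detector rather than by slowing the line.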
An exemplary system is taught that can capture and analyze up to 500 frames/second. This system makes use of an ultra-sensitive imager that has far fewer pixels than prior art vision systems. The high sensitivity allows very short shutter times using very inexpensive LED illumination, which in combination with the relatively small number of pixels allows very short image capture times. The imager is interfaced to a digital signal processor (DSP) that can receive and store pixel data simultaneously with analysis operations. Using methods taught therein and implemented by means of suitable software for the DSP, the time to analyze each frame generally can be kept to within the time needed to capture the next frame. The capture and analysis methods and apparatus combine to provide the desired high frame rate. By carefully matching the capabilities of the imager, DSP, and illumination with the objectives of the invention, the exemplary system can be significantly less expensive than prior art machine vision systems.
The method of visual event detection involves capturing a sequence of frames and analyzing each frame to determine evidence that an event is occurring or has occurred. When visual event detection is used to detect objects without the need for a trigger signal, the analysis would determine evidence that an object is located in the field of view.
In an exemplary method the evidence is in the form of a value, called an object detection weight, that indicates a level of confidence that an object is located in the field of view. The value may be a simple yes/no choice that indicates high or low confidence, a number that indicates a range of levels of confidence, or any item of information that conveys evidence. One example of such a number is a so-called fuzzy logic value, further described below and in the Vision Detector Method and Apparatus. Note that no machine can make a perfect decision from an image, and so it will instead make judgments based on imperfect evidence.
When performing object detection, a test is made for each frame to decide whether the evidence is sufficient that an object is located in the field of view. If a simple yes/no value is used, the evidence may be considered sufficient if the value is “yes”. If a number is used, sufficiency may be determined by comparing the number to a threshold. Frames where the evidence is sufficient are called active frames. Note that what constitutes sufficient evidence is ultimately defined by a human user who configures the vision detector based on an understanding of the specific application at hand. The vision detector automatically applies that definition in making its decisions.
When performing object detection, each object passing through the field of view will produce multiple active frames due to the high frame rate of the vision detector. These frames may not be strictly consecutive, however, because as the object passes through the field of view there may be some viewing perspectives, or other conditions, for which the evidence that the object is located in the field of view is not sufficient. Therefore it is desirable that detection of an object begins when an active frame is found, but does not end until a number of consecutive inactive frames are found. This number can be chosen as appropriate by a user.
Once a set of active frames has been found that may correspond to an object passing through the field of view, it is desirable to perform a further analysis to determine whether an object has indeed been detected. This further analysis may consider some statistics of the active frames, including the number of active frames, the sum of the object detection weights, the average object detection weight, and the like.
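The detection procedure described in the preceding paragraphs can be sketched as follows, for illustration only. The sketch assumes that the object detection weight is a number in the range 0 to 1 and that a frame is active when its weight meets or exceeds a threshold. The function names, the parameter names (threshold, max_inactive, min_active), their default values, and the choice of a minimum active-frame count as the further analysis are all hypothetical.

```python
def summarize(active):
    """Statistics of one candidate's active frames, given (index, weight) pairs."""
    weights = [w for _, w in active]
    return {
        "frames": [i for i, _ in active],
        "count": len(weights),
        "weight_sum": sum(weights),
        "weight_mean": sum(weights) / len(weights),
    }


def detect_objects(weights, threshold=0.5, max_inactive=3, min_active=2):
    """Group per-frame object detection weights into detected objects.

    A candidate begins at the first active frame and ends after
    max_inactive consecutive inactive frames. A candidate is accepted
    only if it contains at least min_active active frames.
    """
    detections = []
    active = []        # (frame index, weight) for the current candidate
    inactive_run = 0   # consecutive inactive frames since the last active one
    for index, weight in enumerate(weights):
        if weight >= threshold:
            active.append((index, weight))
            inactive_run = 0
        elif active:
            inactive_run += 1
            if inactive_run >= max_inactive:
                if len(active) >= min_active:
                    detections.append(summarize(active))
                active, inactive_run = [], 0
    if len(active) >= min_active:  # sequence ended while a candidate was open
        detections.append(summarize(active))
    return detections


# Hypothetical weights for twelve consecutive frames.
frame_weights = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.1, 0.1, 0.6, 0.1, 0.1, 0.1]
for detection in detect_objects(frame_weights):
    print(detection)
```

In this example the isolated inactive frame between active frames does not end the candidate, while the single active frame near the end is rejected by the further analysis because it stands alone.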
The method of dynamic image analysis involves capturing and analyzing multiple frames to inspect an object, where “inspect” means to determine some information about the status of the object. In one example of this method, the status of an object includes whether or not the object satisfies inspection criteria chosen as appropriate by a user.
In some aspects of the Vision Detector Method and Apparatus dynamic image analysis is combined with visual event detection, so that the active frames chosen by the visual event detection method are the ones used by the dynamic image analysis method to inspect the object. In other aspects of the Vision Detector Method and Apparatus, the frames to be used by dynamic image analysis can be captured in response to a trigger signal.
Each such frame is analyzed to determine evidence that the object satisfies the inspection criteria. In one exemplary method, the evidence is in the form of a value, called an object pass score, that indicates a level of confidence that the object satisfies the inspection criteria. As with object detection weights, the value may be a simple yes/no choice that indicates high or low confidence, a number, such as a fuzzy logic value, that indicates a range of levels of confidence, or any item of information that conveys evidence.
The status of the object may be determined from statistics of the object pass scores, such as an average or percentile of the object pass scores. The status may also be determined by weighted statistics, such as a weighted average or weighted percentile, using the object detection weights. Weighted statistics effectively weight evidence more heavily from frames wherein the confidence is higher that the object is actually located in the field of view for that frame.
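A weighted average and a weighted percentile of this kind can be sketched as shown below, for illustration only. The function names are hypothetical, and the particular definition of a weighted percentile used here (the smallest score at which the cumulative weight reaches the requested share of the total weight) is one common convention, assumed for the example rather than taken from the text.

```python
def weighted_average(scores, weights):
    """Average of object pass scores, weighted by object detection weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def weighted_percentile(scores, weights, pct):
    """Smallest score at which cumulative weight reaches pct percent of the total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    target = total * pct / 100.0
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    return max(scores)


# Hypothetical active frames: the third frame has a low pass score, but also
# low confidence that the object was actually in the field of view.
pass_scores = [0.9, 0.8, 0.2, 0.85]
detection_weights = [1.0, 0.9, 0.1, 0.8]

print(sum(pass_scores) / len(pass_scores))               # unweighted average
print(weighted_average(pass_scores, detection_weights))  # weighted average
print(weighted_percentile(pass_scores, detection_weights, 50))
```

In this example the low-confidence frame pulls the unweighted average down noticeably, while the weighted average largely discounts it.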
Evidence for object detection and inspection is obtained by examining a frame for information about one or more visible features of the object. A visible feature is a portion of the object wherein the amount, pattern, or other characteristic of emitted light conveys information about the presence, identity, or status of the object. Light can be emitted by any process or combination of processes, including but not limited to reflection, transmission, or refraction of a source external or internal to the object, or directly from a source internal to the object.
One aspect of the Vision Detector Method and Apparatus is a method for obtaining evidence, including object detection weights and object pass scores, by image analysis operations on one or more regions of interest in each frame for which the evidence is needed. In one example of this method, the image analysis operation computes a measurement based on the pixel values in the region of interest, where the measurement is responsive to some appropriate characteristic of a visible feature of the object. The measurement is converted to a logic value by a threshold operation, and the logic values obtained from the regions of interest are combined to produce the evidence for the frame. The logic values can be binary or fuzzy logic values, with the thresholds and logical combination being binary or fuzzy as appropriate.
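The sequence of measurement, threshold, and logical combination can be sketched as follows, for illustration only. The sketch makes several assumptions that are not stated in the text: the measurement is the mean brightness of a rectangular region of interest, the fuzzy threshold is a linear ramp between two hypothetical parameters t0 and t1, and fuzzy AND, OR, and NOT are the commonly used minimum, maximum, and complement operators. All function names and values are hypothetical.

```python
def mean_brightness(frame, roi):
    """Mean pixel value inside a region of interest (top, left, height, width)."""
    top, left, height, width = roi
    pixels = [frame[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    return sum(pixels) / len(pixels)


def fuzzy_threshold(measurement, t0, t1):
    """Fuzzy logic value in [0, 1]: 0 at or below t0, 1 at or above t1,
    and a linear ramp in between (an assumed form of fuzzy threshold)."""
    if t1 <= t0:
        raise ValueError("t1 must be greater than t0")
    return min(1.0, max(0.0, (measurement - t0) / (t1 - t0)))


def fuzzy_and(*values):
    return min(values)


def fuzzy_or(*values):
    return max(values)


def fuzzy_not(value):
    return 1.0 - value


# Hypothetical 4x4 frame: a bright feature in the upper right, dark elsewhere.
frame = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
feature_present = fuzzy_threshold(mean_brightness(frame, (0, 2, 2, 2)), 50, 150)
background_bright = fuzzy_threshold(mean_brightness(frame, (2, 0, 2, 2)), 50, 150)

# Evidence for the frame: the feature is bright AND the background is not.
evidence = fuzzy_and(feature_present, fuzzy_not(background_bright))
print(evidence)
```

Replacing the ramp with a single cut-off and the operators with Boolean AND, OR, and NOT gives the binary form of the same procedure.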
For visual event detection, evidence that an object is located in the field of view is effectively defined by the regions of interest, measurements, thresholds, logical combinations, and other parameters further described herein, which are collectively called the configuration of the vision detector and are chosen by a user as appropriate for a given application of the invention. Similarly, the configuration of the vision detector defines what constitutes sufficient evidence.
For dynamic image analysis, evidence that an object satisfies the inspection criteria is also effectively defined by the configuration of the vision detector.
Discussion of the Problem
Image analysis devices, including machine vision systems and vision detectors, must be configured to inspect objects. Typically, configuring such a device requires a human user to obtain at least one object whose appearance is representative of the objects to be inspected. The user captures an image of the object, generally called a training image, and uses it by choosing image analysis tools, positioning those tools on the training image, and setting operating parameters to achieve a desired effect. It is desirable that a training image be obtained under conditions as close to the actual production line as is practical. For production lines that operate in continuous motion, however, this can present difficulties for conventional vision systems. Generally a trigger signal must be used to obtain a useful training image, with the attendant limitations of triggers mentioned above. Furthermore, the single image captured in response to the trigger may not be the best viewing perspective from which to obtain the training image.
During configuration, a vision detector must be tested to determine whether its performance meets the requirements of the application. If the performance meets the requirements, the configuration can be considered complete; otherwise adjustments must be made. Since no vision detector makes perfect inspection decisions, testing performance generally includes some assessment of the probability that a correct decision will be made as to the status (e.g., pass or fail) of an object. Furthermore, since the error rates (incorrect decisions) are very low (desirably under one in 10,000 objects), this assessment can be very difficult. Therefore it is desirable that a large number of objects be tested, and that only those likely to represent incorrect decisions be assessed by the human user.
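The scale of testing implied by such low error rates can be illustrated with a simple calculation, offered only as a sketch. It assumes that decisions on different objects are statistically independent and that no incorrect decisions are observed during the test; under those assumptions, the probability of seeing zero errors in n objects at error rate p is (1 - p)^n, and the test supports the claim at a given confidence once that probability falls to one minus the confidence. The function name and the 95% confidence level are hypothetical choices.

```python
import math


def objects_needed(max_error_rate, confidence=0.95):
    """Number of objects that must be tested with zero observed errors to
    support, at the given confidence, that the error rate is below
    max_error_rate. Assumes independent decisions (see lead-in)."""
    if not (0 < max_error_rate < 1 and 0 < confidence < 1):
        raise ValueError("max_error_rate and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_error_rate))


# Target taken from the text: under one incorrect decision in 10,000 objects.
print(objects_needed(1 / 10_000))
```

For the one-in-10,000 target this comes to roughly 30,000 objects, which suggests why it is impractical for a human user to review every tested object individually.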