The present invention relates to pattern and object recognition, and more particularly to a method for detecting an object in a digital image.
Over the years, there have been many methods developed to determine the image quality of an image-generating system such as a sensor/display combination. In most cases, the final consumer of the image produced is a human observer using their visual capability to extract visual information from the displayed image. In recent years, imaging systems and image manipulation have moved from the analog world to the digital world, which has probably added a bit more confusion to the issue of image quality or resolution.
In general, resolution is the ability of a sensor/display system to produce detail; the higher the resolution, the finer the detail that can be displayed. With the advent of digital imagery and sensor detectors that are composed of an array of discrete elements, it is tempting, and not entirely wrong, to characterize the resolution of the system by the number of picture elements (pixels) for the display or sensor elements in the case of the sensor. For example, VGA resolution for a computer display is 480 elements high by 640 elements wide and SVGA is 600×800 elements. This describes the number of samples that can be displayed; however, the number of pixels alone says nothing of the quality of the actual display medium characteristics (luminance, contrast capability, noise, color, refresh rate, active area to total area ratio, etc.) or of the signal/information used to feed the individual pixels. Nevertheless, this numerical value of pixel or sensor element count is often given as a primary metric to the resolution (quality) of the sensor or display.
Another common approach to determining the resolution of a sensor/display system is to image an appropriate resolution test target and determine the smallest sized critical test pattern element that can be seen by a human observer. Many test patterns have been developed over the years such as grating, tri-bars, tumbling Es, the Snellen chart, and the Landolt C chart to test vision or to test imaging systems using vision. The test pattern typically has test elements of various sizes so that the human observer can pick out the smallest size that they can resolve. An alternative to the multi-sized test pattern is to use a single size test element, but image it at various distances until a distance is obtained at which the test object is barely resolved.
Related to resolution is visual acuity, which is acuteness or clearness of vision that is dependent on the sharpness of the retinal focus within the eye and the sensitivity of the interpretative faculty of the brain. For example, numerous methods have been used to determine night vision goggle (“NVG”) visual acuity such as limiting resolution, Snellen Acuity, square wave targets, Landolt Cs, adaptive psychophysical, and directly measuring the psychometric function or the “frequency of seeing” curve. Each method produces a number that is composed of an actual acuity value plus error. There can be many sources of error but the largest is generally the method itself as well as the inherent variability of the observer while working under threshold conditions. Observer variability may be reduced through extensive training, testing the same time every day, and shortened sessions in order to reduce eye fatigue. Additionally, even though observers are given specific instructions, response criteria may also vary among or within observers; even over the course of a single experimental session. To assist in eliminating the criteria problem, a four alternative forced-choice paradigm was developed and utilized to measure the entire psychometric function. This paradigm allowed for any desired response criteria level (e.g., 50% or 75% corrected for chance, probability of detection) to be selected for the prediction of (NVG) visual acuity performance. Although all of the preceding was directed at visual acuity/resolution assessment of night vision goggles using multiple human observers the “resolution” concept applies equally well to digital imagery.
Current and future military weapons systems (e.g. micro UAVs, satellites, surveillance, weapons aiming optics, day/night head-mounted devices) will increasingly rely on digitally-based multi-spectral imaging capabilities. With digital media comes the potential to register, fuse, and enhance digital images whether they are individual images or streaming video gathered in real-time. Multi-spectral fusion and enhancement provides the greatly increased potential to detect, track, and identify difficult targets, such as those that are camouflaged, buried, hidden behind a smoke screen or obscured by atmospheric effects (haze, rain, fog, snow).
There are several different conventional techniques to assess the relative improvement in image quality when an image-enhancing algorithm has been applied to a digital image. The testing of enhancing effects often consists of subjective quality assessments or measures of the ability of an automatic target detection program to find a target before and after an image has been enhanced. It is rare to find studies that focus on the human ability to detect a target in an enhanced image using scenarios that are relevant for the particular application for which the enhancement is intended. While a particular algorithm may make an image appear substantially better after enhancement, there is no indication as to whether this improvement is significant enough to improve human visual performance.
Therefore, there is a need in the art to automatically assess image quality in terms of modeled human visual resolution perceptual qualities (i.e., the “frequency of seeing” curve) but without the need to actually use human observers.