1. Field of the Invention
The invention generally relates to a method for recognizing objects in images. More particularly, the invention relates to a method by which the shape, size and position of an object may be accurately and swiftly determined.
2. Description of the Prior Art
An object recognition system in the prior art generally includes the following five steps: (1) acquire an image with a still camera or a similar device; (2) segment the image; (3) extract objects from the segmented image; (4) abstract the extracted objects; and (5) classify the abstracted objects.
Captured input images are typically composed of color or gray-scale pixel data in digital format, arranged as two-dimensional matrices in the x (horizontal) and y (vertical) directions. Images may contain thousands or millions of individual pixels.
At the lowest level of abstraction (no abstraction at all), objects can be modeled as whole images and compared, pixel by pixel, against a raw input image stored in a pattern matching database. However, most object recognition systems use various methods for segmenting images and extracting, abstracting, and classifying image objects to improve recognition speed and/or accuracy (Krumm, U.S. Pat. No. 7,092,566).
In particular, color histograms have been used in various steps of the process, primarily to improve object recognition speed in color images.
In U.S. Pat. No. 7,020,329, Prempraneerach, et al., describe using color histograms to segment a color image into a plurality of regions. The image is converted into a three-dimensional color space, and a color histogram is generated for each dimension of the color space. The histograms are used to generate a plurality of connecting boxes in the three-dimensional color space, and a normalized variance value is computed for each connecting box to form clusters of connecting boxes. The segmented and clustered pixels are then mapped back to the image domain to extract segmented regions from the image, and the clustered pixels are classified in the image domain to recognize objects in the image, which correspond to areas of the image with consistent color characteristics. Recommended classifying methods include neural networks (with adaptive template matching), frequency-sensitive competitive learning, rival-penalized competitive learning and statistical classifiers.
Prempraneerach, et al., describe the proposed clustering technique as more efficient than prior iterative clustering techniques. In addition, they report that the proposed segmentation method further reduces computation time by processing each pixel only once to create the intensity, hue, and saturation histograms, which are then used for clustering.
However, the segmentation method does not make use of prior information concerning known image color content, and the color histograms which are created do not maintain geometry information. As a result, the method still requires a series of complex computations to complete the object recognition process. First, the input image is filtered with an edge-preserving filter to smooth the color image and reduce discontinuities in the subsequently derived color histograms. Next, all of the pixels in the image are converted from RGB to LUV and from LUV to IHS color space values using a set of eleven equations. IHS histograms are then computed for all of the pixel data, and the histograms are filtered to remove high-frequency noise. Finally, each histogram is searched for valleys to create connecting boxes (by convolving each histogram with a Gaussian kernel or Gaussian filter).
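By way of illustration only, the histogram smoothing and valley search step may be sketched as follows. The function names, the border handling, and the local-minimum test are assumptions made for this sketch; the referenced patent's actual valley criterion is not reproduced here.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(hist, kernel):
    """Convolve a histogram with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(hist)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # assumed border rule
            acc += w * hist[j]
        out.append(acc)
    return out

def find_valleys(hist):
    """Return interior indices that are local minima (assumed valley test)."""
    return [i for i in range(1, len(hist) - 1)
            if hist[i] < hist[i - 1] and hist[i] <= hist[i + 1]]

# Hypothetical one-dimensional histogram with two peaks and one valley.
example_hist = [0, 5, 9, 5, 1, 6, 10, 6, 0]
valleys = find_valleys(smooth(example_hist, gaussian_kernel(1.0, 1)))
```

Each valley index would mark a boundary between two intervals of one color dimension; intervals from the three dimensions together would bound a connecting box.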
Clusters of connecting boxes are then formed by computing the normalized variance of pixel values within each connecting box, and the connecting boxes are linked into tree-like structures based upon their normalized variance values. Connecting boxes which have local minimum normalized variance values serve as root nodes for each tree-like structure. The remaining connecting boxes are linked as branch nodes to root nodes using a steepest gradient descent algorithm.
For the method to work properly, homogeneous regions in the image must cluster into well defined intervals in the three histograms. However, in practice, actual image color variations may be difficult to distinguish from color variations that result from either noise or the nonlinear transformation from RGB to IHS color space.
In U.S. Pat. Nos. 7,092,566, 6,952,496, and 6,611,622, Krumm describes using color histograms for both representing and classifying segmented regions in an input image. The described object recognition method first creates model histograms of objects to be recognized and then segments an input image to extract regions which likely correspond to the objects to be recognized, derives histograms from the segmented regions, and then compares the derived histograms with the stored model histograms. Similarity measures between input and model histograms that exceed a prescribed threshold indicate that an input object matches a model object. Matching input histograms may also be added to a database of model histograms for the given object.
The described method creates model and input image histograms by determining the actual RGB colors exhibited by the pixels in a model or input image region, dividing the overall range of actual pixel colors into a series of discrete color ranges or quantized color categories, assigning each pixel of the extracted model or input image region to the quantized color category into which the actual color of the pixel falls, and establishing a count of the number of pixels assigned to each quantized color category. In a preferred embodiment, RGB pixel values are quantized into 27 color categories. Input image and model histograms are compared by comparing the pixel counts from each quantized color category of the input image and model histograms. Model histograms must be derived from a prefatory image which is similar to input images from which objects are to be recognized. Regions of the image from which model histograms are derived are also used in subsequent input images for extracting objects to be recognized.
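By way of illustration only, quantizing RGB pixels into 27 color categories and comparing the resulting counts may be sketched as follows. This sketch assumes 8-bit channels divided uniformly into three levels each, whereas the described method divides the range of colors actually exhibited by the pixels. The histogram intersection measure is likewise an assumption, since the text above states only that per-category pixel counts are compared.

```python
def quantize_rgb(pixel, levels=3):
    """Map an 8-bit (r, g, b) pixel to one of levels**3 color categories."""
    r, g, b = (min(c * levels // 256, levels - 1) for c in pixel)
    return (r * levels + g) * levels + b

def color_histogram(pixels, levels=3):
    """Count the pixels assigned to each quantized color category."""
    hist = [0] * levels ** 3
    for p in pixels:
        hist[quantize_rgb(p, levels)] += 1
    return hist

def intersection_similarity(h_input, h_model):
    """Assumed similarity measure: intersection normalized by the model count."""
    total = sum(h_model)
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h_input, h_model)) / total

# Hypothetical region pixels: two reddish pixels and one dark pixel.
model = color_histogram([(255, 0, 0), (250, 10, 5), (0, 0, 0)])
candidate = color_histogram([(240, 20, 20), (10, 10, 10), (0, 0, 255)])
score = intersection_similarity(candidate, model)
```

A score above a prescribed threshold would indicate a match. Because only counts are kept, two regions with the same colors in different arrangements receive the same score.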
In a preferred embodiment, model images are segmented by analyzing a time sequence of images from the same imaging device, determining a static background image by identifying pixel values that do not change appreciably in the time sequence of images, producing a foreground image by subtracting the background image from a subsequent image, and segmenting the foreground image into object regions by identifying groups of smoothly varying pixel values.
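By way of illustration only, the background estimation and subtraction steps may be sketched as follows for gray-scale frames stored as nested lists. The use of a per-pixel median as the static background and the fixed difference threshold are assumptions made for this sketch; the final grouping of foreground pixels into object regions is omitted.

```python
from statistics import median

def estimate_background(frames):
    """Estimate a static background as the per-pixel median over the sequence."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(cols)]
            for y in range(rows)]

def foreground_mask(frame, background, threshold=20):
    """Mark pixels that differ from the background by more than the threshold."""
    return [[abs(v - b) > threshold for v, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

# Hypothetical 2x2 frames: a nearly constant scene, then a bright new pixel.
sequence = [
    [[10, 10], [10, 10]],
    [[12, 10], [10, 9]],
    [[10, 11], [10, 10]],
]
background = estimate_background(sequence)
mask = foreground_mask([[10, 200], [10, 10]], background)
```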
However, the method is primarily useful for tracking known objects in a time sequence of images. In addition, the method still requires a significant amount of processing time for creating model histograms and comparing input image histograms to model histograms. The color histogram generation technique used does not preserve geometry; it merely accounts for the number of image pixels of given colors. As a result, actual object shape, size, and location cannot be determined. Finally, similar histograms for different objects may lead to inaccurate object recognition results.
In U.S. Pat. Nos. 6,532,301 and 6,477,272, Krumm, et al., describe using co-occurrence histograms to represent and identify the location of a modeled object in a search image.
The process starts by creating model images of the object and then computing a co-occurrence histogram (CH) for each of the model images. Model images are created by capturing sets of images of the object to be identified from viewpoints spaced at equal angles from each other around the object and at various distances. Co-occurrence histograms are computed by identifying every possible unique, non-ordered pair of pixels in the model image and generating counts of pairs of pixels which have colors that fall within the same color range and which are separated in distance by the same distance range.
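By way of illustration only, computing a co-occurrence histogram over every unique, non-ordered pair of pixels may be sketched as follows. The input is assumed to be a grid of already-quantized color category indices, and the bin width and key layout are assumptions made for this sketch. Entries whose two color categories are equal correspond to like-colored pairs.

```python
from collections import Counter
from itertools import combinations
import math

def cooccurrence_histogram(labels, bin_width=2.0, max_distance=None):
    """Count unordered pixel pairs by (color a, color b, distance bin).

    labels: 2-D list of quantized color category indices (assumed input form).
    """
    pixels = [(x, y, c) for y, row in enumerate(labels)
              for x, c in enumerate(row)]
    hist = Counter()
    for (x1, y1, c1), (x2, y2, c2) in combinations(pixels, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        if max_distance is not None and d > max_distance:
            continue  # optionally ignore pairs that are far apart
        # Sort the two categories so the pair is treated as non-ordered.
        key = (min(c1, c2), max(c1, c2), int(d // bin_width))
        hist[key] += 1
    return hist

# Hypothetical 2x2 image with three pixels of category 0 and one of category 1.
ch = cooccurrence_histogram([[0, 0], [1, 0]])
```

The loop visits N(N-1)/2 pairs for an image of N pixels, so the cost grows quadratically with image size.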
Next, search windows, of prescribed size, are generated from overlapping portions of the search image, and a CH is computed for each of the search windows, using the technique and pixel color and distance ranges established for the model image co-occurrence histograms.
Finally, each model image CH is compared to each search window CH to assess their similarity. Every search window CH that matches a model image CH, as indicated by a similarity value which exceeds a threshold value, is designated as potentially containing the object to be recognized. The location of the recognized object is then determined to be within the single search area with the largest similarity measure, among all search areas designated as potentially containing the object to be recognized. The location of the recognized object can be further refined by iteratively moving the identified search area up, down, left, or right by one pixel location and then re-computing the search window CH and re-comparing the search window CH with each model CH, to find potentially higher similarity measures. The system and process require that search window size, color ranges, and distance ranges be chosen before image searching begins.
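By way of illustration only, the search over overlapping windows may be sketched as follows. To keep the sketch short, the window descriptor is passed in as a function, and the stand-in used here is a plain count of color categories rather than a co-occurrence histogram. The intersection measure, the parameter names, and the strict threshold comparison are assumptions.

```python
from collections import Counter

def intersection(h_window, h_model):
    """Assumed similarity measure: intersection normalized by the model count."""
    total = sum(h_model.values())
    if total == 0:
        return 0.0
    return sum(min(h_window.get(k, 0), v) for k, v in h_model.items()) / total

def best_window(labels, model_hist, size, step, threshold, describe):
    """Return (similarity, x, y) of the best window above threshold, else None."""
    rows, cols = len(labels), len(labels[0])
    best = None
    for y in range(0, rows - size + 1, step):
        for x in range(0, cols - size + 1, step):
            window = [row[x:x + size] for row in labels[y:y + size]]
            score = intersection(describe(window), model_hist)
            if score > threshold and (best is None or score > best[0]):
                best = (score, x, y)
    return best

def label_counts(window):
    """Stand-in descriptor: count of each color category (not a CH)."""
    return Counter(c for row in window for c in row)

# Hypothetical 4x4 grid of color categories with a 2x2 object of category 1.
grid = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
found = best_window(grid, Counter({1: 4}), size=2, step=1,
                    threshold=0.5, describe=label_counts)
```

The window size, step, and threshold must all be fixed before the search begins, and the descriptor is recomputed for every window position.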
Krumm describes several advantages of the proposed method. In particular, Krumm states that co-occurrence histograms are an effective way to represent objects for recognition in images. Keeping track of pairs of pixels which have matching colors and a given distance between them allows a variable amount of geometry information to be added to a regular color-only histogram. In turn, considering both color and geometry allows the object recognition process to work in spite of confusing background clutter and moderate amounts of occlusion and object flexing.
However, creating a model image database for histogram matching takes a significant amount of time. Computing co-occurrence histograms by computing distance measures for every possible unique, non-ordered pair of pixels in both the model images and the search image is also computationally expensive.
In addition, the given abstract object representation contains no specific geometric information other than the distance between like-colored pixels in the model and search images. Information concerning actual object shape, size, and location is lost. The resulting object location determination is not precise, and subsequent iterative refinement of the determined location is computationally intensive.
Further, method parameters, such as search window size, affect object recognition accuracy, and images must be scaled to handle overall size differences between search and model images.
A useful and relatively well-defined application for object recognition systems is real-time traffic sign recognition from moving vehicles. In general, traffic sign systems must be capable of fast object extraction and accurate object classification.
In U.S. Pat. No. 6,801,638, Janssen, et al. describe a process and device for recognizing traffic signs and then displaying them as memory aids for an observer. Images are captured by an image sensor and analyzed and classified by classifiers implemented in an information processing unit. A synthetic image of a traffic sign is then generated, stored in a memory unit, and displayed by means of a display unit.
Input images are first searched, by color and/or spatial position, to determine areas which, with above average probability, could contain objects which are traffic signs. Objects are recognized within the determined areas by hierarchically and sequentially classifying the image areas by separate known characteristics of traffic signs, for example a correlation process to identify outer shape (circle or square) and inner symbols, with respect to stored characteristic data. The classifiers compare logical distance between input object characterizing data and typical characterizing data sets stored in a memory unit. Objects are recognized when comparison distances fall below set thresholds.
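By way of illustration only, classification by distance to stored characterizing data may be sketched as follows. The feature vectors, the labels, and the use of Euclidean distance are hypothetical choices made for this sketch; the text above specifies only that a comparison distance must fall below a set threshold.

```python
import math

def classify(features, prototypes, threshold):
    """Return the label of the nearest stored data set, or None if too far."""
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        d = math.dist(features, proto)  # Euclidean distance (assumed metric)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist < threshold else None

# Hypothetical two-component shape descriptors stored for two outer shapes.
stored = {"circle": (1.0, 0.0), "square": (0.0, 1.0)}
near = classify((0.9, 0.1), stored, threshold=0.5)
far = classify((0.5, 0.5), stored, threshold=0.5)
```

In a hierarchical arrangement, a recognized outer shape would select which stored data sets are compared next, for example those for inner symbols.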
However, the classifiers, so designed, must be trained with several passes to handle variations in image quality due to varying weather and light conditions. The described classifiers also depend upon correlations between input object shape data and shape data stored in a memory unit. In general, correlation-based classifiers can be imprecise and/or slow. Correlation depends upon training, consistency of the viewed environment, and/or quality of the stored shape data.
Improving stored shape data requires extensive training or a large stored database. In turn, a large stored database requires more processing time to complete correlations. As an example, circular and square objects appear as varying oval and rectangular shapes from different viewing angles, which could reduce object recognition accuracy.
The method described also depends upon searching the input image for areas which could contain traffic signs based upon color values and/or spatial position.
In U.S. Pat. No. 6,813,545, Stromme describes a system for reminding a driver of the presence of at least one particular traffic sign. The system consists of: an imaging unit attached to the vehicle and directed toward the road ahead of the vehicle; a database which contains at least one pre-registered traffic sign shape; an automatic recognition unit for detecting and identifying traffic signs, in successive images, by searching for image areas which have a shape contained in the database; a selection process between two signs contained in the same image for determining the distance between the vehicle and the signs; and a sound and/or visual indicator which signals that an identified traffic sign is present on the road ahead of the vehicle.
Input images are captured periodically, based upon the speed of the vehicle. Each input image is analyzed in a shape recognition processor, to detect the presence of traffic sign shapes and traffic sign symbol shapes contained in the shape information database.
In a preferred embodiment, the shape search and recognition unit within the system uses conventional image processing methods: simple edge detection, such as Canny edge detection, followed by simple pixel-by-pixel matching across the processed image. Several views of the sign shapes are stored in the shape matching database. For triangles, circles, or rectangles, symbols contained within the sign are also identified, using pattern recognition algorithms applied to the detected shape. Color detection is also carried out to confirm that a detected shape is in fact a traffic sign.
However, direct image searching methods and, in particular, edge detection and pixel-by-pixel pattern matching methods are generally imprecise and relatively slow. In addition, both sign and sign symbol shape and size vary significantly, based upon vehicle position with respect to a given traffic sign, which further hinders accurate shape matching and object recognition using edge detection and pixel-by-pixel matching.
In U.S. Pat. No. 5,926,564, Kimura uses histograms for object recognition. In his method, scanning is conducted along the x-axis and y-axis, and a “0-1 pattern” is used for comparison of images. The disadvantage of the method is that the shape and size of an object cannot be correctly calculated if there is a misalignment between the image capture device and the object.
From the above, we can see that the methods of the prior art have many disadvantages and need to be improved.
To eliminate the disadvantages of the methods of the prior art, the inventor has devoted considerable effort to the subject and has devised the method of the present invention.