1. Technical Field
The invention is related to a computer-implemented object recognition system and process, and more particularly, to such a system and process employing co-occurrence histograms (CH) for finding an object in a search image.
2. Background Art
Object recognition in images is typically based on a model of the object at some level of abstraction. This model is matched to an input image which has been abstracted to the same level as the model. At the lowest level of abstraction (no abstraction at all), an object can be modeled as a whole image and compared, pixel by pixel, against a raw input image. However, more often unimportant details are abstracted away, such as by using sub-templates (ignoring background and image position), normalized correlation (ignoring illumination brightness), or edge features (ignoring low spatial frequencies). The abstraction itself is embodied in both the representation of the object and in the way it is matched to the abstracted image. For instance, Huttenlocher et al. [1] represent objects as simple edge points and then match with the Hausdorff distance. While the edge points form a completely rigid representation, the matching allows the points to move nonrigidly.
One interesting dimension of the aforementioned abstraction is rigidity. Near one end of this dimension are the several object recognition algorithms that abstract objects into a rigid or semi-rigid geometric juxtaposition of image features. These include Hausdorff distance [1], geometric hashing [2], active blobs [3], and eigenimages [4, 5]. In contrast, some histogram-based approaches abstract away (nearly) all geometric relationships between pixels. In pure histogram matching, e.g. Swain and Ballard [6], there is no preservation of geometry, just an accounting of the number of pixels of given colors. The technique of Funt and Finlayson [7] uses a histogram of the ratios of neighboring pixels, which introduces a slight amount of geometry into the representation.
Abstracting away rigidity is attractive, because it allows the algorithm to work on non-rigid objects and because it reduces the number of model images necessary to account for appearance changes due to scaling and viewpoint change. One can start with a geometrically rigid approach and abstract away some rigidity by using geometric invariants [8], loosening the matching criteria [1], or explicitly introducing flexibility into the model [3]. On the other hand, one can start with a method like Swain and Ballard's color indexing [6], which ignores all geometry, and add some geometric constraints. For example, some histogram-based approaches, most of which are used to find images in a database rather than to find an object in an image, have attempted to add spatial information to a regular color histogram. Included among these are Huang et al. [9], where a "color correlogram" is used to search a database for similar images, and Pass and Zabih [10], where "color coherence vectors" are employed that represent which image colors are part of relatively large regions of similar color.
The dilemma comes in deciding how much to abstract away. The goal is to ignore just enough details of the object's appearance to match all anticipated images of the object, but not so many details that the algorithm generates false matches. The present invention addresses this issue.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, "reference [1]" or simply "[1]". Multiple references will be identified by a pair of brackets containing more than one designator, for example, [4, 5]. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
This invention is directed toward an object recognition system and process that identifies the location of a modeled object in an image. Essentially, this involves first creating model images of the object whose location is to be identified in the search image. As the object may be depicted in the search image in any orientation, it is preferred that a number of model images be captured, of which each shows the object from a different viewpoint. Ideally, these model images would be taken from viewpoints spaced at roughly equal angles from each other around the object. In addition, multiple images could be taken at each angular viewpoint where each is captured at a different distance away from the object being modeled. This latter method would better model an object whose distance from the camera capturing the search image is unknown. One way of determining how far apart each angular viewpoint should be to ensure a good degree of match between the object in the search image and one of the model images is to require a high degree of matching between each model image. Similarly, it is desirable that a good degree of match exist between adjacent model images taken at the same angular viewpoint, but at different distances from the object, for the same reasons.
Each model image is processed to compute a color co-occurrence histogram (CH). This unique histogram keeps track of the number of pairs of certain "colored" pixels that occur at particular separation distances in the model image. In this way, geometric information is added to the histogram to make it more selective of the object in the search image than a simple color histogram would be. In generating the CH there are two parameters that are selected ahead of time. These parameters are the color ranges and the distance ranges. Choosing the color ranges involves dividing the total possible color values associated with the pixels of the model images into a series of preferably equal sized color ranges. Choosing the distance ranges involves selecting a set of distance ranges, for example (1-2 pixels), (2-3 pixels), (3-4 pixels) . . . , up to a prescribed maximum separation distance between pairs of pixels that will be checked. The size of each color and distance range (and so ultimately the total number of different ranges) is preferably selected to optimize the search process as will be discussed later. Essentially, these ranges should be made small enough to ensure a high selectivity in finding the object in the search image and to minimize false matches. On the other hand, the ranges should be large enough to reduce the amount of processing and to allow enough flexibility in the match that small changes in the shape of the object in the search image (e.g. due to flexure, a slight change in viewpoint, or a different zoom level or scale) do not prevent an object from being recognized.
Once the color and distance ranges are established, each pixel in the model image is quantized with regard to its color by associating it with the color range that includes the actual color value of the pixel. The model images are quantized with regard to distance by associating each possible unique, non-ordered pair of pixels in a model image with one of the distance ranges based on the actual distance separating the pixel pair. The CH is then generated by establishing a count of the number of pixel pairs in a model image which exhibit the same "mix" of color ranges and the same distance range.
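The quantization and counting steps just described can be illustrated with a minimal Python sketch. It is not the tested embodiment: the function names, the choice of equal-width ranges per RGB channel, and the convention that distance range k covers separations d with k-1 < d <= k are all assumptions made for illustration.

```python
from collections import Counter
from math import ceil, hypot

def quantize_color(rgb, levels=2):
    """Map an (r, g, b) pixel with 0-255 channels to one of levels**3 color ranges."""
    r, g, b = (min(c * levels // 256, levels - 1) for c in rgb)
    return (r * levels + g) * levels + b

def cooccurrence_histogram(image, levels=2, max_dist=3):
    """Count unordered pixel pairs by (color range a, color range b, distance range k).

    image is a list of rows of (r, g, b) tuples. Distance range k is assumed
    to cover separations d with k - 1 < d <= k, for k = 1 .. max_dist.
    """
    height, width = len(image), len(image[0])
    colors = [[quantize_color(px, levels) for px in row] for row in image]
    ch = Counter()
    for y1 in range(height):
        for x1 in range(width):
            for y2 in range(y1, min(height, y1 + max_dist + 1)):
                for x2 in range(max(0, x1 - max_dist), min(width, x1 + max_dist + 1)):
                    # Visit each unordered pair exactly once.
                    if (y2, x2) <= (y1, x1):
                        continue
                    k = ceil(hypot(x2 - x1, y2 - y1))
                    if k > max_dist:
                        continue
                    # Sort the two color indices so (a, b) and (b, a) share a bin.
                    a, b = sorted((colors[y1][x1], colors[y2][x2]))
                    ch[(a, b, k)] += 1
    return ch
```

For a 2x2 image of a single color, this yields four pairs in distance range 1 (the horizontal and vertical neighbors) and two pairs in distance range 2 (the diagonals).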
The search image (i.e., the image to be searched for the modeled object) is first cordoned into a series of preferably equal sized sub-images or search windows. These search windows preferably overlap both side-to-side and up-and-down. In the tested embodiment of the present invention, the overlap was set to one-half the width and height of the search window. The size of the search window is preferably as large as possible so as to minimize the search process. However, it is also desired that the search window not be so large that false matches or alarms occur. It should be noted that the size of the search window, as well as the size of the aforementioned color and distance ranges, can be optimized via a unique false alarm analysis. This analysis will be discussed in greater detail below.
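As a rough illustration of the windowing described above, the following Python sketch lists the top-left corners of equal sized search windows that overlap by one-half their width and height. The function name and the handling of image edges (windows that would extend past the image are simply not generated) are assumptions of this sketch.

```python
def search_windows(image_w, image_h, win_w, win_h):
    """Return (left, top) corners of windows stepped by half the window size."""
    step_x = max(1, win_w // 2)
    step_y = max(1, win_h // 2)
    # Only keep corners for which the whole window fits inside the image.
    xs = range(0, max(1, image_w - win_w + 1), step_x)
    ys = range(0, max(1, image_h - win_h + 1), step_y)
    return [(x, y) for y in ys for x in xs]
```

For an 8x4 image and a 4x4 window, this produces three windows whose left edges are at columns 0, 2, and 4.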
A CH is calculated for each search window in the same manner that they were generated in connection with the model images. In particular, the same color and distance ranges are employed in the quantization steps of the process. Each of these search window CHs is compared to each of the model image CHs. Essentially, this comparison assesses the similarity between the histograms. The results of this comparison can be handled in two ways. In a first version, a similarity value threshold is established and any comparison that exceeds the threshold would cause the associated search window to be declared as potentially containing the object. This version would allow identifying each location of the object in a search image which depicts more than one of the objects. In a second version, the threshold would still be used, but only the greatest of the similarity values exceeding the threshold would be chosen to indicate a search window potentially contains the object. This version would be especially useful where it is known only one of the objects exists in the image. It could also be useful in that the threshold could be made lower and the object still located, for example in a noisy image.
It is noted that the preferred approach for assessing similarity between a search window CH and a model image CH is via an intersection analysis. In the context of the CHs, an intersection analysis essentially entails comparing the count in each corresponding color mix/distance category between a search window CH and a model image CH, and then identifying the smaller count. The identified smaller counts from each color mix/distance category are added together to produce a similarity value. It is noted that two matching images of an object will have a larger similarity value than non-matching images because the smaller count from each category will be nearly as large as the larger count, whereas the smaller counts in non-matching images are likely to be significantly smaller than the larger counts. Thus, the sum of the smaller counts from matching images should be larger than the sum of the smaller counts from non-matching images.
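The intersection analysis, together with the two ways of handling the comparison results, might be sketched in Python as follows. The histograms are assumed to be dictionaries keyed by (color range, color range, distance range); the function names and the `best_only` flag are hypothetical names introduced for this sketch.

```python
def intersection(window_ch, model_ch):
    """Sum, over every model bin, the smaller of the window and model counts."""
    # Bins absent from the model would contribute min(0, count) = 0,
    # so it is enough to iterate over the model's bins.
    return sum(min(count, window_ch.get(key, 0)) for key, count in model_ch.items())

def candidate_windows(scores, threshold, best_only=False):
    """Select windows whose similarity exceeds the threshold.

    scores maps a window identifier to its similarity value. With
    best_only=True, only the highest scoring window above the threshold
    is kept (the second version described in the text).
    """
    above = {w: s for w, s in scores.items() if s > threshold}
    if best_only and above:
        best = max(above, key=above.get)
        return {best: above[best]}
    return above
```

A short usage example with made-up counts:

```python
model = {("a", "a", 1): 4, ("a", "b", 1): 2}
window = {("a", "a", 1): 3, ("a", "b", 1): 5, ("b", "b", 1): 7}
print(intersection(window, model))  # 3 + 2 = 5
```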
Once the search window (or windows) potentially containing the object has been identified, the location can be fine-tuned. This is accomplished by repeating the comparison process using the particular model image CH involved in the match and search window CHs generated from search windows created by moving the identified search window up by one pixel row, down by one pixel row, to the left by one pixel column, and to the right by one pixel column, respectively. The search window associated with the largest similarity value of the five search windows (i.e., the original, up, down, right, and left) is selected. This refinement process is repeated, starting from each newly selected window, until the currently selected window exhibits the maximum similarity value of the five; if the original search window already exhibits the maximum similarity value, no movement is needed. The search window selected at the end of the refinement process is declared as containing the object.
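The refinement amounts to a one-pixel-at-a-time hill climb, which can be sketched in Python as below. The `similarity` argument is a hypothetical caller-supplied function that builds the CH for the window at (x, y) and returns its intersection with the matched model CH; it is also assumed to return a low value for windows that fall outside the image.

```python
def refine_location(x, y, similarity):
    """Move the window one pixel at a time until no neighbor scores higher."""
    best = similarity(x, y)
    while True:
        # Up, down, left, and right neighbors of the current window.
        neighbors = [(x, y - 1), (x, y + 1), (x - 1, y), (x + 1, y)]
        score, nx, ny = max((similarity(cx, cy), cx, cy) for cx, cy in neighbors)
        if score <= best:
            # The current window is at least as good as all four neighbors.
            return x, y, best
        x, y, best = nx, ny, score
```

Because the loop only moves on a strict improvement, it stops at the first window that is a local maximum of the similarity value.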
The foregoing object recognition process employing color co-occurrence histograms requires that some of the search parameters be chosen ahead of time. Specifically, the size of the search window, color ranges and distance ranges must be selected. With enough trial-and-error experimentation, it is possible to discover reasonable values for these parameters. Such experimentation is not preferred because it can be tedious, and gives only a vague sense of the sensitivity of the search process to changes in these parameters. However, the choice of the parameters can be optimized via the unique false alarm analysis of the present invention.
The analysis employs a numerical algorithm that approximates a mathematical model for estimating the false alarm probability of the object recognition process. In the context of the CH object recognition system, the false alarm probability is the probability that the intersection of a search window CH and a model image CH ($I_{pm}$) will exceed the search threshold ($\alpha I_{mm}$) when the object is not actually in the search window. Intuitively, the false alarm rate would be expected to increase as the search window grows in size, because there will be more opportunities for accidentally accounting for all the entries in the model CH. Also, the false alarm probability would be expected to decrease with an increase in the number of colors ($n_c$) and the number of distances ($n_d$), because these increases lead to more specific models that are less likely to be accidentally accounted for by a random background. Pulling these parameters in the opposite direction is the desire for a speedy algorithm and for one that does not require too many model viewpoints of each object. The algorithm would run faster if the search window were larger, because the whole image could then be searched faster. The intersection computation would be faster with fewer colors and distances. Fewer distances would also tend to generalize a model of a particular viewpoint to match those of more disparate viewpoints, and it would match better with objects that flex. The false alarm probability helps arbitrate between these desires, making it possible to set the parameters for the fastest possible execution and most tolerant matching without undue risk of false alarm.
The process of computing the false alarm probability begins by computing the probability of occurrence of a given CH on a random background region. To this end, the number $n_k$ of ways that a distance interval can occur in the search window must be ascertained. For instance, a distance interval of 5 pixels can only occur in a limited number of ways in a given size image. This number can be computed using a simple Matlab program. For a fixed distance interval there are $n_c(n_c+1)/2$ possible unique, nonordered color pairs with corresponding bins in the CH, each pair occurring with probability $p_{ij}$. Each bin contains $n_{ij}^k$ counts. The probability of a partition of the $n_k$ color pairs into $n_c(n_c+1)/2$ bins with $n_{ij}^k$ in each bin is given by the multinomial distribution:

$$f(n_{11}^k, n_{12}^k, \ldots, n_{ij}^k, \ldots, n_{n_c,n_c}^k) = \binom{n_k}{n_{11}^k, n_{12}^k, \ldots, n_{ij}^k, \ldots, n_{n_c,n_c}^k} \, p_{11}^{n_{11}^k} \, p_{12}^{n_{12}^k} \cdots p_{ij}^{n_{ij}^k} \cdots p_{n_c n_c}^{n_{n_c,n_c}^k}$$

where

$$\binom{n_k}{n_{11}^k, n_{12}^k, \ldots, n_{ij}^k, \ldots, n_{n_c,n_c}^k} = \frac{n_k!}{n_{11}^k! \, n_{12}^k! \cdots n_{ij}^k! \cdots n_{n_c,n_c}^k!}$$
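The two quantities in this step, the number of ways a distance interval can occur in a window and the multinomial probability of a set of bin counts, can be sketched in Python. The text mentions a Matlab program for the first; the version below is only a stand-in written for illustration. It assumes that distance range k covers separations d with k-1 < d <= k, and both function names are hypothetical.

```python
from math import ceil, factorial, hypot, prod

def pair_counts_by_distance(win_w, win_h, max_dist):
    """Return {k: n_k}: unordered pixel pairs in a win_w x win_h window
    whose separation d satisfies k - 1 < d <= k."""
    counts = {k: 0 for k in range(1, max_dist + 1)}
    for dy in range(0, max_dist + 1):
        for dx in range(-max_dist, max_dist + 1):
            # Keep only one of each (dx, dy) / (-dx, -dy) pair of offsets.
            if dy == 0 and dx <= 0:
                continue
            k = ceil(hypot(dx, dy))
            if k > max_dist:
                continue
            # Number of positions where both pixels fit inside the window.
            counts[k] += max(0, win_w - abs(dx)) * max(0, win_h - dy)
    return counts

def multinomial_probability(bin_counts, bin_probs):
    """Probability of observing bin_counts given per-bin probabilities."""
    coeff = factorial(sum(bin_counts))
    for n in bin_counts:
        coeff //= factorial(n)  # exact: each partial quotient is an integer
    return coeff * prod(p ** n for p, n in zip(bin_probs, bin_counts))
```

For a 2x2 window, `pair_counts_by_distance(2, 2, 3)` gives four pairs in range 1, two in range 2, and none in range 3.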
The function $f(n_{11}^k, n_{12}^k, \ldots, n_{ij}^k, \ldots, n_{n_c,n_c}^k)$ will be abbreviated as $f(n_c, n_k)$ for convenience. This function $f(n_c, n_k)$ is the probability of observing a given set of co-occurrences in a distance range $(k-1, k)$. As it is assumed (although this is not strictly true) that these probabilities are independent with respect to $k$, the probability of seeing a particular CH in the search window would be

$$P(CH) = \prod_{k=1}^{n_d} f(n_c, n_k)$$
If the intersection of the model CH and the image region CH exceeds a prescribed threshold without the object being there, then this is a false alarm. Ideally, to compute the probability of a false alarm, a list of all the CHs whose intersection with the model CH exceeds the threshold would be made. The probability of each CH in the list is computed as described above and summed to get the false alarm probability. However, the list is much too long. Instead the aforementioned multinomial distribution is approximated as a multivariate Gaussian distribution and integrated. This simplifies the problem from summing values in an enormous list to integrating a multidimensional Gaussian. Specifically, $f(n_c, n_k)$ can be approximated by the ordinate of a multidimensional Gaussian:

$$f(n_c, n_k) \approx g_k(n) = \frac{1}{(2\pi)^{m/2} \, |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (n - \mu)^T \Sigma^{-1} (n - \mu) \right]$$
where

$$m = n_c(n_c+1)/2 - 1$$

$$n = (n_{11}^k, n_{12}^k, \ldots, n_{ij}^k, \ldots, n_{n_c-1,n_c}^k)^T, \quad i \le j$$

$$\mu = n_k \, (p_{11}, p_{12}, \ldots, p_{ij}, \ldots, p_{n_c-1,n_c})^T, \quad i \le j$$

and where the inverse covariance is

$$\Sigma^{-1} = \{a_{rs}\} = \begin{cases} \dfrac{1}{n_k}\left(\dfrac{1}{q_r} + \dfrac{1}{q_*}\right) & \text{if } r = s \\[2ex] \dfrac{1}{n_k \, q_*} & \text{if } r \ne s \end{cases} \qquad (r, s) \in [1, 2, \ldots, m]$$
The Gaussian $g_k(n)$ can be integrated to give probabilities of sets of CHs occurring in the image background. The integration limits are given in terms of the number of co-occurrences of particular color pairs. However, the Gaussian only applies to a single specified distance range $(k-1, k)$. Thus, it is still necessary to list all the CHs that could cause a false alarm. These would ideally be represented in terms of integration limits for the Gaussian approximation, but this list is too complex to characterize in this way. However, by simplifying the definition of a false alarm from "any background CH whose intersection with the model exceeds the threshold $T = \alpha I_{mm}$" to "any background CH, each of whose entries $n_{ij}^k$ exceeds a threshold $T_{ij}^k = \alpha m_{ij}^k$", integrating the Gaussian gives the probability of false alarm for all color pairs at one distance range $(k-1, k)$. Since it can be presumed from $P(CH) = \prod_{k=1}^{n_d} f(n_c, n_k)$ that the co-occurrences at different distances are independent, the probability of a false alarm can be approximated as

$$P(\text{false alarm}) \approx \prod_{k=1}^{n_d} \int_{\Omega_k} g_k(n) \, dn$$
where $\Omega_k$ is the integration region for CH distance range $(k-1, k)$. The integration region $\Omega_k$ is unbounded above and bounded below as $n_{11}^k \ge \alpha m_{11}^k,\ n_{12}^k \ge \alpha m_{12}^k,\ \ldots,\ n_{ij}^k \ge \alpha m_{ij}^k,\ \ldots,\ n_{n_c-1,n_c}^k \ge \alpha m_{n_c-1,n_c}^k$. The actual integration is preferably performed using a Monte Carlo technique.
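To give a feel for the Monte Carlo step, the following Python sketch estimates the probability that every bin count at one distance range meets its lower bound, and then multiplies the per-distance estimates under the independence assumption. Note the substitution: rather than integrating the Gaussian approximation, this sketch samples bin counts directly from the underlying multinomial distribution using only the standard library, so it illustrates the idea and is not the integration described above. The function names, the trial count, and the fixed seed are assumptions of this sketch.

```python
import random

def false_alarm_one_distance(n_k, pair_probs, thresholds, trials=2000, seed=0):
    """Estimate P(every bin count >= its threshold) for one distance range.

    n_k is the number of pixel pairs at this distance range, pair_probs the
    background probability of each color pair, and thresholds the lower
    bounds (alpha times the model counts).
    """
    rng = random.Random(seed)
    bins = range(len(pair_probs))
    hits = 0
    for _ in range(trials):
        # Draw one random background CH slice: n_k pairs spread over the bins.
        counts = [0] * len(pair_probs)
        for b in rng.choices(bins, weights=pair_probs, k=n_k):
            counts[b] += 1
        if all(c >= t for c, t in zip(counts, thresholds)):
            hits += 1
    return hits / trials

def false_alarm_probability(per_distance):
    """Multiply the per-distance estimates (independence across distances).

    per_distance is a list of (n_k, pair_probs, thresholds) tuples.
    """
    p = 1.0
    for n_k, pair_probs, thresholds in per_distance:
        p *= false_alarm_one_distance(n_k, pair_probs, thresholds)
    return p
```

The estimate is only as fine as `1 / trials`, so very small false alarm probabilities would need many more samples or a smarter sampling scheme.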
The foregoing approximation of the probability of a false alarm occurring can then be used to plot probability curves for one of the parameters (i.e., either the color ranges, the distance ranges, or the search window size) at different values thereof, while holding the other parameters constant. The goal of this procedure is to find the optimum value. In the case of the color and distance ranges this optimum will occur at a point where the ranges are as large as possible while at the same time exhibiting an acceptable probability of false alarms. In the case of the search window size, the optimum is the largest window size possible with the same acceptable probability of false alarms. It is noted that the plotting process can be repeated for various combinations of constant parameter values to find the curve that gives the largest color or distance range, or the largest window size, at an acceptably low false alarm probability. Preferably, the new values of the constant parameters would be chosen by performing an identical probability analysis using one of the constant parameters as the varying parameter and holding the others constant. Once initial values for each parameter have been computed, the process is repeated iteratively until a set of mutually optimal values is established. It is also noted that the results are specific to the particular model image employed in the calculation, and so the analysis could be repeated for each model image. However, it was found in a tested embodiment that the computed optimal parameter values from one model image worked reasonably well for all the other models as well.
In view of the foregoing description, it is evident that the color CH is an effective way to represent objects for recognition in images. By keeping track of pairs of pixels, it allows a variable amount of geometry to be added to the regular color histogram. This in turn allows the object recognition process to work in spite of confusing background clutter and moderate amounts of occlusion and object flexing. Specifically, by adjusting the distances over which the co-occurrences are checked, the sensitivity of the process can be adjusted to account for geometric changes in the object's appearance such as caused by viewpoint change or object flexing. The CH is also robust to partial occlusions, because it is not required that the image account for all the co-occurrences of the model.
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.