Local image descriptors are descriptors of a portion of an image (e.g., the image portion is the vicinity surrounding an interest point). Local image descriptors are employed for object recognition (i.e., recognizing an object within the image). A local image descriptor which contains information respective of the object as well as information respective of the background of the object (e.g., the described image portion is partly of the object and partly of the background) is not well suited for object recognition, since the background generally varies between images of the same object.
Reference is now made to FIG. 1, which is a schematic illustration of an image, generally referenced 50, of a cellular phone and a local polar-rays descriptor for detecting local structures within the image, as known in the art. Image 50 includes a cellular phone 52 and local polar-rays descriptor 62. Cellular phone 52 includes a keypad portion 54, a screen portion 56, a screen 58, and a plurality of keys 60A, 60B, 60C, 60D, 60E, 60F, 60G, 60H, 60I, 60J, 60K, and 60L. Local polar-rays descriptor 62 includes an origin point 64 and a plurality of polar rays 66A, 66B, 66C, and 66D.
Local polar-rays descriptor 62 is in the form of a plurality of rays originating from a single origin point 64 (e.g., a corner interest point). Each of the rays of local polar-rays descriptor 62 ends upon reaching an image area (not shown) having an intensity different from the intensity of origin point 64. Each of the rays of local polar-rays descriptor 62 can pass through image edges of image 50, as long as the intensity on both sides of the image edge is similar. Polar ray 66A starts at origin point 64 and stops at the right edge of cellular phone 52 (i.e., the right edge of screen portion 56). Polar ray 66B starts at origin point 64 and stops at the right edge of screen 58. Polar ray 66C starts at origin point 64, passes through the edge separating screen portion 56 and keypad portion 54, and stops at key 60B. Polar ray 66D starts at origin point 64, passes through the edge separating screen portion 56 and keypad portion 54, and stops at key 60C.
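The ray-stopping rule described above can be sketched as follows. The grayscale image array, the one-pixel step size, and the intensity-difference threshold are illustrative assumptions, not part of the known descriptor:

```python
import numpy as np

def trace_polar_ray(image, origin, angle, intensity_threshold=20, max_length=200):
    """March along a ray from the origin, stopping at the first pixel whose
    intensity differs from the origin intensity by more than the threshold.
    Edges between regions of similar intensity do not stop the ray."""
    y0, x0 = origin
    origin_intensity = float(image[y0, x0])
    dy, dx = np.sin(angle), np.cos(angle)
    for r in range(1, max_length):
        y, x = int(round(y0 + r * dy)), int(round(x0 + r * dx))
        if not (0 <= y < image.shape[0] and 0 <= x < image.shape[1]):
            # ray left the image; stop at the last in-bounds position
            return (y0 + (r - 1) * dy, x0 + (r - 1) * dx)
        if abs(float(image[y, x]) - origin_intensity) > intensity_threshold:
            return (float(y), float(x))  # boundary reached: intensity differs
    return (y0 + max_length * dy, x0 + max_length * dx)
```

For example, tracing a horizontal ray across an image split into a dark left half and a bright right half stops at the first bright column.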
An article by Lech Szumilas et al., entitled “Local Structure Detection with Orientation-Invariant Radial Configuration”, is directed to a method for generating a local image Orientation-invariant Radial Configuration (ORC) descriptor. The method includes the steps of detecting interest points at local symmetry extrema, extracting the luminance profiles along N radii, yielding one or more boundary-points along each radius, and constructing multiple boundary-point configurations.
The detection of interest points is associated with local symmetry extrema, such that the interest points tend to appear at salient image structures in the image. The second step is extracting the luminance profiles along N radii, positioned at equal angular intervals of 2π/N. The luminance profiles can be replaced or augmented with other profiles, such as color profiles.
The next step is yielding one or more boundary-points along each radius. The boundary-points correspond to transitions between relatively different regions of pixels. Boundary-point detection is repeated for each radius separately. Boundary-point detection differs from edge detection in that it considers only a local image patch, and therefore estimates edge strength relative to local conditions instead of global conditions.
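The profile-extraction and per-radius boundary-point steps can be sketched as follows, assuming nearest-pixel sampling and a fixed transition threshold (both illustration choices; the article estimates edge strength from local conditions rather than from a fixed global threshold):

```python
import numpy as np

def radial_luminance_profiles(image, center, n_radii=16, length=40):
    """Sample the luminance profile along N radii spaced at angular
    intervals of 2*pi/N around an interest point."""
    cy, cx = center
    profiles = []
    for k in range(n_radii):
        angle = 2 * np.pi * k / n_radii
        dy, dx = np.sin(angle), np.cos(angle)
        samples = []
        for r in range(length):
            y, x = int(round(cy + r * dy)), int(round(cx + r * dx))
            if not (0 <= y < image.shape[0] and 0 <= x < image.shape[1]):
                break  # radius leaves the image
            samples.append(float(image[y, x]))
        profiles.append(np.array(samples))
    return profiles

def boundary_points(profile, threshold=30.0):
    """Detect boundary points as positions where the luminance transition
    between consecutive samples exceeds a threshold; detection is
    repeated independently for each radius."""
    diffs = np.abs(np.diff(profile))
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]
```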
Boundary-points grouping (i.e., constructing boundary-point configurations) is based on grouping boundary-points of adjacent radii exhibiting similar boundary luminance transitions and similar inner luminance spreads. The inner luminance spread is the standard deviation of the luminance along the radius, between the interest point and the boundary point.
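The grouping criterion can be sketched as a greedy chaining of boundary-points over adjacent radii; the tolerances and the chaining strategy are illustrative assumptions, not the article's exact configuration construction:

```python
import numpy as np

def inner_spread(profile, boundary_idx):
    """Luminance standard deviation along the radius between the interest
    point (index 0) and the boundary point."""
    return float(np.std(profile[:boundary_idx + 1]))

def group_adjacent(boundaries, transition_tol=25.0, spread_tol=10.0):
    """Greedily chain boundary points of adjacent radii whose luminance
    transitions and inner spreads are similar.  `boundaries` holds one
    entry per radius: a (transition, spread) pair, or None if that
    radius yielded no boundary point."""
    groups, current = [], []
    for k, b in enumerate(boundaries):
        if current and b is not None:
            prev_transition, prev_spread = boundaries[current[-1]]
            if (abs(b[0] - prev_transition) <= transition_tol
                    and abs(b[1] - prev_spread) <= spread_tol):
                current.append(k)  # similar to the previous radius: same group
                continue
        if current:
            groups.append(current)
        current = [k] if b is not None else []
    if current:
        groups.append(current)
    return groups
```

A dissimilar transition (or a radius with no boundary point) breaks the chain and starts a new group.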
An article by Tinne Tuytelaars et al., entitled “Matching Widely Separated Views Based on Affine Invariant Regions”, is directed to a method for extracting invariant regions in an image for matching image patches of two wide-baseline stereo images. The method includes the steps of locating a local extremum of intensity, determining an intensity function along each of a plurality of rays emanating from the local extremum, determining a maximum point for each of these intensity functions, linking all the maximum points to enclose an image region, fitting an ellipse to the enclosed region, and doubling the size of the ellipse.
The intensity function along each ray is given by the following formula:

f_l(t) = abs(l(t) - l0) / max( (1/t)·∫₀ᵗ abs(l(t) - l0) dt, d )

where:
t—the Euclidean arc length along a ray
l0—the intensity at the local extremum
l(t)—the intensity at position t
d—a small constant preventing division by zero for small values of t
The enclosing ellipse has the same shape moments (i.e., up to the second order) as those of the enclosed region. The double-sized ellipse defines a more distinctive image region, due to the more diversified texture pattern within the area of the ellipse.
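The per-ray computation can be sketched as follows, evaluating f_l(t) on a discretely sampled ray and returning the position of its maximum. The unit spacing between samples and the cumulative-sum approximation of the integral are illustrative assumptions:

```python
import numpy as np

def ray_invariant_maximum(intensities, d=1e-6):
    """Evaluate f_l(t) = abs(l(t) - l0) / max((1/t) * integral of abs(l - l0), d)
    on a discretely sampled ray (unit spacing) and return the index of its
    maximum together with the sampled function values."""
    intensities = np.asarray(intensities, dtype=float)
    l0 = intensities[0]  # intensity at the local extremum
    abs_dev = np.abs(intensities - l0)
    # (1/t) * integral_0^t abs(l - l0) dt, approximated by a running mean
    running_mean = np.cumsum(abs_dev) / np.arange(1, len(abs_dev) + 1)
    f = abs_dev / np.maximum(running_mean, d)  # d guards against division by zero
    return int(np.argmax(f)), f
```

The maximum lands at the sharp intensity transition along the ray, which is the point linked into the enclosed region.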
An article by Serge Belongie et al., entitled “Shape Matching and Object Recognition Using Shape Contexts”, is directed to a method for matching objects according to their shape. This article describes a new shape context descriptor. An object is represented as a discrete set of points sampled from the internal or external contours of the object. The contours of the object are detected by an edge detector.
The shape context descriptor is constructed by defining a set of vectors originating from a single point of the discrete set (i.e., a reference point) to all other points of the discrete set. In order to make the shape context descriptor more robust, the following steps are executed: computing a histogram of the relative coordinates (i.e., based on the set of vectors) of the remaining points of the discrete set; defining a plurality of bins spread uniformly in log-polar space; and making the descriptor more sensitive to the positions of nearby sample points (i.e., points of the discrete set) according to the bin in which each of the points is located. In other words, points of the discrete set located in a bin closer to the reference point are given a higher value.
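The log-polar histogram construction can be sketched as follows. The bin counts (5 radial by 12 angular) and the radial range are illustrative choices, and the normalization by the mean pairwise distance used for scale invariance is omitted:

```python
import numpy as np

def shape_context(points, ref_index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Histogram of the relative coordinates of all other sample points,
    binned uniformly in log-polar space around the reference point.
    Log-spaced radial bins are smaller near the reference point, making
    the descriptor more sensitive to the positions of nearby points."""
    points = np.asarray(points, dtype=float)
    rel = np.delete(points, ref_index, axis=0) - points[ref_index]
    r = np.hypot(rel[:, 0], rel[:, 1])
    theta = np.mod(np.arctan2(rel[:, 1], rel[:, 0]), 2 * np.pi)
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    hist = np.zeros((n_r, n_theta))
    for ri, ti in zip(r, theta):
        r_bin = np.searchsorted(r_edges, ri) - 1
        if 0 <= r_bin < n_r:  # points outside the radial range are ignored
            t_bin = min(int(ti / (2 * np.pi / n_theta)), n_theta - 1)
            hist[r_bin, t_bin] += 1
    return hist
```

Each remaining point contributes a count to exactly one log-polar bin; matching two shapes then reduces to comparing these histograms point-by-point.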