Some approaches for visual object recognition have been based on local image features and descriptors that allow fast determination (or “extraction”) and matching of image features while maintaining high discriminative properties. There are three main categories of local image features: corner points, regions (or “blobs”), and edges.
Corner points are image locations of high curvature, for example, where a corresponding image gradient exhibits a large change in both horizontal and vertical directions. One method of determining corner features in an image is known as “Harris Corners”.
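By way of illustration only, a corner response of the kind associated with the Harris method may be sketched as follows. The sketch assumes a greyscale image held as a list of rows, central-difference gradients, a uniform 3x3 summation window (a Gaussian window is more usual), and a sensitivity constant k of 0.05. The function names and the test image are hypothetical and are not taken from any particular implementation.

```python
def harris_response(img, k=0.05, radius=1):
    """Return a Harris-style corner response map for a 2-D list of grey values.

    k and radius are assumed values chosen for illustration.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Accumulate the entries of the 2x2 structure tensor over the window.
            sxx = sxy = syy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx = ix[y + dy][x + dx]
                    gy = iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            # Positive where the gradient varies strongly in both directions,
            # negative along a straight edge, zero in flat areas.
            resp[y][x] = det - k * trace * trace
    return resp


def make_square_image(size=12, lo=3, hi=8, value=100.0):
    """Hypothetical test image: a bright square on a dark background."""
    img = [[0.0] * size for _ in range(size)]
    for y in range(lo, hi + 1):
        for x in range(lo, hi + 1):
            img[y][x] = value
    return img
```

With the test image above, the response is positive at a corner of the square such as (3, 3), negative at a point on a straight side such as (3, 5), and zero in the flat background.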
Regions (or “blobs”) are another common type of local image feature. Regions may be detected in an image either by examining the properties of the Hessian matrix at each pixel of the image or by applying over-segmentation techniques. One method of detecting regions in an image is known as the Scale Invariant Feature Transform (SIFT). Another region-based method of determining image features in an image is known as super-pixel segmentation.
Methods of determining image features in an image based on corners and regions are not descriptive enough for object detection and object class recognition, and are not amenable to learning of temporal shape models and parametric representations for straightforward manipulation under geometric transformation.
The third type of image feature is referred to as the edge feature. Edges (straight lines) and contours or contour segments (curved lines) are defined as a collection of edgels that satisfy a 4- or 8-neighbour spatial connectivity constraint. An edgel is an image pixel with a gradient magnitude larger than a given threshold value. Edgels can be determined efficiently using convolution masks.
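As a minimal sketch of the edgel definition above, the following applies a pair of 3x3 masks to estimate the gradient at each interior pixel and keeps the pixels whose gradient magnitude exceeds a threshold. The Sobel masks are an assumed choice, since no particular mask is specified here, and the function names are hypothetical.

```python
import math

# Sobel masks (an assumed choice of convolution mask).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_magnitude(img):
    """Return the gradient magnitude of a 2-D list of grey values.

    The masks are applied without flipping (correlation); flipping them
    changes only the sign of the gradient, not its magnitude.
    Border pixels are left at zero.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            mag[y][x] = math.hypot(gx, gy)
    return mag


def detect_edgels(img, threshold):
    """Return the set of (row, col) pixels whose gradient magnitude exceeds threshold."""
    mag = gradient_magnitude(img)
    return {
        (y, x)
        for y, row in enumerate(mag)
        for x, m in enumerate(row)
        if m > threshold
    }
```

For a 6x6 image containing a vertical step from 0 to 10 between columns 2 and 3, a threshold of 20 selects the interior pixels of columns 2 and 3 as edgels.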
One edgel detection method uses an algorithm known as the “Canny detector” algorithm. The Canny detector algorithm produces a binary image as output. After edgels in an input image are determined, edges are determined by fitting lines between edgels. Connecting edgels to form lines is known as “Edge Linking”. A disadvantage of the Canny detector method is the sensitivity of the Canny detector algorithm to image noise and external factors. An example of an external factor is a camera's automatic gain control, which causes image brightness to change continuously. Such changes give rise to different features in successive images, so that the features lack the invariance properties necessary for developing a reliable object recognition system. This lack of robustness to image noise makes the Canny detector algorithm unsuitable for use in applications that require tracking.
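One simple way to approximate edge linking is to group edgels that touch one another under a 4- or 8-neighbour connectivity constraint. The sketch below does only that grouping; it does not perform line fitting, and it is not the linking step of the Canny detector algorithm itself. The function name and the input representation (a set of (row, col) tuples) are assumptions made for illustration.

```python
def link_edgels(edgels, connectivity=8):
    """Group edgels into connected sets under 4- or 8-neighbour connectivity.

    edgels: iterable of (row, col) tuples.
    Returns a list of groups, each a sorted list of (row, col) tuples.
    """
    if connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    elif connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    remaining = set(edgels)
    groups = []
    while remaining:
        # Start from the smallest remaining edgel so the output order is deterministic.
        seed = min(remaining)
        remaining.remove(seed)
        stack = [seed]
        group = [seed]
        # Flood fill: collect every edgel reachable through neighbouring edgels.
        while stack:
            y, x = stack.pop()
            for dy, dx in offsets:
                neighbour = (y + dy, x + dx)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    stack.append(neighbour)
                    group.append(neighbour)
        groups.append(sorted(group))
    return groups
```

A diagonal run of edgels forms a single group under 8-neighbour connectivity but is split into isolated edgels under 4-neighbour connectivity, which is why the choice of constraint affects the edges obtained.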
One method of determining contours in an image is to first perform image segmentation or invariant region detection to determine the main structures in an image. A silhouette of each found region is then determined as a single contour or as a collection of contour fragments. Images may be first segmented into regions using an algorithm known as the Watershed algorithm, and then line segments are fitted to edges in the outline of the regions. Similarly, in another method of determining contours in an image, Maximally-stable extremal regions (MSER) may be detected in the image and then a silhouette determined for each of the regions. Such methods of determining contours in an image are suitable for modelling objects of uniform colour or image regions of uniform intensity and limited spatial extent. However, such methods of determining contours in an image are not as suitable for objects that do not exhibit such characteristics, such as the objects often encountered in surveillance videos. Furthermore, the above described methods of determining contours in an image are not suitable for object recognition by parts.
One method of determining image features in an image specifically tunes the Canny detector algorithm for object boundary detection. The tuned Canny detector algorithm method generates a probability at each pixel that the pixel belongs to an object boundary. Edgel detection is posed and solved as a supervised learning problem given a large collection of images segmented by human subjects. A small set of local image features, including local brightness, texture, and colour, is evaluated for every pixel and a local neighbourhood of the pixel, and the features are statistically combined to determine the probability that each pixel belongs to an object boundary. A threshold can be applied to this edgel probability image to generate a binary image. Edge linking can be used on the edgel probability image for contour segment determination, similar to the Canny detector approach mentioned earlier. The tuned Canny detector algorithm method has also been extended to add a global constraint on boundary detection, making the tuned Canny detector algorithm method more robust to clutter and further improving performance. The main disadvantages of the tuned Canny detector algorithm method are the requirement for offline training with a large collection of human segmented data, and the large computational cost for detection, which may be several minutes for a high resolution image.
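The combination and thresholding steps described above may be sketched as follows. The description states only that the features are statistically combined; a logistic model is assumed here as one common way of doing so, and the weights, bias, and function names are hypothetical placeholders standing in for values that would be learned offline from human segmented data.

```python
import math


def boundary_probability(brightness, texture, colour,
                         weights=(1.0, 1.0, 1.0), bias=-1.5):
    """Combine three per-pixel cue strengths into a boundary probability.

    The logistic form, weights, and bias are assumptions for illustration;
    in practice they would come from supervised training.
    """
    w_b, w_t, w_c = weights
    z = bias + w_b * brightness + w_t * texture + w_c * colour
    return 1.0 / (1.0 + math.exp(-z))


def threshold_probability_image(prob, threshold=0.5):
    """Turn an edgel probability image into a binary image (1 = boundary)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob]
```

Stronger cue responses give a higher probability under this assumed model, and the binary image obtained after thresholding could then be passed to an edge linking step.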
Another method of determining image features in an image is referred to as “edge drawing” (ED). It is called edge drawing because the method can be thought of as sketching the edges in an image by connecting pixels that belong to the same edge, much like playing the children's game “connect the dots”. The main disadvantage of the edge drawing method is that dominant edges determined from a single image do not possess robustness properties with respect to photometric and geometric image transformations.
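The “connect the dots” idea can be illustrated with a greatly simplified greedy walk over a gradient magnitude map. This is not the published edge drawing routine, which uses additional information to steer the walk; the sketch assumes only a precomputed gradient map, a starting pixel, and a minimum gradient value, and the function name is hypothetical.

```python
def draw_edge(grad, start, min_grad):
    """Trace an edge by repeatedly stepping to the strongest unvisited neighbour.

    grad: 2-D list of gradient magnitudes.
    start: (row, col) pixel assumed to lie on an edge.
    min_grad: neighbours weaker than this are never joined to the path.
    Returns the list of (row, col) pixels visited, in order.
    """
    h, w = len(grad), len(grad[0])
    path = [start]
    visited = {start}
    y, x = start
    while True:
        best = None
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (dy, dx) == (0, 0) or not (0 <= ny < h and 0 <= nx < w):
                    continue
                if (ny, nx) in visited or grad[ny][nx] < min_grad:
                    continue
                if best is None or grad[ny][nx] > grad[best[0]][best[1]]:
                    best = (ny, nx)
        if best is None:
            # No sufficiently strong unvisited neighbour: the edge ends here.
            return path
        visited.add(best)
        path.append(best)
        y, x = best
```

Because each step depends on the gradient values of a single image, a change in brightness or viewpoint can alter the path that is traced, which is consistent with the lack of robustness noted above.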