The ultimate goal of computer vision is image understanding, that is, knowing what is within an image at every coordinate. A complete computer vision system should be able to segment an image into homogeneous portions, extract regions from the segments that are single objects, and finally output a response as to the locations of these objects and what they are.
Frameworks for image understanding consist of three, not necessarily separate, processes. Consider a representative computer vision system as shown in FIG. 1A. In the first process 11, image segmentation is performed; this consists of dividing the image into homogeneous portions that are similar based on a correlation criterion. Much of the work in computer vision has focused on this area, with edge detection, region growing (clustering), and thresholding as the primary methods. Image segmentation is referred to as the low-level vision (LLV) process. In the second process 12, region extraction is performed. Region extraction receives as input the results obtained during the LLV stage. With this information, region extraction, or the intermediate-level vision (ILV) process, attempts to represent the segments as single, hypothesized objects. This requires the ILV process to search the LLV process's output for evidence of the desired region. Finally, the third process 13 performs image understanding based on the extracted regions provided as input. The hypothesized-image understanding operation is referred to as the high-level vision (HLV) process.
Most computer vision research over the past 30 years has focused on LLV processes. Efforts to further the knowledge of ILV processes have primarily utilized LLV methods. Therefore, it remains an objective of computer vision systems to locate low-level image regions whose features best support alternative image hypotheses, developed by a high-level vision process, and provide the levels- and indicators-of-match.
There have been many attempts to solve the ILV problem utilizing a Hough Transform methodology. The Hough Transform is a particularly desirable technique for use in vision systems when the patterns in the image are sparsely digitized, for instance having gaps in the patterns or containing extraneous “noise.” Such gaps and noise are common in the data provided by LLV processes such as edge detection utilized on digitally captured images. The Hough Transform was originally described in U.S. Pat. No. 3,069,654.
In an influential paper, Generalizing the Hough Transform to Detect Arbitrary Shapes (1981), D. H. Ballard generalized the Hough Transform to arbitrary shapes and named the technique the Generalized Hough Transform (GHT). The GHT is a method for locating instances of a known pattern in an image. The search pattern is parameterized as a set of vectors from feature points in the pattern to a fixed reference point. This set of vectors is the R-table. The feature points are usually edge features, and the reference point is often at or near the centroid of the search pattern. The image space, typically Cartesian, is mapped into parameter (Hough) space. To locate the pattern in an image, the set of feature points in the image is considered. Each image feature is treated as each of the pattern features in turn, and the corresponding locations of the reference point are calculated. An accumulator array keeps track of the frequency with which each possible reference point location is encountered. After all the image features have been processed, the accumulator array will contain high values (peaks) for locations where many image features coincided with many pattern features. High peaks (relative to the number of features in the pattern) correspond to reference point locations where instances of the pattern occur in the image. The GHT can be extended by considering rotated and shortened or lengthened versions of the vectors to locate instances of the pattern at different orientations and scales. In this case, a four-dimensional accumulator array is required and the computation is increased by two orders of magnitude. The key contribution of the GHT is the use of gradient vector data to reduce the computational complexity of detecting arbitrary shapes. Unfortunately, the method's time and space complexity becomes very high because it requires a search of the entire four-dimensional Hough parameter space.
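The voting procedure described above can be illustrated with a minimal sketch. It covers only the translation-only case, in which every image feature votes with every R-table vector; it does not use gradient indexing and does not search over rotation or scale. The function names, the square pattern, and the test image are hypothetical and chosen purely for illustration; they are not taken from Ballard's paper.

```python
from collections import Counter

def build_r_table(pattern_points, reference):
    """Return the vectors from each pattern feature point to the reference point."""
    rx, ry = reference
    return [(rx - x, ry - y) for x, y in pattern_points]

def ght_vote(image_points, r_table):
    """Accumulate votes for candidate reference-point locations."""
    accumulator = Counter()
    for x, y in image_points:
        # Treat this image feature as each pattern feature in turn and
        # vote for the reference-point location that pairing implies.
        for dx, dy in r_table:
            accumulator[(x + dx, y + dy)] += 1
    return accumulator

# Hypothetical pattern: corners and edge midpoints of a small square,
# with the reference point at its centroid.
pattern = [(0, 0), (2, 0), (4, 0), (4, 2), (4, 4), (2, 4), (0, 4), (0, 2)]
reference = (2, 2)
r_table = build_r_table(pattern, reference)

# Hypothetical image: the pattern translated by (10, 7), with one feature
# missing (a gap) and two unrelated noise features added.
shift = (10, 7)
image = [(x + shift[0], y + shift[1]) for x, y in pattern[:-1]]
image += [(3, 15), (20, 1)]

acc = ght_vote(image, r_table)
peak_location, peak_votes = acc.most_common(1)[0]
print(peak_location, peak_votes)
```

In this sketch the peak falls at the translated reference point despite the gap and the noise features. Handling rotation and scale in the same brute-force manner would add two further loops over candidate angles and scale factors, along with two further accumulator dimensions, which is the source of the cost noted above.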
For rotation- and scale-invariance, the GHT method requires a priori knowledge of the possible rotations and scales that may be encountered. More recent procedures that provide either or both of rotation and scale invariance using the Hough Transform include:
The work of Jeng and Tsai, Fast Generalized Hough Transform (1990), which proposes a new approach to the GHT where transformations are applied to the template in order to obtain rotation- and scale-invariance. The R-Table is defined as in the original GHT technique. Scale-invariance is provided by incrementing all the array positions of a Hough parameter space using another table called the SI-PSF. For rotation-invariance, each position in the SI-PSF with a non-zero value generates a circle centered at the reference point, with a radius equal to the distance between the reference point and that position of the SI-PSF. Subsequently, these circles are superimposed onto each image point in turn. Each image point therefore requires a large number of increments, and the computational complexity of this method is very high if the template and the image shapes have a large number of points.
The disclosure of Thomas followed, Compressing the Parameter Space of the Generalized Hough Transform (1992), attempting to compress the Hough parameter space by one degree of freedom to obtain the location of arbitrary shapes at any rotation. This method considers a set of displacement vectors, {r}, such that all edge pixels with identical gradient angles increment positions in one plane of the parameter space. Thus, the original four-dimensional Hough parameter space of the GHT reduces to three dimensions. However, the technique is not scale-invariant and requires the same processing complexity as the GHT.
Pao, et al., Shape Recognition Using the Straight Line Hough Transform (1992), described a technique derived from the straight-line Hough Transform. A displacement-invariant signature, called the STIRS, is obtained by subtracting points in the same column of the STIRS space. Subsequently, template and image signatures are compared using a correlation operator to find rotations. Unfortunately, scale-invariance is not provided, since the scale must be known a priori. Experiments show that the STIRS does not work well when different shapes appear in the image.
A new version of the GHT called the Linear GHT (LIGHT) was developed by Yao and Tong, Linear Generalized Hough Transform and its Parallelization (1993). A linear numeric pattern was devised, denoted the vertical pattern, which constitutes the length of the object along the direction of a reference axis (usually the y-axis). The authors state that rotation- and scale-invariance are handled, using this new method, in much the same way as in the GHT. Consequently, the same deficiencies exist for this method, as it requires a large Hough parameter space and a priori knowledge of the expected rotations and scales.
The effort of Ser and Sui, A New Generalized Hough Transform for the Detection of Irregular Objects (1995), describes an approach that merges the advantages of the Hough Transform with those of a technique called contour sequencing. The calculation of the contour sequence requires that an entire object's perimeter be available and not occluded. Thus, if a portion of the desired object is occluded, as in a noisy image for instance, this method will fail.
Aguado, et al., Arbitrary Shape Hough Transform by Invariant Geometric Features (1997), approached the problem of region extraction by using the Hough Transform under general transformations. Even though this method provides rotation- and scale-invariance, it comes at a complexity cost: shape-specific general transformations must be derived, and these are also required for translation-invariance.
The most recent work, by Guil, et al., A Fast Hough Transform for Segment Detection (1995), presents an algorithm based on the GHT which calculates the rotation, scale, and translation of an object with respect to a template. The methodology consists of a three-stage detection process and the creation of five new tables. Three of the tables are constructed for the template; the remaining two are used against the image. The first stage of the detection process obtains the rotation, the next gathers the scale, and finally the translation is found in the third.
The complexity of this method is clearly high, as the image and template are repeatedly tested using different tables to obtain the invariant values. Furthermore, the results of each stage are used to obtain the answer to the next stage; hence, if one stage fails, the next will fail as well. The use of gradient angles is appropriate; however, dividing the original R-Table into five tables to obtain the desired invariances has added unnecessary complexity to the problem.