Pattern and object recognition systems that are capable of performing an automated scene analysis and/or object identification can be used for a variety of tasks.
In order to recognize objects in an image, it is necessary to first separate those parts of the image which belong to objects (foreground) from those parts of the image which do not (background). This process is usually referred to as “image segmentation”. Image segmentation is typically performed for object recognition in a digital image, since objects should be recognized irrespective of their background. Algorithms that are capable of performing this step are called “segmentation algorithms”.
Most standard algorithms exploit the fact that objects are separated from the background by a more or less well defined border. They perform the segmentation step by first decomposing the image into small “elementary features”, like oriented line segments, which are then used to successively construct larger objects. Segmented objects are, thus, described in terms of said elementary features.
This approach has several problems. One problem is the choice of a suitable method for extracting object borders from the intensity values of the digital image. This problem becomes worse if the intensity changes between objects and background are small, or if the intensity variations within an object are comparable to those between object and background. In order to overcome this problem, a number of image enhancement techniques are used which seek to improve the visual appearance of an image in such a way that the contrast between objects and background is amplified. Another common problem is the choice of a suitable method for compiling an object from the set of elementary features. This problem becomes even worse if the image contains more than one object, or if the object is surrounded by a possibly large number of distracting objects (clutter).
Important issues related to image segmentation are choosing good segmentation algorithms, measuring their performance, and understanding their impact on the scene analysis system.
According to the state of the art, there are different solutions to the problem of object segmentation and recognition. In order to understand the main idea of the underlying invention, it is necessary to briefly describe some of their basic features.
1. Histogram Thresholding
In “Analysis of Natural Scenes” (PhD Thesis, Carnegie Institute of Technology, Dept. of Computer Science, Carnegie-Mellon University, Pittsburgh, Pa., 1975) by R. B. Ohlander, a thresholding technique that can advantageously be applied to segmenting outdoor color images is proposed. It is based on constructing color and hue histograms. The picture is thresholded at its most clearly separated peak. The process iterates for each segmented part of the image until no separate peaks are found in any of the histograms.
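The thresholding step described above can be illustrated by a minimal sketch. It is not Ohlander's procedure: it handles a single gray-level channel and a single split, whereas the thesis works on color and hue histograms and iterates over the resulting parts. The function names and the `min_separation` parameter are assumptions introduced for illustration only.

```python
def histogram(pixels, levels=256):
    """Count how often each gray level occurs."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def valley_threshold(counts, min_separation=10):
    """Return a threshold in the valley between the two tallest
    well-separated bins, or None if no second peak exists.

    min_separation is a hypothetical parameter: the minimum distance
    (in gray levels) between two bins treated as distinct peaks.
    """
    first = max(range(len(counts)), key=counts.__getitem__)
    rivals = [i for i, n in enumerate(counts)
              if n > 0 and abs(i - first) >= min_separation]
    if not rivals:
        return None  # only one peak: nothing left to split
    second = max(rivals, key=counts.__getitem__)
    lo, hi = sorted((first, second))
    between = range(lo, hi + 1)
    floor = min(counts[i] for i in between)
    # Take the middle of the lowest bins so the cut sits inside the valley.
    flat = [i for i in between if counts[i] == floor]
    return flat[len(flat) // 2]


def segment(pixels, threshold):
    """Label each pixel 1 if brighter than the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

An iterative variant in the spirit of the description would call `valley_threshold` again on each of the two resulting pixel sets and stop once it returns `None` for every part.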
In their article “Gray-Level Image Thresholding Based on a Fisher Linear Projection of a Two-Dimensional Histogram” (Pattern Recognition, vol. 30, No. 5, pp. 743-749, 1997, incorporated herein by reference), the authors L. Li, J. Gong and W. Chen propose that two-dimensional histograms of an image are more useful for finding segmentation thresholds than gray-level information in one dimension alone. In 2D histograms, the gray level of each pixel is used together with the local gray-level average of its neighborhood.
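How such a two-dimensional histogram can be built is sketched below. The sketch covers only the histogram construction, not the Fisher linear projection of the cited article. The 3x3 neighborhood, the border handling and the function names are assumptions chosen for illustration.

```python
from collections import Counter


def local_mean(image, r, c):
    """Rounded mean gray level of the 3x3 neighborhood of pixel (r, c),
    clipped at the image border (an assumed border treatment)."""
    rows, cols = len(image), len(image[0])
    vals = [image[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]
    return round(sum(vals) / len(vals))


def histogram_2d(image):
    """Count occurrences of each (gray level, local average) pair."""
    hist = Counter()
    for r, row in enumerate(image):
        for c, g in enumerate(row):
            hist[(g, local_mean(image, r, c))] += 1
    return hist
```

Pixels inside homogeneous regions fall near the diagonal of this histogram (gray level close to local average), while pixels on edges or in noise fall away from it, which is the additional information a one-dimensional histogram lacks.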
2. Edge-Based Segmentation
In the article “Neighbor Gray Levels as Features in Pixel Classification” (Pattern Recognition, vol. 12, pp. 251-260, 1980, incorporated herein by reference) by N. Ahuja, A. Rosenfeld and R. M. Haralick, it is described how pixel-neighborhood elements can be used for image segmentation.
In the article “Extracting and Labeling Boundary Segments in Natural Scenes” (IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, No. 1, pp. 16-27, 1980, incorporated herein by reference) by J. M. Prager, a set of algorithms used to perform a segmentation of natural scenes via boundary analysis is disclosed. The aim of these algorithms is to locate the boundaries of an object correctly in a scene.
In “Area Segmentation of Images using Edge Points” (IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, No. 1, pp. 8-15, 1980, incorporated herein by reference) by W. A. Perkins, an edge-based technique for image segmentation is employed. Therein, it is shown that edge-based segmentation has not been very successful because small gaps in the detected edges allow dissimilar regions to merge.
A different adaptive thresholding algorithm for image segmentation using variational theory is proposed in the article “Adaptive Thresholding by Variational Method” (IEEE Transactions on Image Processing, vol. 2, No. 3, pp. 168-174, 1998, incorporated herein by reference) by F. H. Y. Chan, F. K. Lam and H. Zhu.
A further approach to image processing based on edge detection can be found in the article “Local Orientation Coding and Neural Network Classifiers with an Application to Real-Time Car Detection and Tracking” (in: W. G. Kropatsch and H. Bischof [editors], Mustererkennung 1994, Proceedings of the 16th Symposium of the DAGM and the 18th Workshop of the OAGM, Springer-Verlag, 1994, incorporated herein by reference) by C. Goerick and M. Brauckmann.
Besides the approaches mentioned above, image representations based on Gabor functions (GFs) and/or other similar wavelets have been shown to be very useful in many applications such as image coding and compression, enhancement and restoration, or analysis of texture. Moreover, GFs are frequently used in the scope of multi-scale filtering schemes, e.g. in current models of image representation in the visual cortex, as they offer a good approximation to the receptive fields of simple cortical cells. However, GFs are not orthogonal and, as a consequence, the classic Gabor expansion is computationally expensive because it relies on unusual dual basis functions. Reconstruction via such an expansion requires the use of iterative algorithms, Artificial Neural Networks (ANNs), or the inversion of large matrices. These problems can partially be overcome by using a redundant, multi-scale filtering implementation. Among the many wavelet, multi-resolution pyramid and related schemes using different basis functions (such as Gaussian derivatives, steerable filters, etc.), those based on GFs offer several advantages:
    maximization of joint localization in both spatial and frequency domain,
    flexibility because GFs can freely be tuned to a continuum of spatial positions, frequencies and orientations, using arbitrary bandwidths,
    the fact that GFs are the only biologically plausible filters with orientation selectivity that can exactly be expressed as a sum of only two separable filters, and
    their good performance in a large variety of applications.
For all these reasons, Gabor functions are especially suitable for performing early processing tasks in multipurpose environments of image analysis and machine vision.
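For illustration, a Gabor function can be sampled on a pixel grid as a Gaussian envelope multiplied by an oriented cosine carrier. The sketch below uses the commonly used real-valued form; the parameter names (`wavelength`, `theta`, `sigma`, `gamma`, `phase`) are conventional choices and are not taken from the cited works.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, phase=0.0):
    """Sample a real-valued 2D Gabor function on a size x size grid.

    size       -- odd kernel width in pixels (assumed odd here)
    wavelength -- period of the cosine carrier in pixels
    theta      -- orientation of the carrier in radians
    sigma      -- standard deviation of the Gaussian envelope
    gamma      -- aspect ratio of the envelope
    phase      -- phase offset of the carrier in radians
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates into the filter's own frame.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2)
                                / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

A small filter bank tuned to several orientations, as used in multi-scale filtering schemes, could then be obtained with `[gabor_kernel(7, 4.0, k * math.pi / 4, 2.0) for k in range(4)]`, and repeated for several wavelengths to cover multiple scales.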
In the article “Entropie als Maß des lokalen Informationsgehalts in Bildern zur Realisierung einer Aufmerksamkeitssteuerung” (Internal Report 96-0.7, Institut für Neuroinformatik der Ruhr-Universität Bochum, 1996, published in: Mustererkennung 1996, pp. 627-634, Springer-Verlag, Berlin/Heidelberg, 1996, incorporated herein by reference) by T. Kalinke and W. von Seelen, an attention control system providing an image segmentation for non-specific object recognition is disclosed. Based on the information theory introduced by C. Shannon, the local information content of digital images is estimated. Thereby, image entropy is used as a measure for the expected information content of an image part. In this connection, different parameters such as mask size, subsampling factor, entropy threshold and specific parameters of morphological operators allow both problem-specific and task-specific image processing.
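The idea of using entropy as a measure of local information content can be sketched as follows. This is a generic illustration and not the cited system: it computes the Shannon entropy of the gray-level distribution inside a sliding window and thresholds the result. The parameters `mask`, `step` and `threshold` stand in for the mask size, subsampling factor and entropy threshold mentioned above; the morphological post-processing is omitted, and all function names are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_map(image, mask=3, step=1):
    """Entropy of every mask x mask window lying fully inside the image,
    with the window position advanced by `step` pixels (subsampling)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - mask + 1, step):
        out_row = []
        for c in range(0, cols - mask + 1, step):
            window = [image[i][j]
                      for i in range(r, r + mask)
                      for j in range(c, c + mask)]
            out_row.append(shannon_entropy(window))
        out.append(out_row)
    return out


def attention_mask(emap, threshold):
    """Mark windows whose entropy exceeds the threshold."""
    return [[e > threshold for e in row] for row in emap]
```

Homogeneous image parts yield an entropy of zero, whereas parts containing many different gray levels yield high values, so the thresholded map highlights candidate regions for further processing.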
In spite of many attempts to construct optimal object recognition systems (e.g. based on edge detection), it can be shown that the known algorithms often have problems in object segmentation at locations where lines and edges are very close and/or intersect. Since conventional edge detection algorithms are only capable of recognizing a plurality of very small (simply- or multiply-connected) image patches, it is impossible to resolve local ambiguities, e.g. caused by crossing lines. Consequently, the underlying recognition system is not able to distinguish between many small objects on a cluttered background and one large object consisting of different parts that belong together. For this reason, global information about the contours of an object that has to be recognized is required. In general, these problems occur in case of images that contain many different objects or in case of objects on a cluttered background.
Another problem is that these object recognition systems have to be adjusted to the image or image family in use. Moreover, there are still very few algorithms which are able to detect and classify lines and edges (events) simultaneously.