Detecting objects in images is a fundamental problem in computer vision. Object detection is a necessary first step before the objects can be tracked or identified.
One broad class of object detection techniques constructs a model of the object, and pixels in images are compared to the model. One problem in object detection is that the object can have various poses with respect to the image plane, e.g., rotational variation. Therefore, it is desired to have a model that is invariant to the poses of the object.
In the following description, the example object to be detected is a vehicle license plate. However, it should be understood that the invention can be worked with many different types of objects, e.g., human faces, vehicles, and the like.
Methods for detecting license plates are well known: L. Dlagnekov, “Car license plate, make, and model recognition,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2005; W. Förstner and B. Moonen, “A metric for covariance matrices,” Technical report, Dept. of Geodesy and Geoinformatics, Stuttgart University, 1999; and C. Rahman, W. Badawy, and A. Radmanesh, “A real time vehicle license plate recognition system,” Proceedings of the IEEE on Advanced Video and Signal Based Surveillance, AVSS03, 2003.
License plate detection can be used in security and traffic monitoring systems. Often the extracted information is used for enforcement, access-control, and flow management, e.g., to keep a time record on the entry and exit for automatic payment calculations or to reduce crime and fraud.
However, detecting license plates under varying poses, changing lighting conditions, and corrupting image noise, without using an external illumination source, is difficult.
Methods can be rule based or use a trained classifier. A simple method first detects license plate boundaries. An input image is processed to amplify the edge information using gradient filtering and thresholding. Then, a Hough transformation is applied to detect parallel line segments. Coupled parallel lines are considered as license plate candidates.
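The boundary-based approach described above can be sketched in a few lines. The following is a minimal illustration only, not the implementation of any cited method: the function names (`edge_mask`, `hough_lines`, `parallel_pairs`), the forward-difference gradient, the synthetic 20x20 image, and all thresholds are assumptions chosen for the example.

```python
import math
from collections import Counter

def edge_mask(image, threshold):
    """Amplify edges with forward-difference gradients, then threshold."""
    height, width = len(image), len(image[0])
    edges = []
    for y in range(height - 1):
        for x in range(width - 1):
            gx = image[y][x + 1] - image[y][x]
            gy = image[y + 1][x] - image[y][x]
            if abs(gx) + abs(gy) > threshold:
                edges.append((x, y))
    return edges

def hough_lines(edges, angle_step_deg, min_votes):
    """Vote in (theta, rho) space and keep bins with at least min_votes."""
    votes = Counter()
    for x, y in edges:
        for theta in range(0, 180, angle_step_deg):
            t = math.radians(theta)
            rho = round(x * math.cos(t) + y * math.sin(t))
            votes[(theta, rho)] += 1
    return sorted(line for line, count in votes.items() if count >= min_votes)

def parallel_pairs(lines, min_gap, max_gap):
    """Couple lines that share an angle and lie a plausible distance apart."""
    pairs = []
    for i, (theta_a, rho_a) in enumerate(lines):
        for theta_b, rho_b in lines[i + 1:]:
            if theta_a == theta_b and min_gap <= abs(rho_b - rho_a) <= max_gap:
                pairs.append(((theta_a, rho_a), (theta_b, rho_b)))
    return pairs

# Hypothetical test image: a bright plate-like rectangle on a dark background.
image = [[0] * 20 for _ in range(20)]
for y in range(6, 13):
    for x in range(4, 16):
        image[y][x] = 255

edges = edge_mask(image, threshold=100)
lines = hough_lines(edges, angle_step_deg=15, min_votes=12)
pairs = parallel_pairs(lines, min_gap=3, max_gap=10)
```

On this synthetic image, the long upper and lower boundaries of the rectangle should appear among `pairs` as two lines at 90 degrees. Because the angle is quantized and the pairing rule is fixed, a sketch of this kind also illustrates why such rule-based detection is sensitive to the pose of the plate.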
Another method uses a gray level morphology. That method focuses on local appearance properties of license plate regions such as brightness, symmetry, orientation, etc. Candidate regions are compared with a given license plate image based on the similarity of these properties.
Classifier based methods learn different representations of the license plates. In a color texture based method, a license plate region is assumed to have discriminatory texture properties. A support vector machine (SVM) classifier is used to determine whether a candidate region corresponds to a license plate or not. Only the template of a region is fed directly to the SVM to decrease the dimensionality of the representation. Next, license plate regions are identified by applying a continuously adaptive mean-shift process to the results of the color texture analysis.
Another method poses the detection task as a boosting problem. Over several iterations, an AdaBoost classifier selects the best performing weak classifier from a set of weak classifiers, each acting on a single feature, and, once trained, combines their respective votes in a weighted manner. The resulting strong classifier is then applied to sub-regions of an image being scanned for likely license plate locations. An optimization based on a cascade of classifiers, each specifically designed using false positive and false negative rates, helps to accelerate the scanning process.
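The boosting procedure described above can be illustrated with a minimal sketch of discrete AdaBoost, assuming threshold tests (decision stumps) as the single-feature weak classifiers. The function names, the one-dimensional toy data, and the number of rounds are assumptions made for the example. The sketch does not reproduce the features of the cited method, and the cascade optimization is omitted.

```python
import math

def stump_predict(stump, sample):
    """A weak classifier: threshold test on one feature, output +1 or -1."""
    feature, threshold, polarity = stump
    return polarity if sample[feature] >= threshold else -polarity

def train_stump(samples, labels, weights):
    """Select the single-feature stump with the lowest weighted error."""
    best_error, best_stump = None, None
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                error = sum(w for s, y, w in zip(samples, labels, weights)
                            if stump_predict(stump, s) != y)
                if best_error is None or error < best_error:
                    best_error, best_stump = error, stump
    return best_error, best_stump

def train_adaboost(samples, labels, rounds):
    """Each round keeps the best weak classifier and reweights the samples."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, stump = train_stump(samples, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # vote weight
        ensemble.append((alpha, stump))
        # Misclassified samples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, s))
                   for s, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_predict(ensemble, sample):
    """Combine the weak classifiers' votes in a weighted manner."""
    score = sum(alpha * stump_predict(stump, sample) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels that no single stump can separate.
samples = [[x] for x in range(10)]
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = train_adaboost(samples, labels, rounds=3)
correct = sum(strong_predict(ensemble, s) == y for s, y in zip(samples, labels))
```

On this toy data, no single stump classifies more than seven of the ten samples correctly, whereas the weighted vote of three stumps classifies all ten. In a detector, `strong_predict` would be evaluated on a feature vector computed for each scanned sub-region.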
In addition to single frame detection techniques, there exist methods that take advantage of video data by processing multiple consecutive frames.
One main drawback of all the above methods is that their performance depends on strong assumptions made about the appearance of objects. Most methods cannot handle in-plane and out-of-plane rotations, or are incapable of compensating for imaging noise and illumination changes. Simply enlarging the training dataset using rotated training samples often causes deteriorated performance and an increased false positive rate.
The region descriptors used for object detection should be invariant to rotations. The descriptors should also be able to distinguish objects from the background under uncontrolled conditions.
Many different representations, including aggregated statistics, textons, and appearance models, have been used for object detection. Histograms are popular representations of nonparametric density. However, histograms disregard the spatial arrangement of the feature values. Moreover, histograms do not scale to higher dimensions. Appearance models are highly sensitive to pose, scale, and shape variations.
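Both limitations of histograms noted above can be seen in a small sketch. The two 4x4 patches, the function names, and the bin counts below are assumptions made for the example. The first part shows two patches with different spatial layouts that yield the same histogram. The second part counts the bins of a joint histogram, which grows exponentially with the number of features.

```python
from collections import Counter

def gray_histogram(image):
    """Count how often each intensity occurs, ignoring pixel positions."""
    return Counter(value for row in image for value in row)

def joint_bin_count(bins_per_feature, num_features):
    """Number of bins in a joint histogram over num_features features."""
    return bins_per_feature ** num_features

# Two hypothetical patches with the same pixel values in different layouts.
striped = [
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [0, 0, 0, 0],
    [255, 255, 255, 255],
]
checkered = [
    [0, 255, 0, 255],
    [255, 0, 255, 0],
    [0, 255, 0, 255],
    [255, 0, 255, 0],
]

same_histogram = gray_histogram(striped) == gray_histogram(checkered)
same_layout = striped == checkered
bins_for_five_features = joint_bin_count(16, 5)
```

Here `same_histogram` is true although `same_layout` is false, and a joint histogram with 16 bins for each of five features already requires 16^5 = 1,048,576 bins.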