ALPR is a mature technology extensively employed in intelligent transportation systems for applications such as automated tolling, law enforcement, and parking management, among others. These systems typically include four modules: a) image acquisition, b) license plate localization, c) character segmentation (i.e., extracting images of each individual character in the license plate), and d) character recognition. A number of alternative methods, however, have been proposed for license plate recognition.
ALPR methods typically require an offline phase to train an OCR engine before deployment. In this offline phase, a classifier is trained for each character in a one-vs-all fashion using a set of manually annotated character samples. To match the distributions of the training and target data sets, data collection and manual annotation are repeated for each country or state in which the font differs, and for each site at which the camera settings, configuration, or geometry vary. Given the enormous variety of plate samples (i.e., variations in plate design, font, or layout), camera configurations, and geometries, manual annotation results in excessive operational cost and overhead and hence poses an important challenge for the scalability of ALPR systems.
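The one-vs-all training described above can be illustrated with a minimal sketch. It assumes that each character sample has already been reduced to a numeric feature vector, and it uses a simple perceptron as the per-character binary classifier. Both choices, along with the names `train_one_vs_all` and `classify`, are assumptions made for illustration; the text does not specify the features or the classifier type used by any particular OCR engine.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample type: (feature vector, annotated character label).
Sample = Tuple[Sequence[float], str]


def _score(weights: List[float], x: Sequence[float]) -> float:
    # Linear score; the last weight is the bias term.
    return sum(w * xi for w, xi in zip(weights, x)) + weights[-1]


def train_one_vs_all(samples: List[Sample], epochs: int = 20) -> Dict[str, List[float]]:
    """Train one binary perceptron per character (target vs. all others)."""
    labels = sorted({label for _, label in samples})
    dim = len(samples[0][0])
    models: Dict[str, List[float]] = {}
    for target in labels:
        weights = [0.0] * (dim + 1)
        for _ in range(epochs):
            for x, label in samples:
                y = 1 if label == target else -1
                # Update only on misclassified (or zero-margin) samples.
                if y * _score(weights, x) <= 0:
                    for i, xi in enumerate(x):
                        weights[i] += y * xi
                    weights[-1] += y
        models[target] = weights
    return models


def classify(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Pick the character whose binary classifier gives the highest score."""
    return max(models, key=lambda label: _score(models[label], x))


# Toy data: three "characters" with trivially separable features.
toy = [([1.0, 0.0, 0.0], "A"), ([0.0, 1.0, 0.0], "B"), ([0.0, 0.0, 1.0], "C")]
models = train_one_vs_all(toy)
print(classify(models, [1.0, 0.0, 0.0]))  # A
```

In practice every sample in `samples` carries a manually assigned label, which is why the annotation effort scales with the number of characters and the number of samples per character.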
Efforts have been made to develop automated license plate recognition systems, and some implementations have been successfully rolled out in some U.S. states (e.g., CA and NY). One module type employed in some automated license plate recognition systems includes training classifiers for character recognition, commonly employed after detecting a license plate in a license plate image and segmenting out the characters from the localized plate region.
A classifier can be trained for each character in a one-vs-all fashion using samples collected from the site, wherein an operator manually labels the collected samples. Considering the high accuracy (i.e., 99%) required by our customers for the overall recognition system, the classifiers are typically trained using on the order of 1000 manually labeled samples per character. The substantial time and effort required for manual annotation of training images can result in excessive operational costs and increased overhead. This problem is exacerbated for jurisdictions requiring multiple OCR engines (e.g., one for each of the most common states), as the annotation burden grows quickly (e.g., 36 symbols×1000 samples×6 jurisdictions=216 k samples to manually label).
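The annotation burden in the example above is a simple product, which the following short sketch reproduces. The function name `annotation_burden` is hypothetical, and the calculation assumes that every jurisdiction needs its own full set of labeled samples for every symbol.

```python
def annotation_burden(symbols: int, samples_per_symbol: int, jurisdictions: int) -> int:
    """Number of samples an operator must label by hand."""
    return symbols * samples_per_symbol * jurisdictions


# Figures from the example: 36 symbols, 1000 samples each, 6 jurisdictions.
print(annotation_burden(36, 1000, 6))  # 216000
```

Because the total is linear in each factor, adding a jurisdiction in this example adds another 36,000 samples to label.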
To address this problem, some solutions have proposed training classifiers based on synthetically generated samples. Instead of collecting samples from the site, training images are synthetically generated using the font and layout of the state of interest. FIG. 1, for example, illustrates a block diagram of a plate synthesis workflow 1. In the prior art configuration shown in FIG. 1, a blank license plate image is shown, which is provided to a text overlay module 5. Rendering effects 13 (e.g., font, spacing, layout, shadow/emboss, etc.) are also provided to the text overlay module 5, along with output from a character sequence generation module 7. Examples of character sequence generation data are shown in box 9 to the right of the character sequence generation module 7. State rules 15 for valid sequences can also be provided as input to module 7. License plate images 11 are output from the text overlay module 5. An image distortion model, which includes color-to-IR conversion, image noise, brightness, geometric distortions, etc., can also be applied to the synthesized images to mimic the impact of capturing vehicle plate images with a real video camera system.
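Two stages of the workflow described above, character sequence generation under a state rule and image distortion, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation shown in FIG. 1: the pattern syntax (`L` for a letter, `D` for a digit), the example pattern, the function names, and the reduction of the distortion model to brightness scaling plus Gaussian noise are all hypothetical. Text overlay, color-to-IR conversion, and geometric distortion are omitted.

```python
import random
import string
from typing import List


def generate_sequence(pattern: str, rng: random.Random) -> str:
    """Fill a layout pattern: 'L' becomes a random uppercase letter,
    'D' a random digit, and any other character is kept as-is."""
    out = []
    for ch in pattern:
        if ch == "L":
            out.append(rng.choice(string.ascii_uppercase))
        elif ch == "D":
            out.append(rng.choice(string.digits))
        else:
            out.append(ch)
    return "".join(out)


def distort(image: List[List[float]], rng: random.Random,
            brightness: float = 0.9, noise_sigma: float = 5.0) -> List[List[float]]:
    """Scale brightness and add Gaussian noise to a grayscale image,
    clamping every pixel to the range [0, 255]."""
    return [
        [min(255.0, max(0.0, px * brightness + rng.gauss(0.0, noise_sigma)))
         for px in row]
        for row in image
    ]


rng = random.Random(0)  # fixed seed so runs are repeatable
plate_text = generate_sequence("DLLLDDD", rng)  # illustrative pattern only
blank = [[200.0] * 8 for _ in range(4)]         # stand-in for a rendered plate
noisy = distort(blank, rng)
print(plate_text, len(noisy), len(noisy[0]))
```

Since the generator knows the character sequence it produced, every synthesized plate comes with its label for free, which is what removes the manual annotation step.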
While such methods can eliminate the manual intervention required for training, they usually result in deterioration in the classification accuracy. FIG. 2, for example, illustrates a graph 2 of accuracy-yield curves for classifiers trained using only synthetic images (green curve) and using real images (red curve). In the example graph 2 of FIG. 2, even though 2000 synthetic images are used per character in training, the accuracy at the same yield is lower than when classifiers are trained with 1500 real samples per character.
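For readers unfamiliar with accuracy-yield curves, the sketch below shows one way to compute such a curve. It assumes a common definition that the text itself does not spell out: yield is the fraction of samples whose classifier confidence clears a threshold, and accuracy is the fraction of those accepted samples that are correctly recognized. The function name and the input format are hypothetical.

```python
from typing import List, Tuple


def accuracy_yield_curve(results: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """results holds (confidence, is_correct) per sample.

    Returns (yield, accuracy) points obtained by accepting the top-n most
    confident samples for n = 1..N. Tied confidences are not grouped."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    curve = []
    correct = 0
    for n, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        curve.append((n / len(ranked), correct / n))
    return curve


# Toy results: the least confident accepted samples include one error.
toy_results = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
curve = accuracy_yield_curve(toy_results)
print(curve[0])   # (0.25, 1.0)
print(curve[-1])  # (1.0, 0.75)
```

Comparing two classifiers "at the same yield" then means comparing the accuracy values of their curves at the same first coordinate.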
What is needed is a solution that minimizes the manual annotation required for training classifiers while having minimal or no impact on the classification accuracy.