ALPR is an image-processing approach that often functions as the core module of “intelligent” transportation infrastructure applications. ALPR techniques can be employed to identify a vehicle by automatically reading its license plate utilizing image processing, computer vision, and character recognition technologies. A license plate recognition operation can be performed by locating a license plate in an image, segmenting the characters in the captured image of the plate, and performing an optical character recognition (OCR) operation with respect to the identified characters.
The ALPR problem is often decomposed into a sequence of image processing operations: locating the sub-image containing the license plate (i.e., plate localization), extracting images of individual characters (i.e., segmentation), and performing optical character recognition (OCR) on these character images. In order for OCR to achieve high accuracy, it is necessary to obtain properly segmented characters.
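The segmentation step can be illustrated with a minimal sketch. The Python fragment below is an illustrative example under simplifying assumptions, not code from any deployed ALPR system: it splits an already-binarized plate image into per-character sub-images using vertical projection, treating columns with no foreground pixels as gaps between characters.

```python
import numpy as np

def segment_characters(binary_plate):
    """Split a binarized plate image (1 = character ink, 0 = background)
    into per-character sub-images via vertical projection: any column
    containing no ink marks a gap between characters."""
    col_has_ink = binary_plate.any(axis=0)
    segments, start = [], None
    for x, ink in enumerate(col_has_ink):
        if ink and start is None:
            start = x                                   # character begins
        elif not ink and start is not None:
            segments.append(binary_plate[:, start:x])   # character ends
            start = None
    if start is not None:                               # glyph touching right edge
        segments.append(binary_plate[:, start:])
    return segments

# Toy 4x11 "plate" with three glyphs separated by blank columns
plate = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1],
])
chars = segment_characters(plate)  # -> three sub-images of widths 2, 1, 3
```

A production system would precede this step with binarization, deskewing, and noise removal; projection-based segmentation is only one of several common approaches.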
The ability to extract license plate information from images and/or videos is fundamental to many transportation businesses. An ALPR solution can provide significant improvements in the efficiency and throughput of a number of transportation-related business processes.
ALPR systems have been successfully rolled out in several U.S. states (e.g., California and New York). Some ALPR modules involve training classifiers for character recognition, and are commonly employed after detecting a license plate in a license plate image and segmenting out the characters from the localized plate region. A classifier can be trained for each character in a one-vs-all fashion using samples collected from the site, wherein the collected samples are manually labeled by an operator. Considering the high accuracy (e.g., 99%) required for the overall recognition system, the classifiers are typically trained using on the order of ˜1,000 manually labeled samples per character. The substantial time and effort required for manual annotation of training images can result in excessive operational cost and overhead.
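The one-vs-all training scheme described above can be sketched as follows. The perceptron update used here is a deliberately simple stand-in for whatever classifier a production ALPR system would actually train, and the tiny two-class data set is hypothetical.

```python
import numpy as np

def train_one_vs_all(samples, labels, classes, epochs=20):
    """Train one binary linear classifier per character class.
    Each class c gets targets +1 (label == c) / -1 (all others),
    fit with plain perceptron updates."""
    X = np.hstack([samples, np.ones((len(samples), 1))])  # append bias term
    weights = {}
    for c in classes:
        y = np.where(labels == c, 1.0, -1.0)  # one-vs-all targets
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:        # misclassified: move the boundary
                    w += yi * xi
        weights[c] = w
    return weights

def classify(weights, x):
    """Assign the class whose binary classifier scores highest."""
    x = np.append(x, 1.0)
    return max(weights, key=lambda c: weights[c] @ x)

# Hypothetical 2-D "character features" for two classes
samples = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 4.0]])
labels = np.array(["A", "A", "B", "B"])
weights = train_one_vs_all(samples, labels, ["A", "B"])
```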
In order to address this problem, some techniques have been proposed for training classifiers based on synthetically generated samples. Instead of collecting samples from the site, training images are synthetically generated using the font and layout of a license plate of the State of interest. Examples of such approaches are disclosed in, for example: (1) H. Hoessler et al. “Classifier Training Based on Synthetically Generated Samples”, Proc. 5th International Conference on Computer Vision Systems, 2007; and (2) Bala, Raja, et al. “Image Simulation for Automatic License Plate Recognition,” IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, 2012, which are incorporated herein by reference.
FIG. 1, for example, depicts a block diagram of a prior art license plate synthesis workflow 10. The workflow 10 depicted in FIG. 1 includes a text overlay operation or module 14, along with a character sequence generation operation or module 16. Data indicative of a blank license plate image 12 can be provided for text overlay via the text overlay module 14. Rendering effects (e.g., fonts, spacing, layout, shadow/emboss, etc.) are also provided to the text overlay module 14. State rules for valid sequences can also be provided to the character sequence generation module 16. Example character sequences are shown at block 18 with respect to the character sequence generation operation 16. Output from the text overlay module 14 results in one or more license plates 19 with license plate numbers corresponding to, for example, the characters shown at block 18. An image distortion model that includes color-to-IR conversion, image noise, brightness, geometric distortions, etc., can also be applied to synthesized images to mimic the impact of capturing vehicle plate images with a real video camera system.
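The character sequence generation operation 16 can be sketched as follows. The 'L'/'D' pattern encoding of state rules is a hypothetical simplification; real state rules also cover stacked characters, prohibited letter combinations, and similar constraints.

```python
import random
import string

def generate_sequences(pattern, n, seed=0):
    """Generate n plate character sequences from a simple rule pattern,
    where 'L' means any uppercase letter and 'D' means any digit.
    A California-style passenger plate, for instance, could be
    approximated as 'DLLLDDD' (1 digit, 3 letters, 3 digits)."""
    rng = random.Random(seed)  # seeded for reproducibility
    pools = {"L": string.ascii_uppercase, "D": string.digits}
    return ["".join(rng.choice(pools[c]) for c in pattern) for _ in range(n)]

sequences = generate_sequences("DLLLDDD", 3)
```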
While these methods eliminate much of the manual investment required for training, they usually result in a deterioration in classification accuracy. FIG. 2, for example, depicts a prior art graph 20 of accuracy-yield curves for classifiers trained using only synthetic images versus real images. Even though 2,000 synthetic images are used per character in training, the accuracy at the same yield is significantly lower than that of classifiers trained with 1,500 real samples per character.
When classifiers are trained by mixing synthetically generated images with real samples, the classification accuracy is recovered as shown in the prior art graph 30 depicted in FIG. 3. The mixing proportion and the number of real samples needed, however, change from site to site and are currently manually tuned by testing classifiers on a set of real samples. This manual process requires time and effort to both collect and annotate the real samples. The time required to gather a sufficient number of images along with annotation can delay the deployment of the automated solution for several months.
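The mixing step can be sketched as follows; the proportion parameter is exactly the quantity that, as noted, is currently hand-tuned per site, and the function below is an illustrative construction of the mixed training set rather than the tuning procedure itself.

```python
import random

def build_training_set(real, synthetic, real_fraction, total, seed=0):
    """Assemble a training set of `total` character samples in which
    roughly `real_fraction` of the samples are real and the remainder
    are synthetic (synthetic samples may repeat, since more can
    always be generated)."""
    rng = random.Random(seed)
    n_real = min(len(real), round(total * real_fraction))
    n_synthetic = total - n_real
    mixed = rng.sample(real, n_real) + rng.choices(synthetic, k=n_synthetic)
    rng.shuffle(mixed)
    return mixed

# Stand-ins for labeled real and synthetic character samples
real_chars = list(range(100))
synthetic_chars = list(range(1000, 3000))
training = build_training_set(real_chars, synthetic_chars,
                              real_fraction=0.05, total=2000)
```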
Up to this point, the performance of an OCR engine trained with only synthetic characters has been noticeably poorer than that of one trained with real characters, as shown in graph 20 of FIG. 2. The performance gap can be closed by testing on real examples, determining which labels are performing poorly, and updating the OCR classifier for these poorly performing labels to better match real-world observations. Performance is further improved by supplementing the synthetic character images with real examples, as shown in graph 30 of FIG. 3.
In order to support this iterative legacy process, thousands of real characters are typically required. For perspective, a well-trained OCR engine for a particular state with 36 labels requires ˜54,000 real characters (1,500 samples per label). For a mixed synthetic-and-real scenario, 100 samples per label, or 3,600 characters, are needed. With a targeted method, as outlined in greater detail herein, this number can be reduced to ˜300 examples.
The number of real-world images that must be collected is typically much larger than the proportional number of character examples due to the non-uniform distribution of label appearance probability. This amplifies the discrepancy between the results achieved by the disclosed approach versus the baseline. A typical license plate has 7 characters, so given a uniform distribution of appearance probability, 514 plates would be needed to obtain 3,600 characters. The actual distribution, however, is depicted in the example prior art graph 40 of FIG. 4, which plots probability data versus character label information. One would thus need to collect roughly 6× as many images to obtain 100 ‘X’ examples as to obtain 100 ‘1’ examples.
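The collection arithmetic can be made concrete with a small sketch; the appearance probabilities used below are illustrative placeholders rather than the measured FIG. 4 data.

```python
import math

def plates_needed(target_per_label, chars_per_plate, p_label):
    """Expected number of plate images required for a label with
    appearance probability p_label to accumulate target_per_label
    character examples (each plate contributes chars_per_plate
    character draws)."""
    return math.ceil(target_per_label / (chars_per_plate * p_label))

# Uniform distribution over 36 labels: ceil(3600 / 7) plates
uniform = plates_needed(100, 7, 1 / 36)   # -> 515 (the ~514 figure, rounded up)
# A label appearing 6x less often needs ~6x as many plates
rare = plates_needed(100, 7, 1 / 216)     # -> 3086
```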