LPR (License Plate Recognition) or ALPR (Automatic License Plate Recognition) is a computer vision technology that typically includes image-processing operations with functions as the core module of “intelligent” transportation infrastructure applications. License plate recognition techniques, such as ALPR, can be employed to identify a vehicle by automatically reading a license plate utilizing image processing and character recognition technologies. A license plate recognition operations can be performed by locating a license plate in an image, segmenting the characters in the captured image of the plate, and performing an OCR (Optical Character Recognition) operation with respect to the characters identified.
The ALPR problem is often decomposed into a sequence of image processing operations: locating the sub-image containing the license plate (i.e., plate localization), extracting images of individual characters (i.e., segmentation), and performing optical character recognition (OCR) on these character images. Thus, LPR and ALPR technologies involve the problem not just of object recognition, but also text image recognition.
One of the problems with license plate image recognition is that given a cropped image of a license plate, we are interested in producing its transcription. Two main trends exist to address the problem of license plate/text image recognition.
The first trend is based on the aforementioned OCR, which is inspired by traditional word recognition methods in documents. Given a word image, individual characters of the word can first be localized and then recognized via a number of approaches. Although these techniques can obtain very good recognition results, they are not exempt of problems. For example, millions of training words need to be annotated with character bounding boxes to achieve a high accuracy, and the individual characters in the word localized—which is slow and error prone, particularly in the case of license plate recognition, where even the license plate itself may not have been localized and cropped with high accuracy.
The second trend, inspired by recent computer vision techniques, describes word images with global signatures (e.g., bag of words or Fisher vector encodings on top of SIFT or other learned local features) without explicitly detecting the individual characters. With such approaches, one can simultaneously embed word images and text strings in a common space with an associated similarity metric, which allows one to cast the recognition of a word image as a retrieval problem: given a word image, one can rank all possible transcriptions (e.g., the lexicon) and utilize the most similar one to the image word as the predicted transcription.
Although this offers advantages in many domains, for some particular tasks such as license plate recognition, where the number of possible transcriptions is vast, this is not practical, and it is of paramount importance to perform recognition without a known lexicon, a much more difficult task. Some techniques utilize global image signatures, but cast the problem not as retrieval but as an optimization, attempting to find the transcription that maximizes the compatibility function. This method has obtained very good results on internal license plate datasets with no need of a lexicon, although the results were, as expected, not as accurate and efficient as when using a lexicon. In a similar direction, a Convolutional Neural Network can be trained, which learns how to map images of text into a text embedding space from where the actual text string can be easily recovered. In practice, this allows one to perform text image, classification if huge amounts of labeled training data are available, but this is typically not the case, particularly with license plate images.
Given the importance of license plate recognition for different areas of transportation, a solution to recognize license plates with no lexicon in a more accurate and efficient manner that does not require vast amounts of annotated training data is sought.