The exemplary embodiment relates to text recognition in images. It finds particular application in connection with recognizing license plates and will be described with particular reference thereto. However, it is to be appreciated that it is applicable to a wide range of recognition problems.
Text recognition in images involves both recognizing that a portion of the image is an image of text and also recognizing the character sequence which constitutes the text. There are many instances where it is desirable to recognize text in images, for example, recognition of license plate numbers in images of vehicles, recognition of street names on images of street scenes, and the like. It may also be desirable to recognize different types of text (e.g., typed text vs. handwritten text) and to recognize different types of images (e.g., natural images vs. document images).
Recognition of license plate information assists in vehicle identification, since, in general, the license plate is a unique identifier for the vehicle on which it is mounted. Traditionally, this problem has been addressed by applying Optical Character Recognition (OCR) to the license plate image (see, for example, Anagnostopoulos, et al., “License plate recognition from still images and video sequences: A survey,” IEEE Trans. on Intelligent Transportation Systems, vol. 9, No. 3, pp. 377-391, 2008, hereinafter “Anagnostopoulos”). However, OCR can be computationally expensive, and its accuracy diminishes when visibility at the time of image capture is poor.
A recent solution has been to address recognition as an image matching problem, as disclosed, for example, in copending U.S. application Ser. No. 13/300,124. Given an image of a license plate (the query), the license plate numbers of the closest matching images in a large database are retrieved. The images to be compared are each represented by an image signature, which is a statistical representation of an image, derived from low-level features extracted from the image. As image signatures, Fisher Vectors can be used. See, for example, Perronnin, et al., “Improving the Fisher kernel for large-scale image classification,” in ECCV, 2010.
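The matching step described above can be sketched in simplified form. In this illustrative example (not the method of the referenced applications), each database image is assumed to already have a pre-computed signature vector and an associated plate number, and the query is matched by cosine similarity; the function names and toy signature values are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two signature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_closest(query_signature, database):
    """Return the label (e.g., plate number) of the database entry
    whose signature is most similar to the query signature."""
    best_label, best_score = None, -1.0
    for label, signature in database.items():
        score = cosine_similarity(query_signature, signature)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

# Toy database mapping a plate number to a pre-computed signature.
db = {
    "ABC123": [0.9, 0.1, 0.0],
    "XYZ789": [0.1, 0.8, 0.3],
}
label, score = retrieve_closest([0.85, 0.15, 0.05], db)
# The query is closest to the signature stored for "ABC123".
```

In practice the signatures would be high-dimensional Fisher Vectors rather than three-element toy vectors, and an approximate nearest-neighbor index would typically replace the linear scan for large databases.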
The signature comparison method assumes that at least one example of the query is already present in the database. While this is not an issue in some applications (for example, in the context of a parking application where an image taken at the exit is matched to images taken at the entry), there are many instances where such a database is unavailable or incomplete. One way that this could be addressed is by generating artificial license plates. For example, U.S. application Ser. No. 13/224,373 discloses a method for creation of virtual license plates by combining similar license plates. U.S. application Ser. Nos. 13/300,124 and 13/458,464 disclose methods for synthesizing license plate images.
The first of these methods is focused on retrieval and yields good results in terms of accuracy when the goal is to ensure that the license plate will likely be among the most similar retrieved images (e.g., among the top 20). This is generally sufficient for manually assisted search applications, but can pose problems for recognition, where a high top-1 accuracy is usually desired, i.e., a single match should be identified with a high degree of accuracy when a match is actually present. The second method can generate photo-realistic images of license plates from a given sequence of characters. However, it relies on a certain prior knowledge of the domain of application (e.g., license plate background, font, and the like). Additionally, multiple images are typically generated with different transformations to account for a set of representative plate distortions, which can be computationally expensive.
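The distinction between top-1 and top-20 accuracy drawn above can be made concrete with a small evaluation sketch. This is an illustrative measurement routine, not part of any referenced method; the query identifiers and ranked lists are hypothetical.

```python
def top_k_accuracy(ranked_results, ground_truth, k):
    """Fraction of queries whose true label appears among the top-k
    retrieved labels. `ranked_results` maps each query id to a list of
    labels sorted by decreasing similarity."""
    hits = sum(
        1 for query, truth in ground_truth.items()
        if truth in ranked_results[query][:k]
    )
    return hits / len(ground_truth)

# Toy example: two queries with ranked retrieval lists.
ranked = {
    "q1": ["ABC123", "ABD123", "ABC124"],
    "q2": ["XY2789", "XYZ789", "XYZ788"],
}
truth = {"q1": "ABC123", "q2": "XYZ789"}

top1 = top_k_accuracy(ranked, truth, 1)  # only q1 is a top-1 hit -> 0.5
top2 = top_k_accuracy(ranked, truth, 2)  # both hit within top 2 -> 1.0
```

A retrieval system can thus score well at k=20 for assisted search while its top-1 accuracy, the figure that matters for fully automatic recognition, remains substantially lower.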
There remains a need for a system and method for recognizing text in images which is both sufficiently accurate for the particular application and is computationally efficient.