The exemplary embodiment relates to text recognition in images. It finds particular application in connection with verification of OCR output, such as in recognizing license plates, and will be described with particular reference thereto. However, it is to be appreciated that it is applicable to a wide range of recognition problems.
Optical Character Recognition (OCR) refers to the process of recognizing the text present in an image, i.e. converting the pixels of the image into the actual string of text appearing in the image. There has been considerable work on OCR in the document analysis domain and other application fields such as license plate recognition, where the license plate is generally a unique identifier for the vehicle on which it is mounted. See, for example, Anagnostopoulos, et al., “License plate recognition from still images and video sequences: A survey,” IEEE Trans. on Intelligent Transportation Systems, vol. 9, No. 3, pp. 377-391, 2008.
OCR systems typically operate by segmenting a full image into sub-images corresponding to words and then recognizing the word in each individual sub-image. It is this latter stage of processing the sub-image that is the focus of this application. Such a sub-image containing a character string such as a single word is referred to herein as a “text image.” In the case of license plates, the sub-image may be a cropped image of a vehicle where the license plate number has been located. The desired output is a text string corresponding to the word or other character string that is present in the text image. However, OCR is not completely accurate, for a number of reasons. For example, accuracy diminishes when the visibility at the time of capturing the image is poor. Additionally, there may be characters in the image which are not accepted by the OCR system. In practical applications, therefore, it is common for the OCR system to output a confidence score for the recognized string. The confidence score is a self-assessment by the OCR algorithm of the reliability of its output. The computation of the OCR confidence score depends on the internal recognition process. As one example, for a probabilistic model, it may be the posterior probability of the output text string given the image data. For a system based on individual character detection and classification, it may be the arithmetic or geometric average of the individual character classification scores.
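By way of illustration only, the two averaging schemes mentioned above may be sketched as follows. The function names and the example per-character scores are hypothetical and do not correspond to any particular OCR system; the scores are assumed to lie in the range 0 to 1.

```python
import math


def arithmetic_confidence(char_scores):
    """Arithmetic mean of per-character classification scores."""
    if not char_scores:
        raise ValueError("at least one character score is required")
    return sum(char_scores) / len(char_scores)


def geometric_confidence(char_scores):
    """Geometric mean of per-character classification scores.

    Computed in the log domain for numerical stability. Any score that is
    zero (or negative) makes the string-level confidence zero.
    """
    if not char_scores:
        raise ValueError("at least one character score is required")
    if any(score <= 0.0 for score in char_scores):
        return 0.0
    mean_log = sum(math.log(score) for score in char_scores) / len(char_scores)
    return math.exp(mean_log)


# Hypothetical scores for a three-character string with one weak character.
scores = [0.9, 0.9, 0.3]
print(arithmetic_confidence(scores))
print(geometric_confidence(scores))
```

For the same scores, the geometric average is never larger than the arithmetic average, so a single poorly classified character lowers the geometric confidence more sharply.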
In some applications of OCR, such as license plate recognition, an OCR error may carry a high cost (e.g., result in billing an incorrect customer for toll charges). In such an application, the confidence score is used to trigger a “reject” decision, typically by discarding OCR decisions where the confidence is below a threshold. In the case of a rejection, the text image can be sent to a second automatic stage (for example, a second OCR) or to a human annotator for manual review. In practice, the OCR confidence score does not always provide a reliable prediction of whether the OCR output matches the ground truth. Thus, the confidence score may result in the rejection of OCR decisions that are correct or the acceptance of OCR decisions that are incorrect, depending, in part, on where the threshold is placed. Even in the case where two OCRs are used, since access to the internal mechanics of each OCR system is not generally made available to users, a second OCR may not overcome the deficiencies of the first OCR. Extensive use of human annotators can be costly and time-consuming.
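The threshold-based reject decision described above, and its sensitivity to where the threshold is placed, may be illustrated by the following minimal sketch. The function names, the sample strings, and the confidence values are all hypothetical and are chosen only to show one confidently wrong output and one correct output with low confidence.

```python
def decide(confidence, threshold):
    """Return "accept" if the confidence reaches the threshold, else "reject"."""
    return "accept" if confidence >= threshold else "reject"


def error_counts(samples, threshold):
    """Count false accepts and false rejects at a given threshold.

    Each sample is a tuple (ocr_text, confidence, ground_truth).
    A false accept is an accepted string that does not match the ground truth;
    a false reject is a rejected string that does match it.
    """
    false_accepts = 0
    false_rejects = 0
    for ocr_text, confidence, ground_truth in samples:
        decision = decide(confidence, threshold)
        is_correct = ocr_text == ground_truth
        if decision == "accept" and not is_correct:
            false_accepts += 1
        elif decision == "reject" and is_correct:
            false_rejects += 1
    return false_accepts, false_rejects


# Hypothetical data, for illustration only.
SAMPLES = [
    ("ABC123", 0.95, "ABC123"),  # correct, high confidence
    ("ABC128", 0.90, "ABC123"),  # incorrect, yet high confidence
    ("XYZ789", 0.60, "XYZ789"),  # correct, yet low confidence
    ("QRS456", 0.40, "QRS450"),  # incorrect, low confidence
]

for threshold in (0.5, 0.8, 0.92):
    print(threshold, error_counts(SAMPLES, threshold))
```

On this hypothetical data, no single threshold removes both kinds of error: raising the threshold far enough to reject the confidently wrong string also rejects a correct one.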
There remains a need for a reliable system and method for computing a confidence in the output of a text recognition system which need not rely on access to its internal recognition process.