This invention relates to the field of character recognition, and particularly to a highly reliable system and method for recognizing ID-type characters under real-world conditions where factors such as distortion of the painted ID characters, image sensor limitations and sensor-induced distortion, and environmentally caused distortion may make a robust reading difficult.
The desirability of having a system to reliably recognize the characters of an alphanumeric identification code (hereinafter "ID") is well appreciated by those skilled in the art. The ID may come in various forms. Some examples are license plate numbers, codes for freight shipment containers, serial numbers on IC chips, etc. For IDs such as the license plate number, it is more difficult to recognize the ID characters because the characters may be obscured or distorted due to many real-world conditions. For instance, the characters may be partially covered by the frame which holds the plate in place or simply by dirt from the roads. Corrosion is another source of distortion. Furthermore, license plates are often located near other forms of writing, bumper stickers for instance, which may be mistaken for the license plate itself.
A number of systems have been developed for recognizing ID characters. For instance, U.S. Pat. No. 4,817,166 describes an apparatus for reading a license plate. Here, a video camera produces an image of a license plate on a vehicle, and a scanning apparatus finds a license plate number in the image. The identification of the license plate number is verified in a confidence check section by checking for the presence of a state logo. Next, a character extractor identifies individual characters of the number by finding and tracing a contour along interior portions of the character edges, and the contour length, character height, and character width are then checked in a validity checking section to determine whether they are within predetermined acceptable ranges. To correct for obscuring objects on the license plate, a character alignment section determines the top line and baseline of the license plate number and disregards portions of the characters that appear to be outside of these limits, and a license plate frame correction section is utilized to extract the characters when a portion thereof is obscured by a license plate frame. Once extracted, the characters are recognized by a structural analysis section, and a state recognition section recognizes the state logo. Once the state is identified, a verification section rereads the license plate number utilizing knowledge of the type style used by the identified state.
Another ID recognition system is described in GB 2,273,191, where an ID code on containers is verified against a target code representing the code of an expected container. An image of the container surface carrying the displayed code is obtained by several cameras and digitized to form a pixel array. The pixel array is scanned to detect potential characters, which are grouped together. From the pixel values of the potential characters, a set of recognized characters is produced and compared with the target code, with the recognized code being verified or rejected depending on the results of the comparison. Recognition is carried out by a neural network.
Although each of these and other similar systems has its advantages, the main shortcoming of ID recognition systems in general has been their inability to achieve a sufficient level of accuracy in reading the characters. Part of the difficulty which all of the systems faced was in handling the real-world conditions which may visually distort or obscure the ID characters. ID recognition differs from ordinary character recognition in that every character of an ID must be recognized correctly before a valid ID is identified, so even a relatively small percentage of misidentified characters can lead to a high percentage of misidentified IDs. To illustrate, a 99% character recognition rate is considered to be an excellent rate for character recognition systems; however, for ID recognition, a 99% character recognition rate translates to a very poor ID recognition rate. Therefore, there is a need for a truly robust ID character recognition system that can negotiate a wide range of real-world situations that may make an accurate reading difficult and which is particularly adapted for accurate identification at the ID level.
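The arithmetic behind this point can be sketched as follows. The sketch assumes that each character is recognized independently with the same success probability and that an ID counts as correct only when all of its characters are correct; neither the independence assumption nor the example ID lengths come from the text above, and the function name is illustrative only.

```python
def id_recognition_rate(char_rate: float, id_length: int) -> float:
    """ID-level success rate, assuming independent per-character errors.

    An ID is read correctly only if every one of its characters is,
    so the character-level rate is raised to the power of the ID length.
    """
    if not 0.0 <= char_rate <= 1.0:
        raise ValueError("char_rate must be between 0 and 1")
    if id_length < 0:
        raise ValueError("id_length must be non-negative")
    return char_rate ** id_length


# Illustrative ID lengths (assumed, not taken from the text).
for n in (6, 7, 11):
    rate = id_recognition_rate(0.99, n)
    print(f"{n}-character ID at 99% per character: {rate:.1%}")
```

Under these assumptions, a seven-character ID read at a 99% character rate would be fully correct only about 93% of the time, so roughly 7 IDs in every 100 would contain at least one error, and the figure worsens as the ID gets longer.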
It is therefore an object of the present invention to provide a truly robust system and method for identifying characters of alphanumeric identification codes to achieve high accuracy even in situations where the individual characters may be visually distorted due to real-world conditions.
It is a further object of the present invention to provide a system and method which takes particular measures to achieve a high identification success rate at the ID level, and not just at the character level.
The present invention is a method and system for recognizing the characters on surfaces where an alphanumeric identification code ("ID" for short) may be present. The present system is particularly adapted for situations where visual distortions can occur. Although the applications are many, in order to properly and fully describe the present invention, references shall be made to the particular application of recognizing characters on a license plate.
The license plates themselves, due to the environment to which they are exposed, often become marked, dirty or dented. The presence of corrugations, structural bars, smears and other noise may distort the characters. Thus, the variation in character and background intensity and the lack of adequate contrast pose problems for a reliable method and apparatus for ID code character extraction, recognition and verification. The intensity and contrast of the characters and the background vary with the illumination of the license plate surface under different conditions, such as daylight, night time and cloud cover. Also, the characters of the ID code may be presented to the recognition apparatus at an angle, thus resulting in characters which are skewed.
The present invention utilizes a highly robust method for recognizing the characters of an ID. Multiple character recovery schemes are applied to account for a variety of conditions to ensure high accuracy in identifying the ID on the license plate. Accuracy is greatly enhanced by taking a comprehensive approach where multiple criteria are taken into consideration before any conclusions are drawn. Special considerations are given to recognizing the ID as a whole and not just the individual characters.
An image of the car in the general area where a license plate might be located is provided as input to the system. Various image enhancement tools are used to provide an image of the best possible quality.
The potential regions of interest (or ROIs for short), i.e., regions which may contain an image of the license plate, are detected using a broad predetermined criterion, but no definite conclusions are drawn as to whether a region is actually one containing the license plate. Therefore, several regions may be selected as the possible license plate candidates. Various image enhancement tools are used to detect the ROIs.
Because the characters may be black on white background or white on black background, the system assumes that both scenarios are possible. Hence, both scenarios are considered. For each of the scenarios, the character candidates are segmented from the detected ROI groups, and then recognized using recognition tools, which include various types of neural network systems. The recognized characters are then grouped to form potential character groups.
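The idea of carrying both polarity scenarios forward can be illustrated with a minimal sketch. It assumes a grayscale ROI stored as rows of integers from 0 to 255 and a single fixed threshold; the text does not specify the segmentation technique, so the thresholding step, the function names and the default threshold are all assumptions made for illustration.

```python
from typing import Dict, List

Image = List[List[int]]  # grayscale rows, pixel values 0..255 (assumed layout)


def binarize(image: Image, threshold: int, dark_on_light: bool) -> Image:
    """Return a mask with 1 where a pixel is treated as character foreground."""
    if dark_on_light:
        # Dark characters on a light background: foreground is below threshold.
        return [[1 if px < threshold else 0 for px in row] for row in image]
    # Light characters on a dark background: foreground is at or above threshold.
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def both_polarities(image: Image, threshold: int = 128) -> Dict[str, Image]:
    """Produce one foreground mask per polarity so neither scenario is ruled out early."""
    return {
        "dark_on_light": binarize(image, threshold, dark_on_light=True),
        "light_on_dark": binarize(image, threshold, dark_on_light=False),
    }


# Tiny hypothetical ROI: a dark vertical stroke on a light background.
roi = [[200, 30, 200],
       [200, 30, 200]]
masks = both_polarities(roi)
print(masks["dark_on_light"])  # stroke pixels marked as foreground
print(masks["light_on_dark"])  # background pixels marked as foreground
```

Each mask would then be passed separately to segmentation and recognition, and the decision between the two scenarios deferred to the later grouping and ranking stages.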
The potential character groups meeting certain criteria, such as character height, width, etc., are then combined to form possible IDs, called likely ID character groups. At this stage, no definite conclusion as to the accuracy of the ID character groups is made. Still, the ID character groups have been partially recognized, and some information is gleaned during the process. Hence, these groups are now called partially-recognized ID character groups, or PR-IDCGs for short. Because several license plate candidates may have been selected, it is possible that there may be more than one PR-IDCG or none at all, depending on the particularities of the image generated from an automobile.
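One way such size criteria might be applied is sketched below. The specific rule (height within a tolerance of the group median, width no greater than a fixed multiple of height) is an assumption chosen for illustration, as are the `CharBox` type, the parameter names and the default values; the text names only "character height, width, etc." as criteria.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class CharBox:
    """Hypothetical record for one recognized character and its bounding box."""
    label: str
    x: int       # left edge, used for left-to-right ordering
    height: int
    width: int


def consistent_characters(chars: List[CharBox],
                          height_tolerance: float = 0.2,
                          max_aspect: float = 1.0) -> List[CharBox]:
    """Keep characters whose size is consistent with the rest of the group.

    A character is kept if its height lies within `height_tolerance` of the
    median height and its width does not exceed `max_aspect` times its height.
    Both rules are illustrative assumptions.
    """
    if not chars:
        return []
    med_h = median(c.height for c in chars)
    kept = [
        c for c in chars
        if abs(c.height - med_h) <= height_tolerance * med_h
        and c.width <= max_aspect * c.height
    ]
    return sorted(kept, key=lambda c: c.x)


# Hypothetical candidates: three plate characters and one oversized blob.
candidates = [
    CharBox("K", x=40, height=41, width=24),
    CharBox("7", x=10, height=40, width=22),
    CharBox("?", x=70, height=80, width=30),
    CharBox("3", x=100, height=39, width=20),
]
group = consistent_characters(candidates)
print("".join(c.label for c in group))
```

In this example the oversized blob is dropped because its height is far from the group median, and the remaining characters are ordered left to right to form a partially recognized group.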
All of the PR-IDCGs that have been extracted go through a series of character recovery schemes to recover any characters that are missing because the original characters were distorted or obscured by various environmental or other real-world factors. After the recoveries have been made, the ID Candidates are formed. Each ID Candidate is tested for integrity and assigned a weighted value, the ID Candidates are ranked by that value, and the ID Candidate with the highest weighted value is selected for further processing.
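The ranking step can be sketched as a weighted sum over integrity scores. The criteria names, the weights and the `IDCandidate` type below are hypothetical; the text states only that each ID Candidate is assigned a weighted value and that the highest-valued candidate is selected.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class IDCandidate:
    """Hypothetical ID Candidate with per-criterion integrity scores in [0, 1]."""
    text: str
    scores: Dict[str, float]


# Assumed criteria and weights, for illustration only.
WEIGHTS: Dict[str, float] = {
    "char_confidence": 0.5,
    "spacing_regularity": 0.3,
    "format_match": 0.2,
}


def weighted_value(candidate: IDCandidate,
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the candidate's scores; missing criteria count as 0."""
    return sum(w * candidate.scores.get(name, 0.0) for name, w in weights.items())


def select_best(candidates: List[IDCandidate]) -> Optional[IDCandidate]:
    """Return the candidate with the highest weighted value, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=weighted_value)


# Hypothetical candidates from two different regions of interest.
plate = IDCandidate("ABC1234", {"char_confidence": 0.9,
                                "spacing_regularity": 0.8,
                                "format_match": 1.0})
sticker = IDCandidate("HONK", {"char_confidence": 0.95,
                               "spacing_regularity": 0.4})
best = select_best([sticker, plate])
print(best.text if best else "no candidate")
```

Here the bumper-sticker text has slightly higher character confidence but scores poorly on the other criteria, so the plate candidate ranks first; this is the kind of outcome a multi-criteria weighting is meant to produce.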
It is then determined whether the selected ID has a sufficiently high confidence level. If it does, the system outputs the ID along with its confidence value. If no ID of high confidence is found, additional refinement recovery is performed until either a high confidence level is achieved or the system has gone through a pre-selected number of passes without a successful reading, in which case a reject indicator is output.
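The control flow of this final stage can be sketched as a bounded loop. The threshold value, the pass limit, the `Reading` tuple and the `refine` callback are all assumptions introduced for illustration; the refinement recovery itself is represented by a placeholder function rather than by any method described here.

```python
from typing import Callable, Optional, Tuple

Reading = Tuple[str, float]  # (ID text, confidence in [0, 1]); assumed representation


def read_with_refinement(initial: Reading,
                         refine: Callable[[Reading], Reading],
                         threshold: float = 0.9,
                         max_passes: int = 3) -> Optional[Reading]:
    """Return a reading once its confidence reaches `threshold`.

    The refinement step is applied at most `max_passes` times. None is
    returned as the reject indicator if confidence is still too low.
    """
    reading = initial
    passes = 0
    while reading[1] < threshold:
        if passes == max_passes:
            return None  # reject: pass limit reached without a confident reading
        reading = refine(reading)
        passes += 1
    return reading


def bump(reading: Reading) -> Reading:
    """Placeholder refinement that simply raises confidence by 0.1."""
    text, confidence = reading
    return text, min(1.0, confidence + 0.1)


accepted = read_with_refinement(("ABC1234", 0.65), bump, threshold=0.9, max_passes=3)
rejected = read_with_refinement(("ABC1234", 0.65), bump, threshold=0.9, max_passes=2)
print(accepted)
print("REJECT" if rejected is None else rejected)
```

Bounding the number of passes guarantees that the system always terminates with either a confident ID or an explicit reject, rather than returning a low-confidence guess.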