Alphanumeric recognition systems are used in a variety of applications to read information with a machine rather than a pair of human eyes, thereby automating the process and increasing throughput. Examples of machine-readable information include names or addresses written on mailpieces such as envelopes; information written on forms, such as tax forms and census forms; product codes or part numbers inscribed on objects; and automobile vehicle identification numbers (VINs).
Optical character recognition (“OCR”) systems typically are used to sense, or “read,” information under severe time constraints, such as when information must be read, interpreted, validated, updated, and/or processed within a short period of time. Working against this time constraint is the interpretation process, performed by character recognition schemes that resolve ambiguous patterns sensed by the OCR. For example, the character “D” could be read as an “O” or a “C,” while a “G” could be read as a “C” or an “E.” As a result, hundreds of permutations or combinations of characters may be cross-referenced against a known word or number using a character recognition scheme before the correct permutation is recognized.
A known character recognition scheme used with OCRs in the United States Postal Service (“USPS”) to read names and addresses on mailpieces, such as letters, magazines, and parcels, is the DiGram scheme (Di=twice, Gram=letter). Using the DiGram scheme, the OCR interprets each character in the mail name (or mail address) as a letter (or number) and two alternate possibilities, creates a matrix of the possibilities, and generates permutations from the matrix. For a five-letter word, the matrix would be 3×5, and the number of permutations would be 3^5, or 243, possible combinations of letters. The DiGram scheme determines the “valid” name or address, using a data recognition logic algorithm, by comparing the permutations to a known name or address with the same or similar character string.
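The permutation count described above can be illustrated with a minimal Python sketch. The function name `generate_permutations` and the candidate characters are hypothetical, chosen only for illustration; the actual candidate sets produced by an OCR and the internal data structures of the DiGram scheme are not specified here.

```python
from itertools import product

def generate_permutations(candidates):
    """Return every string formed by picking one candidate per position.

    `candidates` holds one entry per character position; each entry lists
    the OCR's primary reading followed by its two alternates.
    """
    return ["".join(combo) for combo in product(*candidates)]

# Hypothetical 3x5 matrix for a five-letter name: each string is one column
# (primary reading first, then two alternates). These values are assumptions.
candidates = ["SB5", "MNH", "I1L", "TIF", "HNM"]

perms = generate_permutations(candidates)
print(len(perms))        # 3**5 = 243 permutations
print("SMITH" in perms)  # the intended name is among them
```

Each added character position multiplies the number of permutations by three, which is why the comparison workload grows quickly with name length.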
For example, the DiGram scheme has been used to identify mailpieces requiring a change of address (“CoA”). About 500 to 700 of the approximately 40,000 pieces of mail per hour that pass through an OCR system require the application of a new address based on information from CoA forms. In the context of a mailpiece undergoing a match with a CoA form submitted by a postal patron, the mailpiece is read by the OCR, the mail name is split from the mail address, and the characters of the mail name are evaluated and matched against a known string of characters. This process must be completed within the time it takes to send the mail address to a ZIP+4 address matching engine for “standardization.” Such standardization involves confirming that the city, state, and zip code in the mail address correspond to one another by checking them against a ZIP+4 database containing all corresponding city, state, and zip code information.
After the OCR reads and separates the mail name and mail address, it creates permutations of the mail name based on the DiGram scheme. A CoA form, which is the “known” data, contains the postal patron's name, old address, and new address. The permutations of the mail name are checked against the known CoA name or names associated with the mail address.
The DiGram scheme determines the correct mail name through a data recognition logic algorithm by comparing the permutations, which are generated from the three possible characters for each letter in the mail name, to the known CoA names for that particular mail address. If there is a match between the known name, a mail name permutation, and the old mail address, the USPS can assume that the mail belongs to a particular postal patron and forward it to the new address as requested in the CoA. The new address and corresponding barcode are then “sprayed,” i.e., printed, on the mailpiece. If there is no match, the USPS can assume that the mail address written on the mailpiece by the sender is correct, and a barcode corresponding to the mail address written on the mailpiece is sprayed on the mailpiece.
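The match-or-no-match decision described above might be sketched as follows. This is an illustrative sketch only, not the USPS implementation: the record layout, the `route_mailpiece` function, and the sample names and addresses are all assumptions introduced for the example.

```python
from itertools import product

# Hypothetical CoA data keyed by old address; each record holds the
# patron's name and the new address taken from the CoA form.
COA_RECORDS = {
    "12 OAK ST": [{"name": "SMITH", "new_address": "98 ELM AVE"}],
}

def route_mailpiece(name_candidates, mail_address, coa_records):
    """Return the address whose barcode would be sprayed on the mailpiece.

    Every permutation of the mail name is compared against the known CoA
    names on file for the address read from the mailpiece.
    """
    known = {rec["name"]: rec["new_address"]
             for rec in coa_records.get(mail_address, [])}
    for combo in product(*name_candidates):
        permutation = "".join(combo)
        if permutation in known:
            # Name permutation and old address both match: forward the mail.
            return known[permutation]
    # No match: the address written by the sender is assumed correct.
    return mail_address

name_candidates = ["SB5", "MNH", "I1L", "TIF", "HNM"]  # assumed OCR output
print(route_mailpiece(name_candidates, "12 OAK ST", COA_RECORDS))
print(route_mailpiece(name_candidates, "7 PINE RD", COA_RECORDS))
```

In the worst case the loop examines all 243 permutations of a five-letter name before concluding that no CoA applies.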
The DiGram scheme requires a comparison of the known name against hundreds of permutations created by the DiGram scheme to recognize the correct name. Alphanumeric recognition could be more efficient if the method were accelerated by reducing the number of comparisons made.