A paper sheet such as a banknote, a check, a draft, or a gift coupon has a character string printed thereon as an identification number to identify the paper sheet. For a banknote, this identification number is called a serial number. If, for example, a counterfeit banknote is found, a bank can use its serial number to check whether the counterfeit banknote was processed in a transaction performed within the bank. Financial institutions and the like therefore need to automatically perform character recognition on serial numbers and enter the recognized serial numbers into a database in order to manage the banknotes handled in transactions.
To build the database of banknotes from their serial numbers, character recognition needs to be accurately performed on the serial numbers that are to be entered as data. The character recognition is performed by using an image of a serial number extracted from a banknote image obtained by capturing an image of the banknote. To achieve the character recognition, the position of the character string that forms the serial number needs to be identified on the banknote image so that the serial number image can be accurately extracted. As a technique of extracting a character string from an image, Patent Document 1 discloses a method of extracting only a character string image by distinguishing the character string from its background based on color information. More specifically, a color image of a banknote is used, and pixels having colors of the background are removed from an image of a region containing the character string to extract the character string image.
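The color-based removal described for Patent Document 1 can be illustrated with a minimal sketch. It is not the method of that document itself: the function name `remove_background`, the per-channel `tolerance`, and the toy RGB values are all assumptions made for illustration. A pixel is treated as background when every channel lies within the tolerance of one of the listed background colors.

```python
def remove_background(region, background_colors, tolerance=30):
    """Return a binary mask: 1 for pixels kept as character, 0 for background.

    region            -- list of rows, each row a list of (R, G, B) tuples
    background_colors -- list of (R, G, B) colors assumed to be background
    tolerance         -- hypothetical per-channel matching tolerance
    """
    def is_background(pixel):
        # A pixel is background if it is close to any listed background color.
        return any(
            all(abs(c - b) <= tolerance for c, b in zip(pixel, bg))
            for bg in background_colors
        )

    return [[0 if is_background(p) else 1 for p in row] for row in region]


# Toy region: paper color W, a printed background pattern G, character ink K.
W = (240, 235, 220)
G = (150, 200, 160)
K = (20, 20, 30)
region = [
    [W, G, W, W],
    [W, K, K, G],
    [G, K, W, W],
]
mask = remove_background(region, [W, G])
```

In this toy region only the three ink pixels remain set in `mask`. A sketch of this kind relies on the character color differing from every background color by more than the tolerance.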
Patent Document 2 discloses a method of extracting an image of a character string by using the difference in density between the characters and their background. More specifically, this method includes generating a histogram by projecting the densities of an image containing the character string, and extracting a region whose density projection value exceeds a predetermined threshold value. This threshold value is adjusted such that the width of the extracted region matches, within a permissible range, a predetermined width of a character string image. This allows the image of the serial number to be separated from its background and from stains.
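The projection and threshold adjustment described for Patent Document 2 can be sketched as follows. This is only an illustration under stated assumptions: densities are projected onto the horizontal axis, the candidate thresholds are simply the distinct projection values tried in ascending order, and the names `project_columns`, `extract_span`, `find_string_span`, and `tolerance` are hypothetical.

```python
def project_columns(density):
    """Sum the density values of each column (larger value = darker pixel)."""
    return [sum(col) for col in zip(*density)]


def extract_span(profile, threshold):
    """Return (first, last) column whose projection exceeds the threshold."""
    idx = [i for i, v in enumerate(profile) if v > threshold]
    return (idx[0], idx[-1]) if idx else None


def find_string_span(density, expected_width, tolerance=0):
    """Raise the threshold until the extracted width matches the expected one.

    Returns (threshold, (first, last)) or None when no threshold fits.
    """
    profile = project_columns(density)
    for threshold in sorted(set(profile)):
        span = extract_span(profile, threshold)
        if span is None:
            break
        width = span[1] - span[0] + 1
        if abs(width - expected_width) <= tolerance:
            return threshold, span
    return None


# Toy image: a faint stain in column 0, characters in columns 3 to 7.
row = [2, 0, 0, 9, 8, 9, 7, 9, 0, 0]
density = [row, row, row]
result = find_string_span(density, expected_width=5)
```

With a threshold of 0 the faint stain in column 0 widens the extracted region to eight columns. The next candidate threshold excludes the stain, and the five-column span from column 3 to column 7 matches the expected width.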
Patent Document 3 discloses a method of extracting the individual characters of a character string, one by one, by scanning the character string with a character extraction window and detecting the position of each character. More specifically, the character extraction window has a central region that matches the size of the smallest character in the character string, a strip-shaped surrounding region that surrounds the central region and matches the size of the largest character, and a strip-shaped background region that surrounds the surrounding region and takes into account the spaces between characters. The character extraction window is moved over the binarized character string image. When the total numbers of character pixels in the background region and in the surrounding region satisfy a predetermined condition, and the length of the projected characters in the central region satisfies a predetermined value, the current position is detected as the position of a character, and the image of that character is extracted from the detected position.
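The window scan described for Patent Document 3 can be sketched as follows. The source does not state the actual conditions, so the ones used here are assumptions: at most `max_background` character pixels in the background strip, at most `max_surround` in the surrounding strip, and at least `min_length` occupied columns in the central region. All function and parameter names are hypothetical.

```python
def count_ones(img, top, left, height, width):
    """Count character pixels (value 1) inside a rectangle."""
    return sum(sum(row[left:left + width]) for row in img[top:top + height])


def projected_length(img, top, left, height, width):
    """Number of columns in the rectangle holding at least one character pixel."""
    return sum(
        1 for x in range(left, left + width)
        if any(img[y][x] for y in range(top, top + height))
    )


def find_characters(img, central, surround=1, background=1,
                    max_surround=2, max_background=0, min_length=2):
    """Slide the nested window over a binarized image and collect hits.

    central -- (height, width) of the central region (smallest character)
    Returns the top-left corners of the surrounding (largest-character) box.
    """
    ch, cw = central
    margin = surround + background
    oh, ow = ch + 2 * margin, cw + 2 * margin  # outer window size
    hits = []
    for top in range(len(img) - oh + 1):
        for left in range(len(img[0]) - ow + 1):
            outer = count_ones(img, top, left, oh, ow)
            middle = count_ones(img, top + background, left + background,
                                oh - 2 * background, ow - 2 * background)
            inner = count_ones(img, top + margin, left + margin, ch, cw)
            bg_pixels = outer - middle   # pixels in the background strip
            sr_pixels = middle - inner   # pixels in the surrounding strip
            length = projected_length(img, top + margin, left + margin, ch, cw)
            if (bg_pixels <= max_background and sr_pixels <= max_surround
                    and length >= min_length):
                hits.append((top + background, left + background))
    return hits


# Toy binarized image: two 3x2 characters; the second has one extra pixel on top.
img = [[0] * 12 for _ in range(7)]
for y in range(2, 5):
    for x in (2, 3, 8, 9):
        img[y][x] = 1
img[1][8] = 1
hits = find_characters(img, central=(3, 2))
```

In this toy image the window is accepted only where a character is centered. At the shifted positions, character pixels fall into the background strip or exceed the assumed limit for the surrounding strip, so those positions are rejected.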