1. Field of the Invention
The present invention relates to a method of recognizing characters of a character string contained in data by detecting the layout of the character string on a document.
2. Description of the Related Art
Characters on documents are of various kinds, including Kanji characters, numerical characters, and alphabetical characters, and are available in different fonts, including type and handwritten characters. In order to recognize these characters accurately, it is necessary to define the positions, kinds, and fonts of characters.
FIG. 31 of the accompanying drawings illustrates a document, and FIG. 32 of the accompanying drawings illustrates a conventional method of recognizing characters.
In FIG. 31, a money transfer request slip is shown as a document. The illustrated money transfer request slip is written in Kanji characters and numeric characters as shown in FIG. 31. The illustrated money transfer request slip has 29 character strings C1-C29. The transfer requester is "AIU system" as indicated by the character string C2. The designated date of transfer is "September 20, Heisei 7" as indicated by the character strings C3, C4.
Headers include a transfer destination (C5), an item (C6), an account number (C7), a receiver (C8), and a sum of money to be transferred (C9). Data corresponding to the header of the transfer destination include the character strings C10, C11, C16, C17, C22, C23. Data corresponding to the header of the item include the character strings C12, C18, C24. Data corresponding to the header of the account number include the character strings C13, C19, C25.
Data corresponding to the header of the receiver include the character strings C14, C20, C26. Data corresponding to the header of the sum of money to be transferred include the character strings C15, C21, C27. The money transfer request slip also has a header "total to be transferred" (C28) and its data (C29).
For recognizing the characters of the data on the money transfer request slip, it is necessary to define the positions and names of the data. If the kinds of the characters of the data are known, then it is possible to limit the range where the characters of the data are recognized, for character recognition of higher accuracy. To limit the range of character recognition, it is necessary to define a character category of the characters of the data and the kind of the character font.
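The effect of limiting the range of character recognition can be illustrated with a minimal Python sketch. The candidate scores, the function name `best_candidate`, and the digit set are hypothetical and are not taken from the specification; the sketch only shows how knowing the character category may resolve an ambiguity between similar-looking characters.

```python
# Hypothetical illustration: restricting recognition candidates by character category.
DIGITS = set("0123456789")

def best_candidate(scores, allowed=None):
    """Return the highest-scoring character, optionally limited to `allowed`."""
    candidates = {
        ch: score
        for ch, score in scores.items()
        if allowed is None or ch in allowed
    }
    if not candidates:
        return None  # no candidate belongs to the permitted category
    return max(candidates, key=candidates.get)

# Assumed similarity scores for one character image (letter O vs. digit zero).
scores = {"O": 0.48, "0": 0.45, "D": 0.07}

unrestricted = best_candidate(scores)        # category unknown
restricted = best_candidate(scores, DIGITS)  # category known to be numeric
```

With the category unknown, the letter wins on score alone; once the category is known to be numeric, only the digit remains a candidate.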
As shown in FIG. 32, the position, data name (transfer destination), the character category (Kanji), and the character font (type) are defined with respect to the character string C10, for example. Heretofore, it has been customary to generate, in advance, definition information which defines positions where characters are to be read, for each document, register the definition information in a recognition apparatus, read an image on a document according to the registered definition information, and recognize characters from the image.
Since definition information needs to be registered beforehand, however, characters can be recognized only for those documents with respect to which the definition information has been registered in advance. Banking organizations use various formats for money transfer request slips that are generated by corporations for automatically making money transfers. It is tedious and time-consuming to generate definition information for those documents in advance.
Even if definition information for documents is registered, the registered definition information should be changed when a document format is changed.
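The conventional registration scheme described above can be sketched as follows. This is only an illustrative Python sketch: the record layout, the coordinate values, the format identifiers, and the names `FieldDefinition` and `definitions_for` are assumptions and do not appear in the specification. It shows definition information holding a position, data name, character category, and character font, and shows that a document format with no registered definition cannot be processed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class FieldDefinition:
    # Position of the character string on the document (hypothetical pixel units).
    x: int
    y: int
    width: int
    height: int
    data_name: str   # e.g. "transfer destination"
    category: str    # e.g. "Kanji"
    font: str        # e.g. "type" or "handwritten"

# Definition information registered in advance, one list per document format.
REGISTERED_DEFINITIONS: Dict[str, List[FieldDefinition]] = {
    "slip_format_a": [
        FieldDefinition(40, 120, 200, 24, "transfer destination", "Kanji", "type"),
    ],
}

def definitions_for(format_id: str) -> Optional[List[FieldDefinition]]:
    """Return the registered definitions, or None for an unregistered format."""
    return REGISTERED_DEFINITIONS.get(format_id)

known = definitions_for("slip_format_a")
unknown = definitions_for("slip_format_b")  # never registered, so not recognizable
```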
It is an object of the present invention to provide a method of recognizing characters without the need for generation, in advance, of definition information of characters on documents.
Another object of the present invention is to provide a method of recognizing characters by automatically detecting the layout of characters on a document from an arrangement of character strings on a document.
Still another object of the present invention is to provide a method of recognizing characters by automatically detecting definition information of characters on a document to recognize characters of data thereon.
According to the present invention, a method of recognizing characters of headers and characters of data on a document comprises the steps of extracting character strings on the document by reading the document, distinguishing between headers and data on the document by determining the positional relationship between the character strings, determining character attributes of the data by recognizing characters of the character strings of the headers using a header recognition dictionary, and recognizing characters of the character strings of the data according to the determined character attributes of the data.
In the method, headers are determined from the positional relationship between character strings, and using the header recognition dictionary which has been registered in advance, the headers are recognized, and character attributes of the data are determined. Finally, character strings of the data are recognized according to the character attributes.
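The flow of the method may be illustrated by the following Python sketch. It is a simplified illustration under stated assumptions, not the claimed implementation: the positional rule used here (the topmost string in each column is treated as the header), the contents of `HEADER_DICTIONARY`, and all names are hypothetical, and the `text` field stands in for the image of each character string.

```python
from dataclasses import dataclass

@dataclass
class CharString:
    x: int     # left edge of the extracted character string
    y: int     # top edge of the extracted character string
    text: str  # stand-in for the string image; a real system holds pixels

# Hypothetical header recognition dictionary: header -> attributes of its data.
HEADER_DICTIONARY = {
    "account number": {"category": "numeric", "font": "type"},
    "receiver": {"category": "Kanji", "font": "type"},
}

def split_headers_and_data(strings, x_tolerance=10):
    """Assumed rule: strings with similar x form a column; the topmost is the header."""
    columns = []
    for s in sorted(strings, key=lambda s: s.x):
        if columns and abs(columns[-1][0].x - s.x) <= x_tolerance:
            columns[-1].append(s)
        else:
            columns.append([s])
    pairs = []
    for column in columns:
        column.sort(key=lambda s: s.y)
        pairs.append((column[0], column[1:]))
    return pairs

def assign_attributes(strings):
    """Attach to each data string the attributes determined from its header."""
    assigned = []
    for header, data in split_headers_and_data(strings):
        attributes = HEADER_DICTIONARY.get(header.text.lower())
        for d in data:
            assigned.append((d.text, attributes))
    return assigned

strings = [
    CharString(100, 10, "Account Number"),
    CharString(300, 10, "Receiver"),
    CharString(102, 40, "1234567"),
    CharString(301, 40, "receiver-name-1"),
]
result = assign_attributes(strings)
```

Each data string in `result` is paired with the character attributes of its header, which a recognizer could then use to limit the candidate characters for that string.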
Because headers and data on documents are automatically distinguished from each other to recognize header characters, character attributes of the data can automatically be determined. Since headers are universal in nature and characters used therefor are limited, the header characters can easily be recognized. Furthermore, inasmuch as characters of data are recognized depending on the character attribute that has been determined, the characters of data are recognized with increased accuracy.
Other features and advantages of the present invention will become readily apparent from the following description taken in conjunction with the accompanying drawings.