1. Field of the Invention
This invention relates to an image reading apparatus and, more particularly, to an image reading apparatus that reads an image of an original containing character information and easily determines the language of the original based on the characteristics of the image information of the original.
2. Description of the Related Art
When an image reading apparatus recognizes characters using an OCR (optical character reader) or the like, it must use an OCR engine specific to the language of the characters contained in the original to be read. For this reason, the user of the apparatus must manually set the language of the characters in the original before reading. As a result, the apparatus is not easy to operate.
To solve this problem, an apparatus that determines the language automatically has been proposed (for example, Patent Document 1: Japanese Patent Application Laid-Open No. 6-150061). The apparatus is loaded with OCR engines for a plurality of languages, actually recognizes the characters in an original with the OCR engine for each language, and selects the language having the highest probability of correct determination.
This apparatus is shown in FIG. 7. An image input device 50 reads the characters written on an original as image data and transmits the data to a character recognition process unit 51. The character recognition process unit 51 has OCR engines for a plurality of languages. For example, as shown in FIG. 7, a Japanese OCR engine 52 performs character recognition by pattern matching between the received image data and the character patterns in a Japanese character pattern dictionary 53. Likewise, an English OCR engine 54 recognizes characters by pattern matching between the same image data and the character patterns in an English character pattern dictionary 55. From the recognition result of each language's OCR engine, a probability of correct recognition is obtained and transmitted to a language determining process unit 56. Based on the probabilities of correct recognition received from the OCR engines for the plurality of languages, the language determining process unit 56 determines the language with the highest probability of correct recognition to be the language of the characters contained in the original.
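The conventional per-image determination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Patent Document 1: the bitmap representation, the pixel-agreement score standing in for the "probability of correct recognition", and the toy dictionary contents are all hypothetical. Each language's OCR engine is modeled as a pattern dictionary, every engine is run on the same image data, and the language whose best match scores highest is selected.

```python
from typing import Dict, Tuple

# A character image is modeled as a tuple of equal-length rows of '0'/'1' pixels.
Bitmap = Tuple[str, ...]
# Hypothetical character pattern dictionary: character label -> reference bitmap.
PatternDictionary = Dict[str, Bitmap]


def match_score(image: Bitmap, pattern: Bitmap) -> float:
    """Fraction of pixels that agree between two bitmaps of the same size."""
    total = sum(len(row) for row in image)
    agree = sum(
        a == b
        for image_row, pattern_row in zip(image, pattern)
        for a, b in zip(image_row, pattern_row)
    )
    return agree / total


def recognize(image: Bitmap, dictionary: PatternDictionary) -> Tuple[str, float]:
    """Stand-in for one language's OCR engine: best-matching character and its score."""
    best_char = max(dictionary, key=lambda c: match_score(image, dictionary[c]))
    return best_char, match_score(image, dictionary[best_char])


def determine_language(image: Bitmap, dictionaries: Dict[str, PatternDictionary]) -> str:
    """Run every language's engine on the same image; keep the highest-scoring language."""
    scores = {lang: recognize(image, d)[1] for lang, d in dictionaries.items()}
    return max(scores, key=scores.get)


# Toy 3x3 dictionaries, invented purely for illustration.
DICTIONARIES: Dict[str, PatternDictionary] = {
    "Japanese": {"口": ("111", "101", "111"), "一": ("000", "111", "000")},
    "English": {"I": ("010", "010", "010"), "L": ("100", "100", "111")},
}

print(determine_language(("111", "101", "111"), DICTIONARIES))  # Japanese
```

Note that every character image is matched against every pattern of every language, which is the source of the long processing time discussed below.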
In addition, to prevent wrong determination, the same determining process is performed on a plurality of characters contained in the original, and a statistical process is applied to the results to select the most probable language as the language of the characters contained in the original.
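The statistical process over a plurality of characters is not specified in detail here, so the following sketch assumes a simple majority vote: each character is assigned to the language whose engine scored it highest, and the language chosen for the most characters is adopted for the whole original. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def determine_language_for_original(per_char_scores: List[Dict[str, float]]) -> str:
    """Combine per-character decisions by majority vote (an assumed statistical process).

    Each element maps a language name to the probability of correct recognition
    obtained for one character of the original.
    """
    if not per_char_scores:
        raise ValueError("at least one character is required")
    # Per-character decision: the language whose engine scored that character highest.
    votes = Counter(max(scores, key=scores.get) for scores in per_char_scores)
    # Statistical step: the language chosen for the most characters wins.
    return votes.most_common(1)[0][0]


samples = [
    {"Japanese": 0.9, "English": 0.4},
    {"Japanese": 0.3, "English": 0.8},  # a single misleading character
    {"Japanese": 0.7, "English": 0.6},
]
print(determine_language_for_original(samples))  # Japanese
```

A single misrecognized character is outvoted by the others, which is the purpose of repeating the determination over many characters; the cost is that the full multi-engine recognition must run once per character.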
However, because such an apparatus repeatedly performs character recognition and determination with OCR engines for a plurality of languages, the process requires a long processing time.
Furthermore, the above-mentioned language determining process is performed as a preprocess for character recognition, so it is desirable to implement the function in hardware so that it completes in the shortest possible time. However, it is very difficult to implement OCR engines for a plurality of languages and their character pattern dictionaries in hardware.
As described above, the conventional technology has the following problem. When an image reading apparatus recognizes characters using an OCR or the like, it must use an OCR engine specialized for the language used in the original. The user therefore has to set the language of the original manually when an image is read, which makes the apparatus troublesome to operate.
To solve this problem, an apparatus has been proposed that is loaded with OCR engines for a plurality of languages, performs character recognition with the OCR engine for each language, and selects the language with the highest probability of correct determination, thereby automatically determining the language of the characters contained in the original.
However, this apparatus must actually perform character recognition with the OCR engines for all of the languages each time an original is read, and therefore requires a long processing time. Furthermore, although implementing the function in hardware would be preferable for shortening the processing time, it is difficult to implement the OCR capability in hardware.