1. Field of the Invention
The present invention relates to an image processing apparatus, method, and program for extracting a character string from an image.
2. Description of the Related Art
When a moving image is captured with a digital camera or a digital camcorder, a signboard so wide or tall that it does not fit in a single screen even at a wide angle is captured while panning or tilting the camera. When an image of an electronic billboard or a display on which characters scroll is captured, the camera is fixed and the characters move across the imaging screen. In addition, on a television broadcast screen, a telop character string scrolls within the screen. In each of these cases, the whole character string does not fit in a single screen, so to grasp the whole character string, a character string that appears over a plurality of frames has to be extracted.
Japanese Patent No. 2989364 discusses a technique for composing a whole image by combining a plurality of images. Once the whole image has been obtained in this way, character recognition can be performed on the combined whole image.
Japanese Patent No. 2858560 discusses a technique for capturing a moving image of an object bearing characters that do not fit on a single screen. Character recognition is performed on each frame image, and the per-frame recognition results are then combined.
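The idea of combining per-frame recognition results can be illustrated with a minimal sketch. It assumes that consecutive frames yield partially overlapping strings and joins them at the longest suffix/prefix overlap. The function names and the sample strings are hypothetical, and this is not necessarily the combining method used in Japanese Patent No. 2858560.

```python
def merge_by_overlap(merged: str, frame_text: str) -> str:
    """Append frame_text to merged, dropping the longest suffix/prefix overlap."""
    max_len = min(len(merged), len(frame_text))
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(max_len, 0, -1):
        if merged.endswith(frame_text[:k]):
            return merged + frame_text[k:]
    # No overlap found: simply concatenate.
    return merged + frame_text


def combine_frames(frame_texts):
    """Fold the per-frame recognition results into one string."""
    combined = ""
    for text in frame_texts:
        combined = merge_by_overlap(combined, text)
    return combined


# Hypothetical per-frame recognition results for a scrolling sign.
frames = ["GRAND OP", "AND OPENI", "OPENING SA", "ING SALE"]
print(combine_frames(frames))
```

A sketch like this only works when every frame is recognized without error; a single misrecognized character at a frame boundary breaks the overlap match, which is why reliability of the per-frame results matters.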
The invention discussed in Japanese Patent No. 2989364 is effective when capturing an image of a wide or tall sign while panning or tilting. However, when an image of an electronic billboard or display on which characters scroll is captured, the plurality of images is combined so that the backgrounds other than the electronic billboard or display match. As a result, within the electronic billboard or display area, the scrolled characters from different frames overlap each other, so that even if that area is extracted, the characters cannot be read and character recognition is impossible. In a television broadcast telop, the background image likewise moves independently of the telop display portion. Thus, if the backgrounds are matched, the scrolled characters overlap each other just as with an electronic billboard or display, and even if the telop display portion is extracted, the characters cannot be read and character recognition is impossible.
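The overlap problem can be seen in a toy model. The sketch below assumes a fixed camera, a display window four columns wide, and text that scrolls left by one column per frame. Because the background does not move, background matching aligns the frames with zero offset, so different characters land on the same column. The function names, window width, and sample text are all hypothetical.

```python
def ticker_frames(text, width, num_frames):
    """Return what a fixed window of `width` columns shows in each frame
    as `text` scrolls left by one column per frame."""
    return [text[shift:shift + width].ljust(width) for shift in range(num_frames)]


def overlay_columns(frames):
    """Superimpose the frames without shifting them and list, for each
    column, the distinct characters that end up on top of each other."""
    return [sorted(set(column)) for column in zip(*frames)]


frames = ticker_frames("OPEN TODAY", width=4, num_frames=3)
columns = overlay_columns(frames)
for index, chars in enumerate(columns):
    # More than one character per column means the combined image is unreadable there.
    print(index, chars)
```

In this model every column of the combined display area receives several different characters, which corresponds to the unreadable overlap described above.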
In addition, if the images are large, a huge amount of calculation time is required to combine them, and a high-capacity memory is needed to store the whole image generated by the image combination. When character recognition is performed on the whole image, the large size of the whole image also means that a huge amount of calculation time is required to extract the character area to be subjected to the character recognition.
In the invention discussed in Japanese Patent No. 2858560, the character recognition result of each frame image may vary due to light fluctuation and camera shake. To increase the reliability of those results, errors are corrected using a similarity distance defined between character codes together with a word dictionary. If a word is not listed in the dictionary (a new word or a made-up word), the combining process may fail. In particular, signboards and electronic billboards may display shop names written with phonetic-equivalent characters, which can cause recognition mistakes.
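The weakness of dictionary-based correction can be shown with a minimal sketch. It uses the Levenshtein edit distance as a stand-in for the similarity distance between character codes (the cited patent's actual distance is not specified here), and a tiny hypothetical dictionary. A noisy reading of a listed word is repaired, but a correctly read made-up shop name is replaced by the nearest listed word.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as an assumed stand-in similarity measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def correct_with_dictionary(word, dictionary):
    """Replace a recognized word with the closest dictionary entry."""
    return min(dictionary, key=lambda entry: edit_distance(word, entry))


dictionary = ["CLEAN", "QUICK"]  # hypothetical word list

# A misrecognized dictionary word is corrected as intended.
print(correct_with_dictionary("CLEAM", dictionary))

# A made-up shop name that was recognized correctly is wrongly "corrected".
print(correct_with_dictionary("KLEEN", dictionary))
```

Both calls return the same dictionary entry, so the made-up name is lost even though the recognizer read it correctly.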