1. Field of the Invention
The present invention relates to an apparatus for coding a binary stationary image, and more specifically to an apparatus which extracts a pattern from a binary image in a system for pattern matching coding.
2. Description of Related Art
A method for coding a binary stationary image using pattern matching has been known in the past. In this coding method, an image is divided into patterns, each of which is a collection of continuously arranged black pixels, and matching is performed with respect to each pattern.
Then, in accordance with the results of this pattern matching, the bit maps for the patterns themselves and information which represents the pattern positions and sizes are coded.
In a binary stationary image coding method which uses pattern matching as described above, when extracting a pattern from an image, black pixels are detected by scanning along the image from the upper left part to the lower right part thereof.
Next, the outline contour of the collection of continuously arranged black pixels is traced, with the detected black pixel as the starting point, to determine the contour of the pattern. Finally, the contents of this contour are extracted as the pattern.
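The extraction steps described above can be illustrated by the following minimal sketch. It is not the method of the prior art itself: it assumes 8-connectivity and substitutes a flood fill for the contour trace, so it collects only the black pixels of each collection rather than everything enclosed by the contour. The function name `extract_patterns` and the dictionary keys are hypothetical and chosen only for illustration.

```python
from collections import deque


def extract_patterns(image):
    """Extract patterns in the order a raster scan first meets them.

    image: list of rows, where 1 is a black pixel and 0 is a white pixel.
    Returns a list of dicts holding each pattern's position, size and bitmap.
    Assumption: 8-connectivity; a flood fill stands in for contour tracing.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    patterns = []

    # Scan from the upper left to the lower right of the image.
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue

            # A new black pixel was detected: collect every black pixel
            # that is continuously connected to this starting point.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))

            # Position and size of the pattern (bounding box).
            x0 = min(px for px, _ in pixels)
            y0 = min(py for _, py in pixels)
            w = max(px for px, _ in pixels) - x0 + 1
            h = max(py for _, py in pixels) - y0 + 1

            # Bit map of the pattern itself, relative to the bounding box.
            bitmap = [[0] * w for _ in range(h)]
            for px, py in pixels:
                bitmap[py - y0][px - x0] = 1

            patterns.append(
                {"x": x0, "y": y0, "width": w, "height": h, "bitmap": bitmap}
            )
    return patterns
```

Each returned entry carries the three kinds of information that are subsequently coded: the bit map of the pattern, its position, and its size.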
When performing the above operations, the absolute coordinates of the first appearing pattern are used as the reference in indicating the position of a pattern, with the other patterns basically being expressed as a relative distance (offset) from the immediately previously appearing pattern.
As a result, in horizontally written text the offset values will be small, so the coding efficiency is improved compared with the case in which all patterns are expressed in absolute coordinates.
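The relationship between absolute coordinates and offsets can be sketched as follows. This is only an illustration under stated assumptions: each pattern position is taken to be the upper left corner of its bounding box, which the description above does not specify, and the function names and coordinate values are hypothetical.

```python
def positions_to_offsets(positions):
    """Keep the first position absolute; express each later position
    as an offset from the immediately preceding pattern.

    positions: list of (x, y) tuples in order of appearance.
    Assumption: (x, y) is the upper left corner of each pattern.
    """
    if not positions:
        return []
    encoded = [positions[0]]
    for (prev_x, prev_y), (x, y) in zip(positions, positions[1:]):
        encoded.append((x - prev_x, y - prev_y))
    return encoded


def offsets_to_positions(encoded):
    """Inverse of positions_to_offsets: rebuild absolute coordinates."""
    if not encoded:
        return []
    positions = [encoded[0]]
    for dx, dy in encoded[1:]:
        last_x, last_y = positions[-1]
        positions.append((last_x + dx, last_y + dy))
    return positions


# Hypothetical positions of four characters on one horizontal line of text.
line = [(100, 200), (124, 200), (148, 201), (172, 200)]
offsets = positions_to_offsets(line)
# offsets == [(100, 200), (24, 0), (24, 1), (24, -1)]
```

After the first entry, every value to be coded is small in magnitude, which is the source of the efficiency gain for horizontally written text.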
In the binary stationary image coding method of the past, however, when extracting a pattern from, for example, Japanese-language or Chinese-language text, there are cases in which a single character is divided into a plurality of patterns. This is because, for example, Japanese is made up of many kanji ideographic characters that are complex and have a large number of strokes.
For example, with regard to the character "KAN" shown in FIG. 5, as used in the word kanji itself, there are three patterns (1), (2) and (3) in the left "radical" part (A) of the character and one pattern (4) in the right "tsukuri" part (B), making a total of four patterns for a single character. The total number of patterns therefore becomes very large.
As a result, there is an excess of data resulting from the need to express the pattern positions, sizes, and the results of pattern matching. This leads to the problem of reduced efficiency in coding.
Additionally, in the binary stationary image coding method of the past, when extracting a pattern, if noise is mixed in with the image, the shape associated with one and the same character can differ slightly, so that even for one and the same character there can be differences in the number of divided patterns and/or the shapes of the patterns.
For this reason, there is an increase in the number of types of patterns, which leads to the problem of a reduction in coding efficiency.
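The effect of noise on the number of divided patterns can be seen in a small sketch. It assumes 8-connectivity and uses a hypothetical helper, `count_patterns`, that merely counts the collections of continuously arranged black pixels; the bit maps are invented for illustration.

```python
def count_patterns(image):
    """Count collections of continuously arranged black pixels (value 1).

    Assumption: 8-connectivity between neighboring pixels.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = set()
    count = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or (x, y) in seen:
                continue
            count += 1
            stack = [(x, y)]
            seen.add((x, y))
            while stack:
                cx, cy = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1
                                and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            stack.append((nx, ny))
    return count


# One horizontal stroke, scanned cleanly.
clean_stroke = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# The same stroke with a single black pixel lost to noise.
noisy_stroke = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

clean_count = count_patterns(clean_stroke)   # 1 pattern
noisy_count = count_patterns(noisy_stroke)   # 2 patterns
```

A single dropped pixel turns one pattern into two, each with a new shape, so the same stroke contributes additional pattern types to be coded.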
Another problem is that, when extracting a pattern as done in the past, the offset is the distance to the immediately previously appearing pattern; that is, because of the scanning direction, the offset is expressed as the distance to the pattern positioned to the left of the pattern of interest. For vertically written text, the offsets therefore include the spaces between lines, which are redundant. This leads to the problem of a reduction in coding efficiency.
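The following sketch illustrates this problem numerically. The page layout (three vertical lines of four characters, a line pitch of 60 pixels and a character pitch of 26 pixels) is entirely hypothetical, and the sum of absolute offset components is used only as a rough stand-in for coding cost; it is not a measure taken from the prior art.

```python
def total_offset_magnitude(positions):
    """Sum of |dx| + |dy| over consecutive patterns, used here as a
    crude proxy for the amount of offset data to be coded."""
    return sum(
        abs(x - prev_x) + abs(y - prev_y)
        for (prev_x, prev_y), (x, y) in zip(positions, positions[1:])
    )


# Hypothetical vertically written page: three lines (columns) read from
# right to left, four characters per line.
line_x = [200, 140, 80]        # x coordinate of each vertical line
char_y = [10, 36, 62, 88]      # y coordinate of each character in a line

# Order in which a raster scan meets the patterns: top row first,
# moving left to right across all of the vertical lines.
raster_order = sorted(
    ((x, y) for x in line_x for y in char_y),
    key=lambda p: (p[1], p[0]),
)

# Order along the writing direction: down each line in turn.
writing_order = [(x, y) for x in line_x for y in char_y]

raster_cost = total_offset_magnitude(raster_order)
writing_cost = total_offset_magnitude(writing_order)
```

With these assumed values, every offset in raster order spans at least one full line pitch, including the blank space between lines, whereas offsets taken along the writing direction mostly span only the character pitch.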
An object of the present invention is to provide a pattern extraction apparatus that, in extracting a pattern in binary stationary image coding that uses pattern matching, is capable of reducing the number of patterns and the number of pattern types, thereby improving the coding efficiency.
Another object of the present invention is to provide a pattern extraction apparatus that is capable of efficient determination of the offset with regard to vertically written text, similar to that for horizontally written text.
Note that documents written in English, German, French or the like are usually written in the horizontal direction, and thus only the writing direction of the rows need be confirmed first. Thereafter, the reduction in the number of patterns and the improvement of the coding efficiency mentioned above are also required.