Conventional document filing apparatuses have been developed for converting the information of a document image, including characters and pictures, into electronic form so as to record it as document image data. In recent years, as the quantity of document image data recorded in such document filing apparatuses has increased, much effort has been required to input or update the key words and class codes used for retrieving a document image when recording it in the filing apparatus.
In order to reduce the effort to input or update the data for retrieval when recording the document image, a recent document filing apparatus includes a document image storage means in which document image data, which is obtained by converting a document image into electronic form with a scanner or the like, is associated with character data, which is obtained by performing character recognition on the document image (refer to Japanese Patent Gazette No. 2560656 (Japanese Published Patent Application No. Hei.8-87528)).
FIG. 39 is a block diagram for explaining an example of a conventional document filing apparatus.
A conventional document filing apparatus 3900 includes an image coding means 3902 for coding binary image data (document image data) Di, which is obtained by converting a document image into electronic form and which is externally supplied, by using a scheme such as MH (Modified Huffman) or MR (Modified Read), and outputting coded data De corresponding to the document image. The document filing apparatus 3900 also includes a character recognition means 3901 for subjecting the document image data Di to character recognition, and outputting, as character data, character codes Dco of plural candidate characters for each character which is included in the document image. In the character recognition process, the character recognition means 3901 employs, for example, a method of pattern recognition by an OCR (Optical Character Reader).
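The run-length idea underlying MH coding can be illustrated by the following minimal Python sketch. It is a simplified illustration only: it extracts the alternating white and black run lengths of one binary scan line and omits the Huffman code tables that an actual MH coder would apply to those run lengths. The function names `run_lengths` and `expand_runs`, and the convention that 0 denotes a white pixel and 1 a black pixel, are assumptions made for this example.

```python
from itertools import groupby


def run_lengths(row):
    """Return the run lengths of a binary scan line, starting with a white run.

    MH coding begins every line with a white run, so a line whose first
    pixel is black gets a leading white run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(row)]
    if row and row[0] == 1:  # assumed convention: 1 = black pixel
        runs.insert(0, 0)
    return runs


def expand_runs(runs):
    """Rebuild the scan line from its run lengths (white run first)."""
    row = []
    color = 0  # assumed convention: 0 = white pixel
    for length in runs:
        row.extend([color] * length)
        color ^= 1  # runs alternate between white and black
    return row


line = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
runs = run_lengths(line)
print(runs)                       # [3, 2, 4, 1]
print(expand_runs(runs) == line)  # True
```

Because the same run-length procedure is applied to every scan line, such a coder treats characters, diagrams and pictures alike.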
The document filing apparatus 3900 further includes a document image storage means 3903 in which coded data De corresponding to each document image is associated with character codes Dco (i.e., character codes of plural candidate characters relating to the document image).
The document filing apparatus 3900 also includes a data reading means 3904 for reading coded data De corresponding to a specific document image stored in the document image storage means 3903, based on character codes which are externally supplied as retrieval data Da. The document filing apparatus 3900 further includes an image decoding means 3905 for decoding the read coded data De so as to restore the document image data Di corresponding to the specific document image. The data reading means 3904 collates the character codes supplied as the retrieval data Da (retrieval character codes) with the character codes stored in the document image storage means 3903 (stored character codes), and outputs the coded data De of a document image whose stored character codes match the retrieval character codes.
In the document filing apparatus 3900, the character recognition means 3901 is constructed so as to output character codes Dco of plural candidate characters as character data which are obtained by performing character recognition on each character, whereby an adverse effect of errors in the character recognition on the retrieval is reduced.
When document image data Di is input to the document filing apparatus 3900 so constructed, the image coding means 3902 encodes the document image data Di to output coded data De. The character recognition means 3901 extracts a character image which is included in the document image based on the document image data Di, and outputs character codes Dco of plural candidate characters corresponding to this character image.
Then, the coded data De corresponding to one document image is associated with the plural character codes Dco and stored in the document image storage means 3903.
Further, when the retrieval data Da is externally supplied, the data reading means 3904 reads coded data De corresponding to a predetermined document image which is stored in the document image storage means 3903, based on the character codes as the retrieval data Da, and the image decoding means 3905 decodes the coded data De so as to restore the document image data Di.
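The recording and retrieval operations described above can be sketched in Python as follows. This is a hypothetical illustration and not the implementation of the cited apparatus: the class name `FilingStore`, its methods, and the matching rule (each retrieval character must appear among the stored candidates at consecutive character positions) are assumptions made for this example, and the coded data De is treated as opaque bytes.

```python
class FilingStore:
    """Toy stand-in for the document image storage means and data reading means."""

    def __init__(self):
        # Each record pairs coded data with one candidate set per character.
        self._records = []

    def record(self, coded_data, candidates):
        """Associate coded data with the candidate characters of each character."""
        self._records.append((coded_data, [set(c) for c in candidates]))

    def retrieve(self, query):
        """Return the coded data of every document whose candidates match the query."""
        return [coded for coded, cands in self._records if _matches(cands, query)]


def _matches(cands, query):
    """True if the query characters appear among the candidates at consecutive positions."""
    n = len(query)
    if n == 0:
        return False
    for start in range(len(cands) - n + 1):
        if all(query[i] in cands[start + i] for i in range(n)):
            return True
    return False


store = FilingStore()
# Hypothetical recognition result for the word "cat": the first candidate
# for the first character is wrong ("e"), but "c" is kept as a second candidate.
store.record(b"coded-doc-1", [["e", "c"], ["a", "o"], ["t", "f"]])
store.record(b"coded-doc-2", [["d", "o"], ["o", "c"], ["g", "q"]])

print(store.retrieve("cat"))  # [b'coded-doc-1']
print(store.retrieve("pig"))  # []
```

In this sketch the first document is still found for the query "cat" although the first candidate for its first character is wrong, which shows how keeping plural candidates reduces the effect of recognition errors on retrieval.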
In the conventional document filing apparatus 3900 so constructed, however, the coding process on the document image data Di by the image coding means 3902 is performed uniformly regardless of the types of characters included in the document image or regardless of the types of components of the document image such as characters, diagrams, pictures, etc., and therefore, the coding efficiency is degraded in some instances.
Further, in the conventional document filing apparatus 3900, the character recognition means 3901 performs character recognition on each character which is included in the document image, and outputs character codes of plural candidate characters for each such character. However, the plural candidate characters which are obtained as the result of character recognition for one character usually tend to be similar in shape. In other words, once one candidate character (usually the first candidate character) is known, the other candidate characters can be roughly inferred from it. Therefore, deriving plural candidate character codes by performing character recognition on each character results in redundant character data, whereby the quantity of data is increased.
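The increase in data quantity can be estimated with the short Python sketch below. The figures used (2000 characters per document image, 5 candidate characters per character, 2 bytes per character code) are hypothetical values chosen only for illustration and do not appear in the cited reference; the function name `candidate_data_bytes` is likewise an assumption.

```python
def candidate_data_bytes(num_characters, candidates_per_character, bytes_per_code):
    """Quantity of character data when every character keeps several candidate codes."""
    return num_characters * candidates_per_character * bytes_per_code


page_chars = 2000  # hypothetical number of characters in one document image
code_bytes = 2     # hypothetical size of one character code in bytes

single = candidate_data_bytes(page_chars, 1, code_bytes)
plural = candidate_data_bytes(page_chars, 5, code_bytes)
print(single, plural, plural // single)  # 4000 20000 5
```

Under these assumptions the character data grows in direct proportion to the number of candidates kept for each character.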
The present invention is made to solve the above-described problems. Accordingly, an object of the present invention is to provide an image coding apparatus which can realize a coding process for efficiently coding data of a document image including characters without degrading facility in retrieval of a character image that is included in the document image, an image decoding apparatus which can preferably perform a decoding process that is adapted to the coding process, and a data storage medium containing a program for making a computer perform the coding process and the decoding process.
Further, it is another object of the present invention to provide a document collation apparatus which can perform a collation process of collating a character image code which is obtained by coding image data corresponding to a character image (character part of a document image) with input character data without decoding the character image code, and a data storage medium containing a program for making a computer perform the collation process.