Recently, the use of electronic data has increased, especially in office environments. For example, instead of keeping a document in paper form, the document may be converted to electronic data using an image processing apparatus and stored in that form. To retrieve a desired document from a large number of stored documents, a user usually requests an information retrieval system to search through the stored electronic documents. To improve office work efficiency, there is a need for an information retrieval system capable of locating the desired document in less time and with high accuracy.
One technique of retrieving the desired document is to search through the stored documents for one or more documents that match a keyword input by the user and to provide a list of the matching documents, for example, as described in the Japanese Patent Application No. 2004-348591. This text search technique, however, requires the use of an optical character reader (OCR), as the information contained in the electronic document must be converted to text data. Further, this technique may require the user to provide additional information regarding the electronic document, such as the language used in the electronic document, when storing or searching for the electronic document.
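For illustration only, the keyword-based text search outlined above can be sketched as follows. The sketch assumes that each stored document has already been converted to text data by OCR. The function name `search_by_keyword`, the file names, and the sample text are hypothetical and are not taken from the cited application.

```python
def search_by_keyword(documents, keyword):
    """Return the names of documents whose OCR text contains the keyword.

    `documents` maps a document name to text data assumed to have been
    produced by OCR beforehand (hypothetical structure, for illustration).
    """
    needle = keyword.lower()
    # Case-insensitive substring match over every stored document.
    return [name for name, text in documents.items() if needle in text.lower()]


# Hypothetical stored documents, already converted to text data.
documents = {
    "invoice.tif": "Invoice for office supplies",
    "memo.tif": "Meeting memo about the retrieval system",
    "report.tif": "Annual report on office efficiency",
}

matches = search_by_keyword(documents, "office")
```

The sketch makes the dependency visible: without the OCR-produced text in `documents`, there is nothing to compare the keyword against.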
The Japanese Patent Application Publication No. 2003-281181 describes a technique of retrieving the desired document that does not require the use of OCR or information regarding the language used in the document. To locate the desired document, this technique converts a keyword input by the user to a set of symbols, and searches through the stored documents for one or more electronic documents each having a set of symbols that matches the set of symbols converted from the keyword. This technique may not be practical in terms of the time required for searching, especially when the number of stored documents is large.
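For illustration only, symbol-based matching of this general kind can be sketched as follows. The encoding used here (classifying each character as ascender "A", descender "D", or plain "x") is an assumption chosen for the sketch and is not the encoding of the cited publication. For brevity the stored symbol sets are derived from text, whereas a technique that avoids OCR would derive them from the document image. The names `to_symbols` and `search_by_symbols` are hypothetical.

```python
# Assumed character shape classes (hypothetical encoding for illustration).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def to_symbols(word):
    """Convert a word to a string of coarse shape symbols."""
    symbols = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            symbols.append("A")  # tall character
        elif ch in DESCENDERS:
            symbols.append("D")  # character extending below the baseline
        else:
            symbols.append("x")  # any other character
    return "".join(symbols)


def search_by_symbols(stored, keyword):
    """Return names of documents holding a symbol set equal to the keyword's.

    `stored` maps a document name to a list of symbol strings, one per word.
    Every stored document is scanned, so the search time grows with the
    number of stored documents.
    """
    target = to_symbols(keyword)
    return [name for name, symbol_words in stored.items() if target in symbol_words]


# Hypothetical stored documents, represented only by their symbol sets.
stored = {
    "a.tif": [to_symbols(w) for w in "copy the data".split()],
    "b.tif": [to_symbols(w) for w in "print a page".split()],
}

hits = search_by_symbols(stored, "copy")
```

Because the search visits every stored document in turn, the sketch also shows why the search time may become impractical as the number of stored documents increases.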