Japanese Laid-open Publication No. 7-152774, entitled “DOCUMENT SEARCHING METHOD AND DEVICE”, discloses a conventional technique for searching document data, which is obtained by subjecting a document to character recognition, for data relevant to a designated character string.
FIG. 23 shows a relationship between an original document and a result of character recognition in the original document. A result of character recognition is also referred to as a recognition result. In general, character recognition is adversely affected by the faintness, angle, style, size, and the like of characters printed on paper.
FIG. 23 shows an example in which a character “” in the original document is incorrectly recognized as another character “”. Further, a character “” in the original document is incorrectly recognized as another character “”.
Hereinafter, a process of searching the recognition result (FIG. 23) for a character string “” will be described based on the technique described in the above Japanese Laid-open Publication No. 7-152774.
This retrieval process uses a table (Table 1) indicating misrecognized characters. The table indicating misrecognized characters is a table which lists certain characters which tend to be incorrectly recognized by character recognition. Table 1 shows that a character “” tends to be incorrectly recognized as “”, “”, “”, or “” and that a character “□” tends to be incorrectly recognized as “□ (symbolic quadrangle)”, “”, “”, or “”.
TABLE 1
Subject character    Misrecognized characters
When searching the recognition result of FIG. 23 for a character string “”, character strings “”, “”, “”, and “” are produced based on the character string “” using the table (Table 1) indicating misrecognized characters. In addition to the designated character string “”, the character strings “”, “”, “”, and “” are searched for. Therefore, “” for which “” has been incorrectly recognized can be retrieved.
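The expansion step described above can be illustrated with a minimal sketch. It is not the implementation of the cited publication: the Latin placeholder characters, the `CONFUSIONS` table, and the function names `expand_query` and `search` are all hypothetical, and the sketch assumes that every listed substitution may be combined across positions, which the description above does not specify.

```python
from itertools import product

# Hypothetical stand-in for Table 1: each subject character maps to the
# characters it tends to be misrecognized as. The entries are placeholders,
# not the characters of the original table.
CONFUSIONS = {
    "l": ["1", "I"],
    "O": ["0"],
}


def expand_query(query, confusions):
    """Return the query itself plus every variant obtained by replacing
    characters with their listed misrecognitions (assumed: all combinations)."""
    choices = [[ch] + confusions.get(ch, []) for ch in query]
    return ["".join(combo) for combo in product(*choices)]


def search(text, query, confusions):
    """Return sorted (position, matched_variant) pairs found in the text."""
    hits = []
    for variant in expand_query(query, confusions):
        start = text.find(variant)
        while start != -1:
            hits.append((start, variant))
            start = text.find(variant, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # "lO" was misrecognized as "1O"; the expanded search still finds it.
    print(search("the 1O value", "lO", CONFUSIONS))
    # A misrecognition absent from the table ("O" read as "Q") is missed.
    print(search("the 1Q value", "lO", CONFUSIONS))
```

Because the table is fixed in advance, the number of variants searched is the same regardless of how accurate the recognition result actually is, and any misrecognition absent from the table goes undetected.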
However, in the retrieval process described in the above Japanese Laid-open Publication No. 7-152774, the list of characters which tend to be incorrectly recognized is prepared in advance and is fixed regardless of the recognition result. Therefore, when searching document data having few errors, unnecessary searches may be executed using more character candidates than are needed. Conversely, when searching document data having many errors, misrecognized characters other than those on the list may not be retrieved.
For example, in FIG. 23, when the recognition result is searched for a character string “”, the character strings “” (symbolic quadrangle)”, “”, “”, and “” are produced using the table (Table 1) indicating misrecognized characters. Each of the character strings is searched for. However, when an error occurs that is not listed in the table (Table 1) indicating misrecognized characters (e.g., “” is incorrectly recognized as “”), it is not possible to retrieve “”.
Further, when document data obtained by recognizing characters in a general document having a certain layout is searched for a character string, the layout itself might be incorrectly recognized. For example, vertical writing may be incorrectly recognized as horizontal writing or vice versa, the line to be concatenated after a line feed may be incorrectly identified, or the concatenation between paragraphs may be incorrectly recognized. Such layout recognition errors cannot be addressed by the retrieval method described in the above Japanese Laid-open Publication No. 7-152774.
For example, a case where an original document having a layout shown in FIG. 24 is subjected to character recognition will now be discussed. In FIG. 24, the proper order of the paragraphs is the upper right paragraph, the upper left paragraph, the lower right paragraph, and the lower left paragraph. However, in the process of character recognition, the order of the paragraphs may be incorrectly recognized, so that, for example, the lower right paragraph is incorrectly concatenated with the upper right paragraph.
In this case, when the recognition result is searched for a character string “”, it is possible to search for individual characters using a table indicating misrecognized characters, or the like. However, when the concatenation of paragraphs is incorrect, the recognition results in “ . . .  . . . ” as shown in FIG. 25, for example. Therefore, the character string “” cannot be retrieved.
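The effect of an incorrect paragraph concatenation can be illustrated with a minimal sketch. The paragraph texts, the names `PARAGRAPHS`, `PROPER_ORDER`, and `MISREAD_ORDER`, and the function `concatenate` are hypothetical placeholders chosen for illustration; they do not reproduce the contents of FIG. 24 or FIG. 25.

```python
def concatenate(paragraphs, order):
    """Join paragraph texts in the given reading order."""
    return "".join(paragraphs[name] for name in order)


# Hypothetical paragraph contents. The word "keyword" is split across the
# boundary between the upper right and upper left paragraphs.
PARAGRAPHS = {
    "upper_right": "the search ke",
    "upper_left": "yword spans two paragraphs ",
    "lower_right": "of a vertically ",
    "lower_left": "written page",
}

# Proper reading order, as stated for FIG. 24.
PROPER_ORDER = ["upper_right", "upper_left", "lower_right", "lower_left"]

# Assumed misrecognition: the lower right paragraph is concatenated
# directly after the upper right paragraph.
MISREAD_ORDER = ["upper_right", "lower_right", "upper_left", "lower_left"]

if __name__ == "__main__":
    print("keyword" in concatenate(PARAGRAPHS, PROPER_ORDER))
    print("keyword" in concatenate(PARAGRAPHS, MISREAD_ORDER))
```

In the misread order every character is still present and correctly recognized, yet the string is no longer contiguous, so substituting individual characters from a table of misrecognized characters cannot recover it.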
The present invention is provided to resolve the above-described problems. The objectives of the present invention are:
(1) to provide a retrieval method in which a search can be performed while dynamically changing the tolerance level for recognition errors depending on a recognition result, and a retrieval device and a recording medium; and
(2) to provide a retrieval method in which a character string can be correctly retrieved from a recognition result even when the layout of a document is incorrectly recognized, and a retrieval device and a recording medium.