Field of the Invention
The present invention relates to an image processing device, an image processing method and a non-transitory computer readable recording medium. More specifically, the present invention relates to a technique for easily searching for text in image data.
Description of the Background Art
In conventional offices, large volumes of documents are managed in the form of image data generated by digitizing the documents. Such image data is stored and managed in a predetermined data format such as PDF. When a user wants to find and use particular image data among the many image data stored in a database, the search is very often performed collectively by entering a search keyword.
Some of the image data stored in the database have similar contents expressed in different words, or contain the same word in more than one spelling or written form with the same meaning. Such variants are called inconsistently spelled words. Even when the user performs an OR search, that is, a search for image data matching any of multiple search keywords, it is difficult to cover all the inconsistently spelled words contained in the image data, so some relevant items are not found.
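The failure described above can be illustrated with a minimal sketch. It assumes that text has already been extracted from the image data and uses a simple case-sensitive substring match as the OR search; the document texts and the function name `or_search` are hypothetical and only serve the illustration.

```python
# Hypothetical text extracted from three stored image data files.
documents = {
    "doc1": "Please see the attached e-mail for details.",
    "doc2": "The email was forwarded to the customer.",
    "doc3": "Send the E-mail before noon.",
}

def or_search(docs, keywords):
    """Return IDs of documents containing any keyword (case-sensitive substring match)."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if any(kw in text for kw in keywords))

# The user anticipates two spellings but not the capitalized written form "E-mail".
hits = or_search(documents, ["email", "e-mail"])
print(hits)  # ['doc1', 'doc2']: doc3 is missed
```

Unless the user happens to enter every variant as a keyword, the OR search leaves relevant documents unfound.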
Various techniques have been proposed to prevent such failures to find all the relevant items. Examples of these known techniques are disclosed in Japanese Patent Application Laid-Open No. JP 10 (1998)-307839 A (hereafter, document 1) and Japanese Patent Application Laid-Open No. JP 2004-86307 A (hereafter, document 2).
According to the known technique disclosed in document 1, a searching device lets a user select a string appropriate for searching from among strings extracted by a fuzzy search, thereby easily realizing a suitable fuzzy search. This searching device extracts, from target documents, strings that exactly match a search string entered by the user as well as strings that are similar to it, and lets the user select, from among the extracted strings, the string that should be used for the search. When the user selects that string, the search is performed based on the selected string and a search result is obtained.
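The two-step flow of document 1 (extract exact and similar strings, then search with the string the user picks) can be sketched as follows. This is only an illustration under stated assumptions: the similarity measure used in document 1 is not described here, so `difflib.SequenceMatcher.ratio()` with a hypothetical threshold of 0.75 stands in for it, and the function names and document texts are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical target documents (text already extracted).
documents = {
    "doc1": "Please see the attached e-mail for details.",
    "doc2": "The email was forwarded to the customer.",
    "doc3": "Send the E-mail before noon.",
}

def extract_candidates(docs, query, threshold=0.75):
    """Collect words that exactly match the query or resemble it (stand-in similarity measure)."""
    words = {w for text in docs.values() for w in re.findall(r"[\w-]+", text)}
    return sorted(w for w in words
                  if w == query
                  or SequenceMatcher(None, w.lower(), query.lower()).ratio() >= threshold)

def search(docs, selected):
    """Search the documents using only the strings the user selected."""
    return sorted(d for d, text in docs.items() if any(s in text for s in selected))

candidates = extract_candidates(documents, "email")
print(candidates)                     # ['E-mail', 'e-mail', 'email']
print(search(documents, candidates))  # user keeps every candidate: all three documents
print(search(documents, ["email"]))   # user keeps only one candidate: ['doc2']
```

The last call shows why the extra selection step matters: the quality of the result depends on which candidates the user keeps.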
According to the known technique disclosed in document 2, a searching device addresses the problem of failing to find all the relevant items in a search using natural-language sentences. When a document is registered with a database or the like, this searching device assigns a unique document ID to the document to be registered, extracts each word contained in the document, and obtains the standard description of the extracted word, which may differ from the extracted word in expression, written form or spelling. The document ID is then registered in data for search, using the standard description as an index. The data for search is managed on the searching device separately from the registered documents. It is generated for each word registered as an index, and records the document IDs of all the documents containing that word. When a document is searched for, the word serving as the standard description is identified from the keyword given as the search criterion, and that word is looked up in the data for search. The document IDs corresponding to the word are thereby obtained, and the documents corresponding to the keyword are extracted.
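The registration and lookup described for document 2 resemble an inverted index keyed by a normalized form of each word. The following is a minimal sketch under that reading, not the actual implementation of document 2: the variant-to-standard table `STANDARD_FORMS`, the class `SearchIndex`, and the sequential numeric document IDs are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical table mapping spelling/written-form variants to a standard description.
STANDARD_FORMS = {"e-mail": "email", "email": "email", "colour": "color", "color": "color"}

def standardize(word):
    """Map a word to its standard description (falls back to the lower-cased word)."""
    w = word.lower()
    return STANDARD_FORMS.get(w, w)

class SearchIndex:
    """Data for search, kept separately from the registered documents."""

    def __init__(self):
        self.next_id = 1
        self.postings = defaultdict(set)  # standard description -> set of document IDs

    def register(self, text):
        """Assign a unique document ID and index every word under its standard description."""
        doc_id = self.next_id
        self.next_id += 1
        for word in re.findall(r"[\w-]+", text):
            self.postings[standardize(word)].add(doc_id)
        return doc_id

    def search(self, keyword):
        """Standardize the keyword and return the document IDs registered under it."""
        return sorted(self.postings.get(standardize(keyword), set()))

index = SearchIndex()
index.register("Please see the attached e-mail.")  # document ID 1
index.register("The email was forwarded.")         # document ID 2
index.register("Pick a colour.")                   # document ID 3
print(index.search("E-mail"))  # [1, 2]
print(index.search("color"))   # [3]
```

Note that `search` can only answer from what `register` stored beforehand; a fresh `SearchIndex()` returns an empty list for any keyword.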
With the technique disclosed in document 1, after entering a search string, the user must additionally select the string to be used for the search from among the extracted strings, which include strings similar to the entered search string. This results in poor operability. Moreover, the user may select the wrong string at this step. In that case, a suitable fuzzy search is not performed, and some relevant items may still not be found.
With the technique disclosed in document 2, extracting a document that contains the keyword entered by the user, or a word similar to the keyword, from a database in which multiple documents are registered requires that the data for search, which associates each word's standard description with the corresponding document IDs, be registered in advance. Without the registered data for search, documents containing the keyword or a word similar to it may not be extracted. Suppose, for example, that a user sends another user an email message with an attached document file that is registered with the database. Even if the recipient performs a keyword search on the attached document file in a different environment, words similar to the keyword may not be found. In particular, if the recipient enters as the keyword a word that does not appear in the document file, the search finds nothing even though the document file contains a word similar to that keyword, and the relevant item is missed.