With improvements in network technology and in the performance of information processing apparatuses, so-called full-text index search has become possible, in which every word in a document is used as an index for searching. Some document search systems allow every user to search every document in an open environment. There are also so-called security search systems, which use the access authorization of the user accessing the document search system to limit the documents that user can access, thereby protecting the documents stored in a database or the like.
A document search that inherits document security is hereinafter called a secure search. In a document search system that supports secure search, the access right information of each document is often stored in the full-text index of the documents to be searched, where it serves as an access index for judging access authority during a search. Conventional search systems typically realize secure search by extracting only those viewable documents whose access rights match the user name or group name acquired when the user searches, or the hierarchical/inclusion relationships of those names (hereinafter collectively referred to as user information).
Such a system builds a search expression that includes the user information at search time, which enables secure search without major changes to the underlying full-text search. However, this search expression is an OR expression that enumerates the accessing user's own access authority together with every group to which the user belongs. Consequently, conventional secure search has the following problem: as the numbers of documents and group hierarchy levels grow, the number of groups and the number of documents belonging to each group also grow, so the search targets increase nonlinearly and search performance degrades significantly.
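To make the growth concrete, here is a minimal Python sketch of how such an OR search expression might be assembled. The `acl:` field prefix, the query syntax, and the names `expand_groups` and `build_secure_query` are hypothetical and chosen for illustration only; real full-text engines each have their own query languages.

```python
from typing import Dict, List, Set

# Hypothetical group hierarchy: each group maps to the groups it belongs to.
GroupParents = Dict[str, List[str]]


def expand_groups(direct: List[str], parents: GroupParents) -> Set[str]:
    """Return the user's direct groups plus every ancestor group."""
    seen: Set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group in seen:
            continue
        seen.add(group)
        stack.extend(parents.get(group, []))
    return seen


def build_secure_query(terms: List[str], user: str,
                       direct_groups: List[str], parents: GroupParents) -> str:
    """AND the search terms with an OR clause that enumerates the user and
    every group whose access right could make a document viewable."""
    principals = [user] + sorted(expand_groups(direct_groups, parents))
    acl_clause = " OR ".join(f"acl:{p}" for p in principals)  # hypothetical field name
    return "(" + " AND ".join(terms) + ") AND (" + acl_clause + ")"


parents = {"dev": ["engineering"], "engineering": ["all-staff"]}
print(build_secure_query(["patent", "index"], "alice", ["dev"], parents))
# (patent AND index) AND (acl:alice OR acl:all-staff OR acl:dev OR acl:engineering)
```

Each additional group or hierarchy level adds another `OR` term, and each term matches its own set of documents that the engine must merge, which is the growth described above.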
One known technique for coping with this problem is to cache search results so that the second and subsequent searches run faster. Caching speeds up repeated searches that use the same search expression. For secure search, however, there is a specific problem: the access authority of each user or group must be preserved. Because a cached result must be registered according to access authority, the cache index has to include user-specific access authority that identifies the user. As a result, a cache entry produces a hit only when "the same user" executes "the same search expression" and a miss in every other case, so cache utilization does not improve.
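The cache-key problem can be illustrated with a minimal Python sketch. The class `SecureResultCache`, the toy `backend` function, and the `VISIBLE` table are hypothetical; the only point carried over from the text is that the user identity must be part of the cache key.

```python
from typing import Callable, Dict, List, Tuple


class SecureResultCache:
    """Search-result cache keyed by (user, search expression).

    The user is part of the key because each cached result reflects that
    user's access authority and must not be served to anyone else."""

    def __init__(self, search: Callable[[str, str], List[int]]):
        self._search = search
        self._cache: Dict[Tuple[str, str], List[int]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, user: str, expression: str) -> List[int]:
        key = (user, expression)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._search(user, expression)
        self._cache[key] = result
        return result


# Hypothetical backend: each user may see a different subset of documents.
# This toy version ignores the expression and returns the user's visible set.
VISIBLE = {"alice": [1, 2, 3], "bob": [2, 3]}


def backend(user: str, expression: str) -> List[int]:
    return VISIBLE.get(user, [])


cache = SecureResultCache(backend)
cache.query("alice", "patent")  # miss
cache.query("bob", "patent")    # miss: same expression, different user
cache.query("alice", "patent")  # hit: same user, same expression
print(cache.hits, cache.misses)  # 1 2
```

Even though Alice and Bob issue the identical expression, Bob's query cannot reuse Alice's entry, which is why the hit rate stays low.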
The root of this problem is that, in a conventional document search system, a cache hit occurs only when the same user enters the same search expression more than once. Such repeated searches are uncommon except when the user or the document search system has run into some trouble, so the cache hit rate is far lower than for a non-secure search.
Several document search systems that support secure search are already known. For instance, Japanese Patent Application Publication No. 2005-284608 (Patent Document 1) discloses a technique for performing secure search at high speed, in which attribute values of access authority are set in a database in advance.
The technique of Patent Document 1 requires a label to be attached to each combination of access right information in the index before any search is performed, and each search must exhaustively OR together the user's access authorities and take the union of the results. Consequently, as the number of groups and the number of documents accumulated in each group grow, the overhead of search processing increases nonlinearly and search efficiency degrades accordingly. In terms of search efficiency, therefore, it is impractical to apply this technique directly to a secure search system based on a full-text index.
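The OR-and-union step can be sketched as follows. This is a hypothetical model, not the structure disclosed in Patent Document 1: each label is represented as a `frozenset` of principals, `label_docs` maps a label to its document IDs, and `secure_search` also counts how many posting entries it had to merge.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical: one label per distinct combination of access rights.
Label = FrozenSet[str]


def secure_search(term_docs: Set[int], label_docs: Dict[Label, Set[int]],
                  principals: Set[str]) -> Tuple[Set[int], int]:
    """Intersect the term matches with the union of every label the user
    can read; also return the number of posting entries merged."""
    allowed: Set[int] = set()
    scanned = 0
    for label, docs in label_docs.items():
        if label & principals:   # user holds at least one right in this combination
            allowed |= docs      # union of per-label results (the OR search)
            scanned += len(docs)
    return term_docs & allowed, scanned


label_docs = {
    frozenset({"alice"}): {1},
    frozenset({"dev", "qa"}): {2, 3},
    frozenset({"hr"}): {4},
}
print(secure_search({1, 2, 4}, label_docs, {"alice", "dev"}))  # ({1, 2}, 3)
```

The merged-entry count grows with both the number of labels the user can read and the number of documents under each label, regardless of how few documents the query terms actually match.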
Japanese Patent Application Publication No. 2004-164555 (Patent Document 2) discloses a search apparatus and a search method, as well as an index construction apparatus therefor. Patent Document 2 realizes secure search by defining security domains and registering multiple indexes, setting access rights per security domain so that security is maintained for each domain. An administrator assigned to each domain runs a collection program that collects the accessible documents in that security domain and creates and edits indexes, thereby generating an index for each security domain.
The approach of Patent Document 2 also enables secure search, but it increases the administrator's index management burden. Moreover, collecting documents with a collection program itself amounts to running document search processing to find and extract documents, so document search functionality must be implemented twice, once in the administrator's collection program and once in the search program for general users, which produces redundant software modules.
Furthermore, Patent Document 2 assigns a hierarchical structure to documents and prunes the search by checking access rights for each index, which reduces the number of documents to be searched and improves the efficiency and response of secure search. The hierarchical structure and pruning can shrink the search space. Even so, accessing a document at the bottom layer of a branch still requires an OR search that exhaustively covers the user's access rights. As a result, search speed and response vary with the search expression used, which limits the scalability of search processing.
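The pruning idea can be sketched on a hypothetical index tree. This is an assumption-laden illustration, not the structure of Patent Document 2: each `Node` carries the set of principals allowed to read anything in its subtree, and `pruned_search` skips a subtree when that set is disjoint from the user's principals, counting one access check per visited node.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    acl: Set[str]                     # principals allowed to read anything in this subtree
    docs: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def pruned_search(root: Node, principals: Set[str]) -> Tuple[List[int], int]:
    """Depth-first traversal that prunes unreadable subtrees; returns the
    readable document IDs and the number of access checks performed."""
    found: List[int] = []
    checks = 0
    stack = [root]
    while stack:
        node = stack.pop()
        checks += 1                   # one access-right check per visited node
        if not (node.acl & principals):
            continue                  # prune: nothing below is readable
        found.extend(node.docs)
        stack.extend(node.children)
    return sorted(found), checks


deep = Node({"dev"}, [4])
dev_node = Node({"dev"}, [3], [deep])
hr_node = Node({"hr"}, [2])
root = Node({"all-staff", "hr", "dev"}, [1], [hr_node, dev_node])
print(pruned_search(root, {"all-staff", "dev"}))  # ([1, 3, 4], 4)
```

The `hr` subtree is skipped after a single check, but document 4 at the bottom of the `dev` branch is reached only after an access check at every level of its path, and when a user belongs to many groups each of those checks is itself an OR over all of the user's rights.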
Moreover, in some cases a unique hierarchical structure cannot be assigned to documents according to their security levels, and whenever a security level must change, the tree structure has to be rebuilt. Documents that should be searchable may also be excluded from the search targets because of the assigned hierarchy, so this technique is not always applicable to general-purpose document search. In addition, when a user accesses documents using private access authority, that access does not go through the hierarchical structure, so multiple index structures must be prepared. Like that of Patent Document 1, therefore, this technique is not very practical for secure search over a full-text index.