1. Field of the Invention
The present invention relates to an information processing system connected to a file server and, more particularly, to a technique for partitioning and managing indexes in a search system that handles large amounts of data.
2. Background Art
With the explosive growth of information, the amount of data handled by organizations and companies is increasing exponentially. Most of this increase is said to consist of unstructured data such as files. To improve operating efficiency through the management and reuse of information, the need for file search technologies in organizations and companies is expanding greatly. Against this background, the development and spread of large-scale data processing technologies and file search technologies in recent years have been promoting the use of enterprise search in companies.
Today, some search systems that handle large-scale data partition the target index, place the partitions in a plurality of search nodes, and distribute search processing over those search nodes in order to keep search performance at or above a certain level. In search systems that employ this scheme, the index of the document group to be searched is first divided into a plurality of partitions, and each partitioned index is registered in its corresponding search node. At search time, a query is transmitted to all the search nodes, and each search node individually performs search processing using its own partitioned index. The search results obtained in the respective search nodes are merged at the end, and the merged result becomes the search result for the target index.
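The query flow described above can be illustrated with a minimal sketch. It assumes a toy in-memory inverted index per search node and a single-term query; the names `SearchNode` and `distributed_search` are hypothetical and are not taken from any particular search product.

```python
from concurrent.futures import ThreadPoolExecutor


class SearchNode:
    """Hypothetical search node holding one partitioned index in memory."""

    def __init__(self, documents):
        # documents: dict mapping a document ID to its text.
        # Build a simple inverted index: term -> set of document IDs.
        self.index = {}
        for doc_id, text in documents.items():
            for term in set(text.lower().split()):
                self.index.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # Search only this node's partitioned index.
        return sorted(self.index.get(term.lower(), set()))


def distributed_search(nodes, term):
    """Send the query to every node, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partial_results = list(pool.map(lambda node: node.search(term), nodes))
    merged = []
    for hits in partial_results:
        merged.extend(hits)
    return sorted(merged)


# Two nodes, each holding one partition of the overall index.
nodes = [
    SearchNode({"a.txt": "quarterly sales report", "b.txt": "design notes"}),
    SearchNode({"c.txt": "sales forecast"}),
]
print(distributed_search(nodes, "sales"))
```

Because each document is registered in exactly one partitioned index in this sketch, merging is a simple concatenation; a real system would typically also rank or deduplicate the merged hits.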
When the number of documents to be searched is large, the index is divided into a plurality of partitions before registration. This makes it possible to limit the number of registered documents mapped to any one partitioned index. If the created partitioned indexes are distributed and placed in a plurality of search nodes, search performance can be scaled out for large-scale data.
Various methods are conventionally used to register each search-target document in one of the partitioned indexes. In one method, a hash value is calculated from information that identifies each document, such as an identifier or a file path, and the partitioned index serving as the registration destination is determined based on the calculated hash value. For example, there is a method in which an MD5 hash value (a value calculated by MD5) of the pathname of the document (file) to be registered is calculated, and the document (file) is allocated to the partitioned index whose ID corresponds to the remainder obtained by dividing the calculated MD5 hash value by the total number of partitioned indexes. In another method, the file path is used as a unique ID, and the partitioned index serving as the registration destination is determined more simply, on a per-folder basis for specified folders.
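The two registration methods described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names, the UTF-8 encoding of the pathname, the big-endian interpretation of the MD5 digest, and the fallback partition for unmapped folders are all assumptions made for the example.

```python
import hashlib
from pathlib import PurePosixPath


def partition_by_hash(path, num_partitions):
    """Pick a partition ID as (MD5 of the pathname) mod (number of partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Assumption: the 128-bit digest is read as a big-endian unsigned integer.
    return int.from_bytes(digest, byteorder="big") % num_partitions


def partition_by_folder(path, folder_to_partition, default=0):
    """Pick the partition assigned to the nearest enclosing specified folder."""
    # parents are visited from the closest folder up to the root.
    for parent in PurePosixPath(path).parents:
        key = str(parent)
        if key in folder_to_partition:
            return folder_to_partition[key]
    # Assumption: documents outside every specified folder go to a default partition.
    return default


# Hypothetical folder-to-partition table.
folders = {"/share/sales": 1, "/share/eng": 2}
print(partition_by_hash("/share/sales/2023/report.doc", 4))
print(partition_by_folder("/share/sales/2023/report.doc", folders))
```

The hash-based method tends to spread documents evenly but ignores folder structure, whereas the folder-based method keeps each specified folder in one partitioned index at the cost of possibly uneven partition sizes.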