The present invention relates to a control method in a search server and file server for providing functions of searching files stored in a hierarchy file storage system.
The use of computers has spread widely across business and other fields as performance has risen and costs have fallen. In recent years, the number of data files stored in computer systems has also become enormous. In particular, to manage a large number of files at low cost across an entire system, hierarchy file storage systems have come into wide use. Such a system combines a high-speed, high-cost front-end file server with a low-cost, large-capacity back-end file server, and moves files to the most suitable file server on the basis of a file management policy.
When managing such a large volume of files, a problem also arises in that a user cannot find where a desired file is stored. To address this problem, full-text search services have come into use in recent years. In a full-text search service, the search server analyzes the file data stored in the computer system and creates a search index in advance. The user then transmits a search query to the search server to find the desired file, and accesses the target file on the basis of the search result. The number of files stored in computer systems is expected to keep increasing, making it difficult for the user to keep track of what kinds of file data exist and where they are stored. The search service therefore becomes important for the user, and its use is expected to spread widely.
Conventionally, when a file search service is realized by a file server and a search server working together, the search server has had to crawl the file server periodically to identify updated files when updating its search index. This crawling requires accessing every file stored in the target file server, referring to the last update date and time of each file and to the date and time of the previous index update, and determining whether the file has been updated. In such an operation, the greater the number of target files, the larger the processing load becomes. A further problem arises when a hierarchy file storage system is used as the file server. Specifically, because every file is accessed in order to identify the updated target files, a file stored in the back-end side file server of the hierarchy file storage system must first be read out to the front-end file server before the search server can access it. For this reason, data traffic between the file servers increases in the hierarchy file storage system.
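The crawl described above can be sketched as follows. This is a minimal illustration only, assuming the file server's contents are reachable as a local directory tree; the function name `crawl_for_updates` and its parameters are hypothetical and do not come from any cited system. It shows that every file must be examined even when only a few have changed.

```python
import os


def crawl_for_updates(root, last_index_time):
    """Walk every file under root and return (updated_paths, files_examined).

    last_index_time is the time of the previous index update, in seconds
    since the epoch. Hypothetical helper for illustration only.
    """
    updated = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            examined += 1  # every file is accessed, updated or not
            if os.stat(path).st_mtime > last_index_time:
                updated.append(path)
    return sorted(updated), examined
```

The count of examined files grows with the total number of stored files rather than with the number of updated files, which is the source of the processing load noted above.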
US-A-2008/0071805 proposes a method of solving this problem with the search index update in the search server: the file server creates a file update list in advance, and the search server acquires and uses that list when updating the search index. In this method, the file server that is the search target detects file updates in the file system, and accumulates and manages that information. Because the search server acquires the file update list, it does not need to access all of the files in the file server to identify the updated target files when updating the search index. In this way, the index update can be executed efficiently by using the file update list.
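The idea of a file update list can be sketched as follows. This is a simplified model under stated assumptions, not the method of US-A-2008/0071805 itself: the class `FileUpdateList`, its methods, and the function `paths_to_reindex` are hypothetical names, and each entry is assumed to be a (path, operation, timestamp) tuple.

```python
from dataclasses import dataclass, field


@dataclass
class FileUpdateList:
    """Hypothetical update list accumulated on the file server side."""
    entries: list = field(default_factory=list)

    def record(self, path, operation, timestamp):
        # Called when the file server detects an update in its file system.
        self.entries.append((path, operation, timestamp))

    def since(self, timestamp):
        # Entries recorded after the previous index update.
        return [e for e in self.entries if e[2] > timestamp]


def paths_to_reindex(update_list, last_index_time):
    """Return {path: latest operation} for files updated since last_index_time."""
    latest = {}
    for path, operation, _timestamp in update_list.since(last_index_time):
        latest[path] = operation  # a later entry overrides an earlier one
    return latest
```

With such a list, the work done by the search server depends on the number of recorded updates rather than on the total number of stored files.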
In the technique disclosed in US-A-2008/0071805, however, network data traffic sometimes increases when the search server acquires the file update list and the updated target files from the front-end side file server. In particular, when the entity of an updated target file is stored in the back-end side file server of the hierarchy file storage system, a separate data transfer from the back-end side file server to the front-end side file server is required.
Also in the technique disclosed in US-A-2008/0071805, when the search server acquires the file update list and the updated target files from the back-end side file server, the path name of an updated target file in the front-end side file server sometimes cannot be identified. In particular, in a hierarchy file storage system in which the front-end side file server and the back-end side file server use independent name spaces, the path name in the front-end side file server need not be stored synchronously in the back-end side file server when a file is replicated or moved from the front-end side file server. If the path name were stored synchronously, a rename operation would also have to be performed synchronously in the back-end side file server whenever a directory or the like is renamed in the front-end side file server. For this reason, considering the performance requirements of the entire system, it is quite conceivable that the path name is not stored synchronously.
In that case, however, the following two problems arise when the search server cannot acquire the path name of an updated target file in the front-end side file server while carrying out the index update. First, the path name in the front-end side file server cannot be included in the search result supplied by the search server. A user of the hierarchy file storage system uses the path name in the front-end side file server to access a file. Consequently, if the path name is not supplied with the search result, a user who receives the search result cannot identify where to access the file. Second, when the front-end side file server uses NFS (Network File System) as its file-sharing protocol, it becomes difficult to acquire the access control information set on the parent directory and ancestor directories of the updated target file while carrying out the index update. The access control information of these directories is needed because, when supplying a search result, the search server narrows down the result on the basis of the access rights of the requesting user and those set on the search target files. To access those directories, the search server requires the path name information of the updated target file.
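The dependence of the second problem on the path name can be sketched as follows. This is a deliberately simplified model: real NFS permissions are expressed through mode bits or ACLs, whereas here `dir_acl` is a hypothetical mapping from each directory to the set of users allowed to traverse it, and all function names are assumptions made for illustration. Absolute POSIX-style path names are assumed.

```python
import posixpath


def ancestor_directories(path):
    """Return the parent and ancestor directories of an absolute path, nearest first."""
    ancestors = []
    current = posixpath.dirname(posixpath.normpath(path))
    while True:
        ancestors.append(current)
        if current in ("/", ""):
            break
        current = posixpath.dirname(current)
    return ancestors


def user_can_see(path, user, dir_acl):
    # The user must be permitted on every directory along the path.
    return all(user in dir_acl.get(d, set()) for d in ancestor_directories(path))


def filter_results(results, user, dir_acl):
    """Narrow down search results to those the requesting user may access."""
    return [p for p in results if user_can_see(p, user, dir_acl)]
```

Without the front-end path name of a file, the list of directories to check cannot be derived at all, so this narrowing step cannot be carried out.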