1. Field of the Invention
The present invention relates to a technique for efficiently analyzing layers of a large amount of file data saved in a file server, and in particular, to a file list generation method and system, a file list generation apparatus, and a program which allow efficient generation of a list of file data for which search indices in a search system are to be updated.
2. Background Art
In recent years, faster computers and larger HDD capacities have allowed an enormous number of unstructured documents to be produced. This has led to a growing need for a search system which can quickly and adequately find a required document from among an enormous number of documents. To obtain adequate search results, it is important to reflect, in the search indices and in a timely manner, the operations of adding, changing, or deleting file data that are performed, after creation of the search indices, on the file server in which the unstructured documents to be searched are stored. When the operations are reflected in the search indices, generally only the search indices of added, changed, or deleted file data are updated, because also updating the search indices of unchanged file data requires much time. To achieve this, a list of the added, changed, or deleted file data needs to be created.
To address the need for such a search system, some file servers include an interface which stores a history of operations performed on file data and which provides a list of added, changed, or deleted file data in response to an external request (see JP Patent Publication (Kokai) No. 2006-268456).
When a list of added, changed, or deleted file data is created, if the file server provides such lists, the corresponding interface may be utilized. However, if the file server includes no such interface, all the file data that is present in the file server and for which search indices are to be created needs to be scanned to determine whether or not the data has been updated. In this case, even if only a small amount of file data has been added, changed, or deleted, all the file data needs to be scanned. Thus, disadvantageously, the process for creating a list of added, changed, or deleted file data increases the time required for the process of updating the indices.
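The full-scan approach described above can be illustrated with a minimal sketch. It assumes that a snapshot of file metadata (modification time and size) was saved when the indices were last built, and that a file counts as changed when either value differs. The function names scan_tree and diff_snapshots and the choice of metadata are hypothetical and are not taken from the cited publication.

```python
import os


def scan_tree(root):
    """Walk every file under root and record (mtime_ns, size) per relative path."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # the file vanished between listing and stat
            snapshot[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return snapshot


def diff_snapshots(previous, current):
    """Return sorted lists of added, changed, and deleted paths."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return added, changed, deleted
```

Note that scan_tree visits every file regardless of how few files actually differ, so the cost of the scan grows with the total number of files rather than with the number of updates.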
In this regard, the speed of the scan process may be increased by dividing a file tree structure in the file server into a number of portions and carrying out the scan process on these portions in parallel.
However, the structure of the file tree in the file server varies depending on the environment of the file server and is not known in advance. Thus, determining an efficient division method is difficult.
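The difficulty can be seen in a minimal sketch of one naive division, in which each top-level directory is treated as one portion and the portions are scanned by a pool of worker threads. The division rule, the function names list_files and parallel_scan, and the workers parameter are assumptions made for illustration only; this is not the division method of the present invention.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_files(subtree):
    """Scan one portion of the tree and return the paths of its files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(subtree):
        found.extend(os.path.join(dirpath, name) for name in filenames)
    return found


def parallel_scan(root, workers=4):
    """Divide root by top-level directory and scan the portions in parallel."""
    top_files, subtrees = [], []
    with os.scandir(root) as entries:
        for entry in entries:
            # Symbolic links are not followed; they are listed as plain entries.
            if entry.is_dir(follow_symlinks=False):
                subtrees.append(entry.path)
            else:
                top_files.append(entry.path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_subtree = list(pool.map(list_files, subtrees))
    # Files found per portion; a skewed distribution means one worker does most of the work.
    sizes = {os.path.basename(s): len(f) for s, f in zip(subtrees, per_subtree)}
    all_files = sorted(top_files + [p for files in per_subtree for p in files])
    return all_files, sizes
```

If one top-level directory holds most of the files, the corresponding worker dominates the total scan time while the others sit idle, so a fixed division of this kind gives little speedup on an unbalanced tree.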
An object of the present invention is to carry out a search index update process at a high speed to make search results from a search system as adequate as possible. To accomplish this object, the present invention aims to provide a file list generation method and system, a file list generation apparatus, and a program which allow a file data scan process to be carried out, in parallel and in a distributed manner, on a file server for which indices have already been created, so as to create a list of added, changed, or deleted file data at a high speed.