In recent years, with the popularization of computers and the Internet, a huge number of unstructured documents have become available, and there is a growing need for a search system capable of finding required documents accurately and at high speed. Moreover, in order to provide an advanced search customization function, a document collection system (sometimes referred to as a crawler) or a text analysis system included in the search system is required to allow flexible changes to the language attribute, the collection field, mapping, search characteristics, or the like. If such a change occurs in the system configuration, however, all documents need to be re-collected in order to reflect the change in the index information of the search system.
To make document re-collection in the search system more efficient, for example, Japanese Patent Application Publication No. 2001-184355 (hereinafter referred to as “Patent Document 1”) discloses an information collection system in which a content server sends content attribute information, indicating the attributes of its contents, to an information collection device. Based on the received content attribute information, the information collection device identifies content that has been updated or added in the content server and requests the identified content from the content server. The content server then sends the requested content to the information collection device.
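The identification step described for Patent Document 1 can be sketched roughly as follows. This is only an illustrative sketch, not the method of the cited publication: it assumes the content attribute information is a mapping from a content identifier to a last-modified value, and the names `identify_changed`, `previous`, and `received` are hypothetical.

```python
from typing import Dict, List


def identify_changed(previous: Dict[str, int], received: Dict[str, int]) -> List[str]:
    """Return IDs of content that was added or whose attribute value changed.

    `previous` holds the attribute values seen at the last collection;
    `received` holds the attribute values just sent by the content server.
    Both structures are assumptions made for this sketch.
    """
    return sorted(
        content_id
        for content_id, stamp in received.items()
        if content_id not in previous or previous[content_id] != stamp
    )


# Hypothetical attribute information: content ID -> last-modified value.
previous = {"a.html": 100, "b.html": 100}
received = {"a.html": 100, "b.html": 150, "c.html": 120}

# Only the updated ("b.html") and added ("c.html") content is requested.
to_request = identify_changed(previous, received)
```

Under these assumptions, unchanged content such as `a.html` is never requested again, which is what keeps normal re-collection cheap.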
In addition, Japanese Patent Application Publication No. 2005-327297 (hereinafter referred to as “Patent Document 2”) discloses a knowledge information collection system for efficiently collecting, from a network, document information to be registered in a knowledge database. As a re-collection mode for re-collecting a group of document files based on specified origin address information, a web collection module uses a mode that collects only document files updated after the previous collection time. In this collection mode, the knowledge information collection system re-collects only document files updated after the previous collection time among the document files collected a specified number of days before the current time.
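The selection rule of this re-collection mode might look like the sketch below. It rests on two assumptions that are not confirmed by the text: that "collected a specified number of days before the current time" means the file was last collected at least that many days ago, and that each file carries a collection time and an update time. The names `DocRecord` and `select_for_recollection` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DocRecord:
    url: str
    collected_at: datetime  # time of the previous collection (assumed field)
    updated_at: datetime    # last update time of the file (assumed field)


def select_for_recollection(
    records: List[DocRecord], now: datetime, min_age_days: int
) -> List[str]:
    """Pick files collected at least `min_age_days` ago and updated since then."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        r.url
        for r in records
        if r.collected_at <= cutoff and r.updated_at > r.collected_at
    ]


now = datetime(2024, 1, 10)
records = [
    DocRecord("A", datetime(2024, 1, 1), datetime(2024, 1, 5)),    # old and updated
    DocRecord("B", datetime(2024, 1, 1), datetime(2023, 12, 20)),  # old, unchanged
    DocRecord("C", datetime(2024, 1, 9), datetime(2024, 1, 9, 12)),  # collected too recently
]
selected = select_for_recollection(records, now, min_age_days=3)
```

With these sample records only `A` satisfies both conditions, so it is the only file that would be re-collected.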
In normal re-collection processing in a search system, only the documents updated or added after the previous collection time need to be collected, as described in Patent Documents 1 and 2 above. When the system configuration of the search system changes, however, all documents need to be re-collected in order to maintain the consistency of the index information within the search system. In addition, the re-collection needs to be completed safely. If the forced re-collection is interrupted for some reason, the index remains inconsistent, and it has therefore typically been necessary to start the re-collection over from the beginning. In this case, the documents collected before the interruption are collected again after the restart, which makes the collection work inefficient. From the viewpoint of the side being collected from, repeated collection of the same documents also leads to an unfavorable increase in load.