The file aggregation server to which the present invention is directed is a component of a content analysis system that analyzes the contents of files stored in a file server. An analysis server, which forms the core of the content analysis system, acquires files from the file server and performs analysis processing on their contents. After the analysis processing, the analysis server provides a service that enables users of the content analysis system to refer to the analysis results.
One form of analysis server is a full-text search server, which provides a service for searching the text contents of files in the file server. After acquiring files from the file server, the full-text search server extracts the text contents of the files and creates a full-text search index from the extracted text contents and the file location information.
After creating the full-text search index, the full-text search server makes available a search service that receives a search character string from a user of the content analysis system and returns a search result giving the location information of the files that include the received character string. Other forms of analysis server include a decision-making system, an image search server, and other servers that use the contents of files in the file server.
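The indexing and search steps described above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index keyed by whitespace-separated tokens, which is a simplification of matching an arbitrary character string; the names `build_index` and `search` and the sample file locations are hypothetical and are not taken from any particular full-text search server.

```python
from collections import defaultdict


def build_index(files):
    """Build an inverted index from {file location: text contents}.

    Returns a mapping from each lowercase token to the set of file
    locations whose text contains that token.
    """
    index = defaultdict(set)
    for location, text in files.items():
        for token in text.lower().split():
            index[token].add(location)
    return index


def search(index, query):
    """Return the sorted locations of files containing every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Start from the files matching the first token, then intersect.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical files already acquired from the file server.
files = {
    "/share/a.txt": "quarterly sales report",
    "/share/b.txt": "sales forecast draft",
}
index = build_index(files)
```

Here a call such as `search(index, "sales report")` returns the location information of the files that contain both words, which corresponds to the search result the user receives.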
In such a content analysis system, the file aggregation server is located between the file server and the analysis server and relays the files and the analysis target contents so that the analysis server can easily perform the analysis processing. Alternatively, the analysis server can acquire files directly from the file server when performing the analysis processing, without using the file aggregation server.
However, where a large number of files are the target of the analysis processing, or where the network distance between the file server and the analysis server is very long, it is more efficient in terms of processing throughput and the like to hold the files targeted for analysis near the analysis server. In such cases, the file aggregation server is normally used.
Furthermore, to meet various analysis needs, the content analysis system treats various types of contents as analysis targets, so that a plurality of analysis servers of different types exist. In such a system, if each analysis server acquires files from the file server individually, the resources of the file server and of the network are wasted.
When a plurality of analysis servers are used, a file aggregation server that aggregates files from the file server and provides each analysis server with the files and analysis target contents it needs is therefore also used to minimize the resource usage of the entire system. Patent Literature 1, for example, describes a conventional technology for a file aggregation server that helps the analysis server perform the analysis processing easily. Patent Literature 1 discloses a technology of annotating aggregated files when they are stored so that the analysis server can easily identify the files that include analysis target contents.
When a file aggregation server aggregates and stores a large number of files from the file server, the analysis server requires much time to identify, among the files in the file aggregation server, those that are the target of the analysis processing. Specifically, if the analysis server treats text contents as the analysis target, it must acquire all the files in the file aggregation server and check whether each file includes text contents in order to identify the files that do. As another concrete example, when the analysis server treats as the analysis target only the files that have been added or updated since the previous analysis processing, it must check every file for addition or update.
To solve these problems, the file aggregation server in Patent Literature 1, when storing aggregated files, extracts information such as the internal contents from the files and stores the extracted information as annotation data. The annotation data of a file is additional information related to the aggregated file, such as the types of contents included in the file and the date and time of the last file update.
By first referring to the annotation data instead of directly acquiring the files in the file aggregation server, the analysis server identifies the files that are the target of the analysis processing and acquires only the identified files. This technology reduces the time required for the analysis server to finish acquiring the analysis target files from the file aggregation server. The shorter processing time allows up-to-date analysis results to be provided to users of the content analysis system, which improves the convenience of the content analysis system.
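The difference between checking every file and consulting annotation data can be illustrated with a minimal sketch. It assumes that each annotation record holds a file location, the types of contents included in the file, and the last update date and time; the `Annotation` class, the `select_targets` function, and the sample records and dates are hypothetical and are not the data format of Patent Literature 1.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record stored alongside an aggregated file."""
    location: str
    content_types: frozenset
    last_updated: datetime


def select_targets(annotations, wanted_type, since=None):
    """Identify analysis target files from annotation data alone.

    A file is selected when it includes the wanted content type and,
    if `since` is given, was updated strictly after that time. No file
    body is read; only the annotation records are examined.
    """
    return [
        a.location
        for a in annotations
        if wanted_type in a.content_types
        and (since is None or a.last_updated > since)
    ]


# Hypothetical annotation data held by the file aggregation server.
annotations = [
    Annotation("/agg/report.doc", frozenset({"text", "image"}), datetime(2024, 1, 10)),
    Annotation("/agg/photo.png", frozenset({"image"}), datetime(2024, 2, 1)),
    Annotation("/agg/notes.txt", frozenset({"text"}), datetime(2024, 3, 5)),
]
```

An analysis server that handles text contents would call `select_targets(annotations, "text", since=previous_run)` and then acquire only the returned files, rather than acquiring every aggregated file and inspecting it.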