This invention relates to a data acquisition system for acquiring data that is dispersed over a network and varies with time, and to a storage medium storing the data used in the system.
There is a system called a network robot, which acquires data items dispersed over a network by following, in a chain reaction, the links contained in the data items it acquires.
Some of the technical terms used in this specification in connection with the network robot are explained below.
"Link" is a term for something that specifies a particular data item on a network. "Hyperlink" is sometimes used with the same meaning.
"Link group" means a set of one or more links.
"Initial link group" means a link group used as the initial value when a network robot starts to operate.
"Pre-acquisition mode" is one of the attributes attached to a link (in a recording storage section that records and stores links) and signifies that the acquisition of the data item specified by the link has not been completed.
"Post-acquisition mode" is one of the attributes attached to a link (in a recording storage section that records and stores links) and signifies that the acquisition of the data item specified by the link has been completed.
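The terms above can be pictured as a small data structure. The following Python sketch models a recording storage section in which each stored link carries either the pre-acquisition or the post-acquisition mode. The class name `LinkStore`, its methods, and the assumption that a newly stored link always starts in the pre-acquisition mode are illustrative choices, not part of the original description.

```python
from enum import Enum


class Mode(Enum):
    """Attribute attached to each link in the recording storage section."""
    PRE_ACQUISITION = "pre-acquisition"    # data item not yet acquired
    POST_ACQUISITION = "post-acquisition"  # data item already acquired


class LinkStore:
    """Hypothetical recording storage section mapping each link to its mode."""

    def __init__(self, initial_link_group):
        # Assumption: every link of the initial link group starts in the
        # pre-acquisition mode.
        self._modes = {}
        for link in initial_link_group:
            self.add(link)

    def add(self, link):
        """Store a link in the pre-acquisition mode unless it is already stored."""
        if link in self._modes:
            return False
        self._modes[link] = Mode.PRE_ACQUISITION
        return True

    def mark_acquired(self, link):
        """Change a stored link to the post-acquisition mode."""
        if link not in self._modes:
            raise KeyError(link)
        self._modes[link] = Mode.POST_ACQUISITION

    def pending(self):
        """Return the links whose data items have not been acquired yet."""
        return [link for link, mode in self._modes.items()
                if mode is Mode.PRE_ACQUISITION]

    def mode(self, link):
        return self._modes[link]


store = LinkStore(["http://example.com/a", "http://example.com/b"])
store.mark_acquired("http://example.com/a")
print(store.pending())  # ['http://example.com/b']
```

A dictionary keyed by link makes the duplicate check a constant-time lookup, and because Python dictionaries preserve insertion order, `pending()` returns links in the order they were stored.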
Next, a conventional network robot will be explained.
In a conventional network robot, an initial link group is entered through a link input unit provided separately from the network robot. The initial link group is stored in a specific storage section.
From the stored link groups, the network robot selects only the links in the pre-acquisition mode and acquires the data items they specify through network communication. The mode of each link whose data item has been acquired is then changed to the post-acquisition mode.
The network robot also extracts link groups from the acquired data items. Each extracted link group is stored in the storage section only if it does not overlap with the links already stored there.
These processes are repeated until no link in the pre-acquisition mode remains in the storage section, at which point the network robot terminates processing.
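The conventional procedure described above can be sketched as a loop in Python. This is a minimal illustration under stated assumptions: `fetch` stands in for network communication and `extract_links` for link extraction, both hypothetical callables supplied by the caller, and the storage section is modelled as an in-memory dictionary from link to mode.

```python
from typing import Callable, Dict, Iterable, List

PRE = "pre-acquisition"
POST = "post-acquisition"


def run_network_robot(initial_link_group: Iterable[str],
                      fetch: Callable[[str], str],
                      extract_links: Callable[[str], List[str]]) -> Dict[str, str]:
    """Sketch of a conventional network robot; returns acquired data by link.

    `fetch` and `extract_links` are hypothetical stand-ins for network
    communication and link extraction.
    """
    # Storage section: link -> mode. The initial link group starts in PRE.
    modes: Dict[str, str] = {link: PRE for link in initial_link_group}
    acquired: Dict[str, str] = {}

    while True:
        # Snapshot of the links still in the pre-acquisition mode.
        pending = [link for link, mode in modes.items() if mode == PRE]
        if not pending:
            break  # no pre-acquisition link remains: terminate processing
        for link in pending:
            data = fetch(link)        # acquire the data item for this link
            acquired[link] = data
            modes[link] = POST        # switch to the post-acquisition mode
            for new_link in extract_links(data):
                if new_link not in modes:  # store only non-overlapping links
                    modes[new_link] = PRE
    return acquired


# Hypothetical three-page network used only for illustration.
PAGES = {"a": "links: b c", "b": "links: c", "c": "links: a"}
result = run_network_robot(["a"], PAGES.__getitem__,
                           lambda text: text.split(": ", 1)[1].split())
print(list(result))  # ['a', 'b', 'c']
```

Because a link in the post-acquisition mode is never fetched again, any later update to its data item goes unnoticed, which is the situation described in problem (1).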
Such network robots are disclosed in "The Web Navigator," Paul Gilster, Wiley Computer Publishing, and "UNIX Web Server Book, Second Edition," R. Douglas Matthews et al., Ventana.
However, such a conventional network robot has various problems, including the following:
(1) Even when the contents of the data have been updated, the updated contents cannot be acquired immediately.
(2) Conversely to item (1), data whose contents have not been updated may be acquired again, resulting in useless data acquisition. This lowers processing efficiency.
(3) Data acquisition cannot be adapted flexibly to the dynamic load on the network or to how frequently the data is updated, which limits the efficiency and speed of data acquisition.
For example, assume that data group D1 resides in server group A, from which data acquisition slows down between 12:00 and 14:00 (hereinafter referred to as time zone SA) because the network load is heavier during that period. In this case, if many items in data group D1 are always acquired during time zone SA, acquiring them takes longer, even though data acquisition from data group D1 is fast outside time zone SA, and the data acquisition process is constantly forced to run under a high load.