There are many circumstances in which it is required to monitor and analyse the operation of systems, so that any problems can be identified and solved, and predictions regarding future operation of such systems can be made. For example, it may be required to monitor the operation of a computer network consisting of tens, hundreds or even thousands of computer stations. One way to monitor such a network would be to track all data packets being transported around the system and between stations.
It will be appreciated that such a tracking operation generates an enormous amount of data. Although the collected data is required for the monitoring and analysis process, it is impractical (and in many cases virtually impossible) to store such a large amount of data in an uncompressed state for subsequent retrieval and analysis, because of the unacceptably large storage capacity that would be required. It is therefore desirable to compress the data substantially for storage and analysis, preferably without losing any significant information that may be essential to the accuracy of the results of the analysis process.
In known systems, the collected data is compressed and stored by assuming that it follows a particular classic probability distribution and storing only a very few parameters determined from the collected data. For example, if only the mean and standard deviation of the collected data are calculated and stored, a probability distribution as illustrated in FIG. 1 of the drawings is obtained, which is characterised by the small number of parameters obtained from the data, the rest of the data being discarded.
However, this type of compression has a number of disadvantages. In particular, the data may not actually follow the assumed classic probability distribution (in FIG. 1, the true probability distribution of the data is shown by the broken line 100), so the stored parameters are of limited value in the analysis of the original data. Further, data stored in the form of such a summary chart offers limited reconstruction ability: it is not usually possible to reconstruct any of the original data with any accuracy, and other parameters that were not originally calculated or measured and stored cannot subsequently be obtained if required.
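As a minimal illustration of this loss of information, the following Python sketch assumes that the classic distribution is the normal distribution and uses two small hypothetical datasets (not data from any real system). Each dataset is reduced to its mean and standard deviation, and the fraction of values the normal model predicts in an interval is compared with the fraction actually present. The helper names `summarise`, `normal_interval_prob` and `empirical_fraction` are illustrative only.

```python
import math
from statistics import mean, pstdev


def summarise(samples):
    """Compress a sample set to two parameters: mean and population std dev."""
    return mean(samples), pstdev(samples)


def normal_interval_prob(lo, hi, mu, sigma):
    """Probability of [lo, hi) under an assumed normal distribution N(mu, sigma^2)."""
    def z(x):
        return (x - mu) / (sigma * math.sqrt(2))
    return 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))


def empirical_fraction(samples, lo, hi):
    """Fraction of the original samples that actually fall in [lo, hi)."""
    return sum(lo <= s < hi for s in samples) / len(samples)


# Two hypothetical datasets with very different shapes.
bimodal = [10.0] * 50 + [30.0] * 50   # two clusters, nothing near 20
peaked = [0.0] + [20.0] * 6 + [40.0]  # mostly at 20, with two outliers

for name, data in (("bimodal", bimodal), ("peaked", peaked)):
    mu, sigma = summarise(data)
    print(name,
          "summary:", (mu, sigma),
          "model P[15,25):", round(normal_interval_prob(15, 25, mu, sigma), 3),
          "actual:", empirical_fraction(data, 15, 25))
```

Both datasets compress to the same two parameters (mean 20, standard deviation 10), so the normal model assigns roughly 0.38 of the probability to the interval [15, 25) in each case, whereas the true fractions are 0 and 0.75 respectively. Once the raw data has been discarded, the two cases can no longer be distinguished.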
It may, of course, be assumed that the parameters required from the data change periodically, for example hourly or daily. Even so, only a small amount of data is being stored, and there is still no guarantee that the right parameters are being stored. As a result, analysis of the original data is often not as accurate and meaningful as many circumstances require. In particular, problems within the system are often difficult to identify using known methods, because those methods do not provide a precise description of the arbitrary probability distributions typical of real systems. The resulting delay in identifying such problems, or failure to identify them at all, can have a devastating effect on the whole system.
In any event, such problems can only be identified by human observation of structural changes in system behaviour, as determined from the pattern formed by the measured data and from the stored parameters obtained from the original data.
We have now devised an arrangement which seeks to overcome the problems outlined above.