For the long-term archiving of process values (also referred to as process measured values or process data), data can be stored as a series of measurements (also referred to as measured value histories or process value histories) in a so-called history server. Archiving data in this way can be very memory-intensive. A mass memory with a very large memory area is consequently needed to store the data and read them back, and a large number of data items have to be moved when the read data are processed further. The volume of data to be stored is the product of the number of signals, their expected rate of change and their recording duration.
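The product described above can be illustrated with a short calculation. The following is only a sketch: the function name, the example figures (10,000 signals, one change every two seconds, one year of recording) and the assumed 16 bytes per stored sample (an 8-byte time stamp plus an 8-byte value) are illustrative assumptions and are not taken from any particular installation.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def estimate_raw_archive_bytes(num_signals, changes_per_second,
                               duration_seconds, bytes_per_sample=16):
    """Estimate the raw data volume as signals x change rate x duration.

    bytes_per_sample is an assumption (8-byte time stamp + 8-byte value).
    """
    samples = num_signals * changes_per_second * duration_seconds
    return samples * bytes_per_sample


# Hypothetical installation: 10,000 signals, each changing every 2 seconds,
# recorded for one year.
size = estimate_raw_archive_bytes(10_000, 0.5, SECONDS_PER_YEAR)
print(f"{size / 1e12:.2f} TB")  # prints "2.52 TB"
```

Under these assumptions, a single year of raw values already occupies several terabytes, and doubling any one of the three factors doubles the volume.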
The volume of process data which is used by the operator of the technical installation and is intended to be archived has increased greatly in recent years. There is a desire, on the one hand, to store the data in a loss-free manner and, on the other hand, to read the data again as quickly as possible. Both desires can be fulfilled with simple storage in an uncompressed raw format. However, with raw value storage, the available mass memory is likely not used in an optimum manner. The raw data to be stored can, for example, be in the range of 0.5 to approximately 5 terabytes or higher, and the volume continues to grow.
Although mass memories are available for storing the volumes of data arising from the process or the installation, compression methods can be used to store large volumes of data in process automation. A compression method can be selected with the goal of storing the data in a loss-free manner and reading them again as quickly as possible. A less than optimum compression factor can be accepted in this case.
Loss-free or lossy methods are currently used to compress the volumes of data arising from the process or the installation.
In the known lossy compression methods, which include boxcar/backslope methods and transformation methods, the previously compressed data are available again, following decompression, with more or less severe differences with regard to the measured value and the time stamp.
The boxcar/backslope, swinging door or wavelet transformation methods are described, for example, in
“Automatic Tuning of Window Size in the Box Car Backslope Data Compression Algorithm” (available at http://med.ee.nd.edu/MED7/med99/papers/MED101.pdf)
“Swinging Door Compression” (available at http://training.osisoft.com/NR/rdonlyres/5547CC68-65AD-4E55-A365-B30C1FCF74F9/0/SwingingDoorCompression.doc), and
“Wavelet” (available at http://en.wikipedia.org/wiki/Wavelet).
These methods afford a good compression rate but are not loss-free. In addition, on account of the volume of data to be processed, the reading operation is considerably slower than the operation of reading the data stored in uncompressed form.
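Why such methods are not loss-free can be seen from a deliberately simplified sketch. The code below implements only a dead-band ("boxcar") filter, in which a sample is kept only if its value differs from the last kept value by more than a tolerance. It is not the full boxcar/backslope algorithm from the reference cited above; the function names, the tolerance and the sample values are assumptions chosen for illustration.

```python
def boxcar_compress(samples, tolerance):
    """Keep a (time stamp, value) sample only if its value differs from
    the last kept value by more than `tolerance` (simplified dead band)."""
    if not samples:
        return []
    kept = [samples[0]]
    for t, v in samples[1:]:
        if abs(v - kept[-1][1]) > tolerance:
            kept.append((t, v))
    return kept


def reconstruct(kept, timestamps):
    """Rebuild a value for every time stamp by holding the last kept value."""
    out = []
    i = 0
    for t in timestamps:
        # Advance to the most recent kept sample at or before time t.
        while i + 1 < len(kept) and kept[i + 1][0] <= t:
            i += 1
        out.append((t, kept[i][1]))
    return out


raw = [(0, 20.0), (1, 20.5), (2, 20.25), (3, 22.0), (4, 22.25)]
kept = boxcar_compress(raw, tolerance=1.0)
restored = reconstruct(kept, [t for t, _ in raw])

print(kept)             # [(0, 20.0), (3, 22.0)]
print(restored == raw)  # False: the discarded values cannot be recovered
```

Five samples shrink to two, but the restored series reports 20.0 where the process actually measured 20.5 and 20.25, which is exactly the kind of deviation described above.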
The known loss-free compression methods, such as LZW, LZ77 or LZ78 (which replace a sequence of characters in the original text with a sequence of characters from another alphabet), are described, for example, in
“LZ77 and LZ78” (available at http://en.wikipedia.org/wiki/LZ77).
These methods provide unsatisfactory compression rates for the raw data which are provided from the process or the installation and are intended to be stored, and their decompression can be too slow for many applications in process automation.
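The loss-free property of this family of methods can be checked with a short round trip. The sketch below uses Python's standard zlib module, whose DEFLATE format combines LZ77 with Huffman coding; it stands in for the LZ family in general and is not the method of any particular history server. The record layout (an 8-byte integer time stamp followed by an 8-byte floating-point value) and the synthetic sample values are assumptions.

```python
import struct
import zlib

# Synthetic process values: (time stamp in seconds, measured value).
samples = [(1_700_000_000 + i, 20.0 + 0.25 * (i % 4)) for i in range(1000)]

# Assumed raw format: little-endian int64 time stamp + float64 value.
raw = b"".join(struct.pack("<qd", t, v) for t, v in samples)

packed = zlib.compress(raw, 6)      # compression level 6
restored = zlib.decompress(packed)

# Loss-free: every time stamp and value is recovered bit for bit.
assert restored == raw
print(len(raw), "bytes raw,", len(packed), "bytes compressed")
```

The compression factor and the decompression time obtained in this way depend heavily on the data and on the machine, so the figures printed by this sketch say nothing about real process data.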
However, in the so-called lossy compression methods, the compression of the data which arise is associated with a loss of information because the data which have been read or decompressed differ from the original data.
The lossy methods described above are not suitable, in particular, for process or installation information systems, which often use the archived data as a basis for billing. For example, if measurements from the process environment which are as accurate as possible are used as a basis for balancing processes in the office environment, a lossy value cannot be used. The same applies if the exact sampling times of the process data are to be analyzed subsequently on the basis of these data.
Another disadvantage of the lossy methods is that the decompression of the data (also referred to as unzipping) can be very computation-intensive. With the large volumes of data to be processed, compression is therefore more likely to be a hindrance, since decompressing the data simply takes a long time.