For the long-term archiving of process values (also referred to as process measured values or process data), the data are stored in the form of series of measurements (also referred to as measured value histories or process value histories) in a so-called historian server.
The historian server is a special database with real-time functions for collecting process or measurement data, messages and reports (files) and for the long-term storage and archiving of the collected data. The collected data and completed reports are stored on the hard disk of a protocol server at defined intervals of time.
In this case, archiving the data in the historian server is very memory-intensive since the automated system of a technical installation or a technical process provides a large volume of data. Consequently, a very large amount of data has to be moved when reading the data, even if only a single mean value over a long period of time, for example an annual mean value of a process variable, is required.
The historian servers can continuously query so-called aggregate values (also referred to as threshold or limit values of a process value) over very long periods of time. For this purpose, the entire raw data stock of the required signals must be read and the corresponding aggregates must be determined. Another disadvantage of the currently used methods arises when historical data are overwritten. In this case, the previously created aggregate values would suddenly become unusable and would only be available again after a new aggregate calculation run, which has to be initiated manually.
The methods which are currently used in automation technology to store process values (also referred to as signals below) for archiving are described below.
The method which uses a relational approach by means of a relational database operates according to the two variants cited below.
In variant 1, all signals are stored in an “event-driven” manner in a table with their signal index, time stamp, value and status, either the signal index and/or the time stamp being used as the primary index. In this case, the events which trigger storage are the respective changes in measured values around a certain signal-specific tolerance band. All signals are archived in a table in a manner which is not equidistant in terms of time.
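The event-driven storage of variant 1 can be illustrated with a minimal sketch. It is not taken from any particular historian product; the names `Row` and `archive_event_driven` and the per-signal `tolerance` mapping are assumptions chosen for illustration. A sample is written to the single shared table only when it leaves the tolerance band around the last stored value of the same signal.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One table row: signal index, time stamp, value and status."""
    signal_index: int
    timestamp: float
    value: float
    status: int


def archive_event_driven(samples, tolerance):
    """Store a sample only if it leaves the signal-specific tolerance band.

    samples:   iterable of (signal_index, timestamp, value, status)
    tolerance: dict mapping signal_index -> width of the tolerance band
    """
    table = []          # one shared table for all signals
    last_stored = {}    # last stored value per signal
    for signal_index, timestamp, value, status in samples:
        previous = last_stored.get(signal_index)
        if previous is None or abs(value - previous) > tolerance[signal_index]:
            table.append(Row(signal_index, timestamp, value, status))
            last_stored[signal_index] = value
    return table


samples = [
    (1, 0.0, 10.0, 0),
    (1, 1.0, 10.05, 0),   # inside the band of signal 1, not stored
    (1, 2.0, 10.5, 0),
    (2, 2.0, 3.0, 0),
    (1, 3.0, 10.55, 0),   # inside the band again, not stored
]
table = archive_event_driven(samples, {1: 0.1, 2: 0.5})
```

The resulting rows are not equidistant in time, which matches the description of variant 1.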
In variant 2, all signals are stored in a table in a fixed time pattern, for example every 5 seconds. In this case, the columns form the signal values and the status information.
One disadvantage is found in both variants in practice: if a very large number of signals, for example more than 1000 signals, are stored in such a table, reading individual signal time series is very slow since, on account of the SQL access of relational databases, all data or all index tables of all signals are first read in order to then look for the desired signal index and extract it. In addition, in variant 2, all signals are recorded at the same rate, for example every 5 seconds. Values which change slowly are consequently recorded several times, and values which change within 5 seconds can be lost.
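The effect of the fixed time pattern in variant 2 can be sketched as follows. The function `sample_on_grid` is a hypothetical helper, and the assumption that each grid point records the last value known at that time is made only for illustration.

```python
from bisect import bisect_right


def sample_on_grid(events, start, stop, step):
    """Record the last known value of a signal at each point of a fixed grid.

    events: list of (timestamp, value), sorted by timestamp
    """
    times = [t for t, _ in events]
    rows = []
    t = start
    while t <= stop:
        i = bisect_right(times, t) - 1          # last event at or before t
        rows.append((t, events[i][1] if i >= 0 else None))
        t += step
    return rows


# A slowly changing signal is recorded several times with the same value.
slow = sample_on_grid([(0, 20.0)], start=0, stop=15, step=5)

# A short excursion between two grid points never reaches the table.
fast = sample_on_grid([(0, 1.0), (6, 9.0), (8, 1.0)], start=0, stop=15, step=5)
```

In this example the value 9.0, which exists only between seconds 6 and 8, does not appear in `fast`, while `slow` contains four identical entries.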
When using the relational approach, it is possible to already cyclically determine the required values in advance and to likewise store them on account of the technology of relational databases. For example, so-called OLAP (Online Analytical Processing) solutions usually run once per night and determine and store the desired aggregates. This operation can be very time-consuming and involve a high data throughput since all uncompressed data (also referred to as raw data) have to be read from the hard disk again. In addition, the OLAP run has to be planned and set up, which is associated with a considerable amount of configuration. Furthermore, compression scripts in the database ensure that aggregates, for example one-hour aggregates, are again formed from all signals, for example every hour, and are stored.
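The idea of forming one-hour aggregates in advance and answering a long-range query from them can be sketched as below. This is an illustration under stated assumptions, not the behaviour of a specific OLAP product: the function names are hypothetical, and the mean is a plain sample mean (sum and count per hour), not a time-weighted mean.

```python
from collections import defaultdict


def build_hourly_aggregates(raw):
    """Cyclic run: read all raw samples once and keep (sum, count) per hour.

    raw: iterable of (timestamp_in_seconds, value)
    """
    buckets = defaultdict(lambda: [0.0, 0])
    for timestamp, value in raw:
        bucket = buckets[int(timestamp // 3600)]
        bucket[0] += value
        bucket[1] += 1
    return {hour: (total, count) for hour, (total, count) in buckets.items()}


def mean_from_aggregates(aggregates, first_hour, last_hour):
    """Answer a long-range mean query from the stored aggregates only."""
    total, count = 0.0, 0
    for hour in range(first_hour, last_hour + 1):
        if hour in aggregates:
            total += aggregates[hour][0]
            count += aggregates[hour][1]
    return total / count if count else None


raw = [(0, 1.0), (1800, 3.0), (3600, 5.0)]
aggregates = build_hourly_aggregates(raw)
mean_value = mean_from_aggregates(aggregates, 0, 1)
```

Storing the sum and the count, instead of the hourly mean itself, keeps the long-range mean exact when the hours contain different numbers of samples.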
Although this would, in theory, shorten the query time for an annual mean value in a report, this procedure also has to be planned and configured.
However, on account of the general approach of relational databases, the optimization described above can reach its limits very quickly and, solely on account of the large number of stored signals, results in the relational database very quickly dealing only “with itself” since the histories have to be searched again and again in order to continuously generate the aggregates, that is to say the last day or the last hour has to be read, for example, and all values read have to be calculated.
Another disadvantage arises in this case when historical data are overwritten. In the case of forecasting systems, for example, new forecasts, that is to say time series which lie in the future, are determined cyclically and in the process the old forecasts are continuously overwritten. In this case, the previously created aggregate values would suddenly become worthless since other values are now produced, and they could be used again only after the next run. In addition, storing such future values would not be readily possible with either relational variant, since the data model in the database would have to contain special extensions for future values.
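The way an overwritten forecast invalidates a previously stored aggregate can be shown with a very small sketch. The dictionary layout and the helper `hourly_mean` are assumptions made for illustration only.

```python
def hourly_mean(raw, hour):
    """Sample mean of all values whose time stamp falls into the given hour."""
    values = [v for timestamp, v in raw.items() if timestamp // 3600 == hour]
    return sum(values) / len(values) if values else None


# Forecast values keyed by time stamp in seconds.
raw = {0: 10.0, 1800: 20.0}

# Aggregate as stored by the last cyclic run.
stored_aggregate = hourly_mean(raw, 0)

# A new forecast overwrites an old value for the same time stamp.
raw[1800] = 40.0

# The stored aggregate no longer matches the raw data until the next run.
current_aggregate = hourly_mean(raw, 0)
is_stale = stored_aggregate != current_aggregate
```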
Another method which uses high-performance proprietary, data-stream-oriented storage methods in a proprietary database format in a database is described below using the PGIM (Power Generation Information Manager) server used by ABB. In this case, a relational data model is not used during storage. Rather, each signal is treated like a separate database. This principle circumvents the procedure of the above-described relational approach in that a fixed pattern with regard to the data rate does not have to be predefined during writing since the signals are independent. When reading a time series of a signal, the sub-database of the signal sought is directly addressed and need not be filtered from an overall database in a complicated manner. This approach results in the operators of technical installations proceeding to record all data as far as possible and thus storing all of the data which arise in the form of uncompressed data or raw data.
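The principle of treating each signal like a separate database can be sketched as follows. This is not the PGIM storage format, which is not described here; the class `PerSignalStore` and its methods are hypothetical and serve only to show that a read addresses one signal directly and that signals may be written at independent rates.

```python
from bisect import bisect_left, bisect_right


class PerSignalStore:
    """Keeps one independent, time-ordered series per signal."""

    def __init__(self):
        # signal name -> (sorted time stamps, rows in the same order)
        self._series = {}

    def write(self, signal, timestamp, value, status=0):
        times, rows = self._series.setdefault(signal, ([], []))
        i = bisect_right(times, timestamp)      # keep the series sorted
        times.insert(i, timestamp)
        rows.insert(i, (timestamp, value, status))

    def read(self, signal, start, stop):
        """Return the rows of one signal with start <= timestamp <= stop."""
        times, rows = self._series.get(signal, ([], []))
        return rows[bisect_left(times, start):bisect_right(times, stop)]


store = PerSignalStore()
store.write("temperature", 0, 1.0)
store.write("temperature", 7, 2.0)
store.write("pressure", 5, 9.0)       # written at its own rate
store.write("temperature", 3, 1.5)
result = store.read("temperature", 1, 7)
```

Reading `"temperature"` touches only that signal's series; the rows of `"pressure"` are never scanned.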
As a result, the data stock involved and thus the hard disk capacity used to archive process data have increased greatly in recent years. The data stock to be stored is currently in the range of 0.5 to approximately 5 terabytes, and the trend is toward a further increase.
In principle, the above-described use of the relational approach by means of the relational database is less efficient by several factors, with regard to the possible storage and reading rate, than the use of high-performance proprietary, data-stream-oriented storage methods in a proprietary database format.
However, aggregate optimization is not provided with the use of the high-performance proprietary, data-stream-oriented storage method in a proprietary database format, since said method is several factors faster and can thus usually provide a sufficiently fast response even in the case of long queries, and since it usually does not allow any separate recalculations because it is designed only for raw data storage.
In addition, the following problem can arise in the two above-described methods according to the relational approach and according to the proprietary approach. The historian server continuously queries aggregate values over very long periods of time, for example in order to create a report. For this purpose, the entire raw data stock of the required signals is read and the corresponding aggregates are determined. If it is assumed, for example, that a signal provides a new value approximately every second (with a resolution of one second) and stores the time stamp, the value and the status in 18 bytes, a storage space of 365*24*3600*18 = 567,648,000 bytes, that is to say approximately 541 Mbytes (with 1 Mbyte taken as 2^20 bytes), is involved for one year. In order to thus obtain a single annual mean value, a data stock of approximately 541 Mbytes is read and processed in the example described above.
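The storage estimate for one signal and one year can be checked with a short calculation. The sketch below is an illustration only; the function name and its parameters are hypothetical, and it assumes 365 days per year and 1 Mbyte = 2^20 bytes.

```python
def raw_bytes_per_year(bytes_per_sample=18, samples_per_second=1, days=365):
    """Raw storage needed for one signal over the given number of days."""
    seconds = days * 24 * 3600
    return seconds * samples_per_second * bytes_per_sample


size_bytes = raw_bytes_per_year()
size_mbytes = size_bytes / 2**20   # assumption: 1 Mbyte = 2**20 bytes
print(f"{size_bytes} bytes = {size_mbytes:.1f} Mbytes per signal and year")
```

The same function can be used to scale the estimate to other sampling rates or record sizes by changing the parameters.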