The present invention is related generally to data storage techniques, and, more particularly, to collecting and retrieving data produced by a number of different data sources.
Industry increasingly depends upon data acquisition and control systems to improve the efficiency of running industrial processes while lowering their costs. Data acquisition begins when a number of sensors measure aspects of an industrial process and periodically report their measurements back to a data collection and control system. The word "measurement" should be construed very broadly: the "measurement" produced by a sensor may be, for example, an inventory of packages waiting in a shipping line or a photograph of a room in a factory. Sophisticated software examines the incoming data, produces status reports, and, in many cases, responds by sending commands to actuators that change how the industrial process is running. The data produced by the sensors also allow an operator to tailor the process in response to varying external conditions, to catch incipient equipment failure, and to move equipment into and out of service as required. A simple and familiar example of a data acquisition and control system is the thermostat: a thermometer measures the current air temperature, the measurement is compared with a desired temperature range, and, if necessary, commands are sent to a furnace or air conditioner to move the actual air temperature into the desired range.
Of course, many industrial processes are much more complex than this simple example. Increasing process complexity is controlled by increasing the sophistication of the control software and by increasing the number of data sensors and actuators. It is not unheard of to have tens of thousands of sensors monitoring all aspects of a multi-stage process. These sensors are of varied type to report on varied characteristics of the process. Their outputs are similarly varied in the meaning of their measurements, in the amount of data sent for each measurement, and in the frequency of their measurements. As regards the latter, for accuracy and to enable quick response some of these sensors take one or more measurements every second. When multiplied by tens of thousands of sensors, this results in so much data flowing into the control system that sophisticated data management techniques are required. One currently popular technique is "data streaming." Here, incoming data are immediately stored, in order by arrival time, in one or more data files. Storing data in time-sequential order allows the control system to quickly access data relevant to the state of the process at a chosen time and to make an analysis accordingly.
However, current data streaming techniques achieve their efficiencies by trading off some flexibility. A first problem caused by this stems from the interrelationship of a complicated industrial process with other processes and with the processing environment. This interrelationship may itself be very complicated and may be constantly changing. To accommodate change, operators would like to frequently add, move, or remove sensors and to integrate the sensors' outputs into the control system. Because current data streaming techniques are optimized for efficiently managing large, ongoing data streams, they are often unable to readily accommodate configuration changes. Indeed, some data acquisition and control systems must be shut down entirely to reconfigure them for new or different sensors and actuators. As the industrial process depends upon its data acquisition and control system and cannot run reliably without it, shutting down the system involves a very expensive shutdown of the entire industrial process. Thus, the limited flexibility of data streaming often inhibits operators from making quick reconfigurations and from readily taking advantage of advances in data acquisition technology.
A related problem with data streaming stems from the varied types of data acquired by an extensive system. Current data streaming techniques do not comfortably handle so-called "non-real-time data." The types of sensors discussed above take their measurements in "real time." For example, at 12:34:56 p.m., a level sensor records the water level in a holding tank. The measurement produced, the water level, is relevant to the exact time when the measurement was taken. Contrast that with the following example of non-real-time data. At 3:00:00 p.m., a technician dips a collecting cup into a vat and draws a sample of the vat's contents. The contents are to be subjected to a laboratory analysis that is either too sophisticated for a real-time sensor to perform or that is performed so infrequently that the cost of an automated real-time sensor is not justified. In any case, the technician takes the sample to the laboratory and performs the analysis. The results of the analysis are not available until 4:30:00 p.m., at which time the technician would like to enter the results into the data acquisition system. Those results are not relevant to the state of the vat at 4:30:00 p.m., but rather to the time when the sample was drawn at 3:00:00 p.m. Traditional data streaming cannot readily back up and store the analysis results with real-time data produced around the results' "time of relevance," that is, at 3:00:00 p.m. Instead, data streaming stores the analysis results when they become available, storing them along with real-time data produced at 4:30:00 p.m. When, later in the day, the control system attempts to analyze the state of the process at 3:00:00 p.m., it may miss the results of the laboratory analysis because those results are not stored in time sequence with the real-time data points. Analysis becomes much more difficult, and the value of non-real-time data points is consequently greatly reduced.
What is needed is a way to store real-time and non-real-time data that allows new data sources to be added or removed and that allows the real-time and non-real-time data points to be retrieved for analysis in a time-coordinated fashion.
In view of the foregoing, the present invention presents a data collection and retrieval system that puts data produced by real-time and non-real-time data sources into parallel "streams" or data files. The benefits of data streaming are retained by storing real-time data points with time stamps in one or more data files and non-real-time data points with time stamps in other data files. These files form parallel streams of data. The parallel streams are associated with one another and with a particular monitoring period. To access data relevant to a particular time period, data from the parallel streams associated with that time period are retrieved in a coordinated fashion based on their time stamps.
Real-time data points are stamped with the times at which their data were collected. To make data storage and retrieval efficient, these data points are stored in a time-sequential order as soon as they reach the data collection system. Non-real-time data points are time-stamped with their times of relevance rather than with the potentially much later time at which the data points reach the data collection system. The non-real-time data points are stored in time-sequential order in their own data files, the sequence time being the time of relevance rather than the time of collection. Because there are typically far fewer non-real-time than real-time data points, the efficiencies and limitations of data streaming need not be applied to the non-real-time data files. It is thus feasible to back up within a non-real-time data stream to insert a data point at its proper time of relevance.
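By way of illustration only, the two storage behaviors described above might be sketched as follows in Python. The class names, method names, and the use of seconds since midnight as time stamps are assumptions made for this sketch and do not appear in the description; in-memory lists stand in for the data files.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class RealTimeStream:
    """Hypothetical append-only stream: points are stored as they arrive."""
    points: List[Tuple[float, Any]] = field(default_factory=list)

    def append(self, collected_at: float, value: Any) -> None:
        # Real-time points arrive in collection order, so appending
        # keeps the stream time-sequential without any searching.
        self.points.append((collected_at, value))


@dataclass
class NonRealTimeStream:
    """Hypothetical stream kept sorted by time of relevance."""
    times: List[float] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)

    def insert(self, time_of_relevance: float, value: Any) -> None:
        # "Back up" within the stream: find the slot that keeps the
        # stream ordered by time of relevance, then insert there.
        i = bisect.bisect_right(self.times, time_of_relevance)
        self.times.insert(i, time_of_relevance)
        self.values.insert(i, value)

    def points(self) -> List[Tuple[float, Any]]:
        return list(zip(self.times, self.values))


# Example: a lab result entered at 4:30 p.m. but relevant to 3:00 p.m.
lab = NonRealTimeStream()
lab.insert(16 * 3600.0, "sample B result")  # relevant to 4:00 p.m.
lab.insert(15 * 3600.0, "sample A result")  # relevant to 3:00 p.m., entered later
```

In this sketch the late-arriving 3:00 p.m. result lands ahead of the 4:00 p.m. result. The linear-time list insertion is tolerable only because, as noted above, non-real-time points are assumed to be comparatively few.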
Upon retrieval, an operator uses header files to access data for a given monitoring period. By referencing the streams that contain data points relevant to the monitoring period, these header files facilitate coordinated retrieval of data from multiple parallel streams. The time stamps in the parallel stream files allow the data points to be merged together into one coordinated, time-sequential data stream for analysis. The header files can even refer to an external database so that its data can be treated as a parallel stream and merged with the process monitoring data.
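The coordinated retrieval described above can be sketched, under stated assumptions, as a time-stamp merge of several already-sorted streams. The header layout (a dictionary with `start`, `end`, and `streams` keys), the stream names, and the function name `merge_streams` are hypothetical and serve only to illustrate the idea; the description does not specify a header file format.

```python
import heapq
from typing import Any, Dict, List, Tuple

Point = Tuple[float, str, Any]  # (time stamp, stream name, value)


def merge_streams(
    header: Dict[str, Any],
    streams: Dict[str, List[Tuple[float, Any]]],
) -> List[Point]:
    """Merge the streams named in a (hypothetical) header into one
    time-sequential list covering the monitoring period [start, end)."""
    start, end = header["start"], header["end"]
    tagged = []
    for name in header["streams"]:
        # Each stream is assumed to be sorted by time stamp already.
        tagged.append(
            [(t, name, v) for t, v in streams[name] if start <= t < end]
        )
    # heapq.merge combines sorted inputs; comparing on the time stamp
    # alone avoids comparing values of unrelated types.
    return list(heapq.merge(*tagged, key=lambda p: p[0]))


streams = {
    "tank_level": [(53990.0, 2.1), (54000.0, 2.2), (54010.0, 2.3)],
    "lab_analysis": [(54000.0, "pH 7.1")],
}
header = {
    "start": 53995.0,
    "end": 54015.0,
    "streams": ["tank_level", "lab_analysis"],
}
merged = merge_streams(header, streams)
```

Because the lab result carries its time of relevance as its time stamp, it appears in the merged output next to the real-time reading taken at the same moment.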
The parallel stream data collection and retrieval technique allows operators to add or delete data sources without shutting down the process being monitored. When a new source is added, its data, whether real-time or non-real-time, are added to the data streams flowing into the various data files without disrupting existing streams. Operators can also incorporate additional parallel streams containing multiple versions of process monitoring data.
The parallel stream framework allows for great flexibility in the contents of the streams. For example, testing may show that a sensor has been producing measurements with a consistent deviation from the correct values. Rather than going back and correcting all of the data points produced by the faulty sensor, a new parallel stream of correction factors is produced. Data analysis programs combine the faulty sensor readings from the original stream with the correction factors from the new stream to produce the corrected results.
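As a minimal sketch of the correction-stream example, the following Python function combines an original stream with a parallel stream of correction factors at analysis time. Two assumptions are made that the description does not state: the correction is an additive offset, and each correction applies from its own time stamp until the next correction in the stream. The function name `apply_corrections` is likewise hypothetical.

```python
import bisect
from typing import List, Tuple


def apply_corrections(
    readings: List[Tuple[float, float]],
    corrections: List[Tuple[float, float]],
) -> List[Tuple[float, float]]:
    """Return corrected readings without modifying the original stream.

    Both inputs are lists of (time stamp, value) sorted by time stamp.
    Assumption: each correction is an additive offset in effect from its
    time stamp until the next correction.
    """
    times = [t for t, _ in corrections]
    corrected = []
    for t, value in readings:
        # Index of the latest correction at or before this reading.
        i = bisect.bisect_right(times, t) - 1
        offset = corrections[i][1] if i >= 0 else 0.0
        corrected.append((t, value + offset))
    return corrected


readings = [(0.0, 10.0), (10.0, 11.0), (20.0, 12.0)]
corrections = [(10.0, -0.5)]  # sensor found to read 0.5 high from t = 10
result = apply_corrections(readings, corrections)
```

The original readings stay untouched in their own stream, so a revised correction stream could later replace this one without rewriting any stored data.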