There are numerous areas where multiple devices generate multiple data streams, and it is desirable to monitor, analyze and/or predict behavior that is time dependent. This time dependency across multiple data streams can be difficult to resolve using existing data mining environments, particularly for the purposes of analysis.
There is a need for an improved data mining environment that enables analysis (including for data mining purposes) of time dependent data across, for example, multiple streams, multiple entities, multiple possible attributes of the entities, multiple possible behaviors of the data stream over time, and multiple events reflected in the data stream, resulting in a multi-dimensional environment (i.e., entities, streams, entity attributes, stream behaviors and stream events). There is a further need for such a data mining environment to be flexible enough to permit relatively open-ended queries, thereby enabling, for example, the detection of trends, including trends with new dimensions or trends based on relatively small data sets.
For example, intensive care units worldwide use a range of medical monitoring equipment, such as medical devices for life support and critical monitoring. These devices have been operational for over 50 years and enable critically ill medical and surgical patients to be observed and treated in a complex, specialized environment by physicians and nurses trained in restoring and/or maintaining the function of vital organs. A diverse range of devices display physiological data and many have the ability to output this data via serial, USB or other ports.
In addition to collecting this data for use in real-time by care providers, it is desirable to enable secondary analysis of the data for other related clinical research, for example, to enable the discovery of previously unknown trends and patterns that may be indicative of the onset of some condition. The potential for secondary use of health data such as this is significant. An American Medical Informatics Association White Paper published in the Journal of the American Medical Informatics Association in 2007, entitled “Toward a National Framework for the Secondary Use of Health Data”, presents infrastructure to support the secondary use of data in today's data-intensive healthcare environment as urgently needed and pivotal to the US health system.
Medical monitoring equipment produces large amounts of data, which makes manual analysis of this data impossible. Adding to the complexity of the large datasets is the nature of the physiological monitoring data: the data is multi-dimensional, and it is not only changes in individual dimensions that are significant, but sometimes simultaneous changes in several dimensions. Because the data produced by the monitoring equipment is a time series, there is a need for clinical research frameworks that preserve both the dimensionality and the temporal behavior of the data during data mining, so that information about time and context is not lost in the mining process.
In the field of clinical research, the timing of certain events in a patient's condition can be of high importance. To enable the discovery of new trends and patterns that may be indicative of the onset of a condition in intensive care patients, there is a need for integrated temporal abstraction data mining systems that include methods for realigning historical data in relation to the onset of the condition being investigated.
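By way of illustration only, realignment of historical data in relation to an onset time might be sketched as follows. This is a minimal sketch under stated assumptions and not a description of any particular system: the function name `realign_to_onset`, the (timestamp, value) representation of a data stream, and the sample readings are all hypothetical. Each timestamp is re-expressed as an offset in hours from the onset, so that streams from different patients can be compared on a common, onset-relative time axis.

```python
from datetime import datetime


def realign_to_onset(readings, onset):
    """Re-express each reading's timestamp as hours relative to onset.

    `readings` is a list of (timestamp, value) pairs (a hypothetical
    representation of one data stream). Negative offsets precede onset.
    """
    return [
        ((ts - onset).total_seconds() / 3600.0, value)
        for ts, value in readings
    ]


# Hypothetical heart-rate readings for two patients whose conditions
# began at different wall-clock times.
patient_a = [
    (datetime(2020, 1, 1, 10, 0), 120),
    (datetime(2020, 1, 1, 11, 0), 135),
    (datetime(2020, 1, 1, 12, 0), 150),
]
onset_a = datetime(2020, 1, 1, 12, 0)

patient_b = [
    (datetime(2020, 3, 5, 1, 0), 118),
    (datetime(2020, 3, 5, 2, 0), 140),
    (datetime(2020, 3, 5, 3, 0), 155),
]
onset_b = datetime(2020, 3, 5, 3, 0)

aligned_a = realign_to_onset(patient_a, onset_a)
aligned_b = realign_to_onset(patient_b, onset_b)

# Both streams now share the same onset-relative time axis.
for (offset, value_a), (_, value_b) in zip(aligned_a, aligned_b):
    print(f"{offset:+.1f} h from onset: A={value_a}, B={value_b}")
```

After realignment, readings taken two hours before onset in each patient share the offset -2.0, regardless of the calendar date on which they were recorded, which is what allows pre-onset patterns to be compared across patients.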