A database is a structure for storing and relating data within, for example, a computer system. Different database architectures exist depending on the intended usage. The primary use of general-purpose databases is to manage and facilitate data entry and retrieval for the relevant application. A recent trend has been the emergence of specialized database architectures optimized for specific application domains.
Complex event processing (CEP) is a technology for low-latency filtering, correlating, aggregating and/or computing on real-world event data, e.g. financial data. Such data is usually generated at high frequencies and so needs to be saved in an appropriate database to allow it to be evaluated, whether in real time or at a later stage. Several specialized database products have emerged that attempt to store such data, which is generated in quantities that normally overwhelm general-purpose databases.
The following products are available for use in CEP applications, and provide different functionalities for manipulating CEP data.
Product | Description | Technology
Vhayu velocity | High performance proprietary database optimized to work with high-frequency financial market data | Proprietary, non-relational in-memory database
KX systems KDB+ | High performance database to monitor real-time events and detect and report faults for data-intensive applications | Optimized, column-based database
StreamBase | Event processing platform which allows for development of applications that query and analyze high-volume real-time data streams | Integrated development environment along with specialized compiler
These products aim to improve both the underlying database technologies and the processing capabilities. However, storage, querying and retrieval of the data are still carried out according to conventional processes. While these databases are well-suited to performing traditional transaction-oriented operations, they do not provide an efficient means for allowing large amounts of contiguous data to be accessed and/or evaluated. The process of evaluating large contiguous datasets is central to responding to requests for descriptive statistical data.
For example, when determining the minimum or maximum value in a string of values stored in a database, typically all the records in that data string have to be retrieved and evaluated to determine the location and/or magnitude of the minimum/maximum.
Thus, the operation is costly in terms of the I/O bus usage and/or network bandwidth utilisation in retrieving the dataset, and in terms of the computation required to evaluate the dataset. These costs will increase as the number of values in the requested data string increases.
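By way of illustration only, the conventional approach described above may be sketched as follows. The function name `range_max`, the in-memory list standing in for stored records, and the `records_read` counter are assumptions introduced for this sketch and do not correspond to any of the products listed. The sketch shows that every record in the requested range is read once, so the number of records retrieved grows linearly with the length of the range.

```python
from typing import Sequence, Tuple


def range_max(values: Sequence[float], start: int, end: int) -> Tuple[int, float, int]:
    """Find the maximum of values[start:end] by scanning every record.

    Returns (index of first maximum, maximum value, number of records read).
    The list `values` is a hypothetical stand-in for records held in a database.
    """
    if not 0 <= start < end <= len(values):
        raise ValueError("empty or out-of-bounds range")

    best_idx = start
    records_read = 0
    for i in range(start, end):
        records_read += 1              # each record must be retrieved
        if values[i] > values[best_idx]:
            best_idx = i               # keep the earliest occurrence of the maximum
    return best_idx, values[best_idx], records_read


# Hypothetical price series used only for illustration.
prices = [101.2, 100.9, 102.5, 101.7, 99.8, 102.5]
idx, value, reads = range_max(prices, 0, len(prices))
```

In this sketch, a request over the whole series reads all six records, and a request over a range twice as long would read twice as many; the same holds for a minimum, which differs only in the direction of the comparison.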
In particular, comparative experimentation may be extremely costly, due to the cost of retrieving and evaluating a number of individual data sequences.