Embodiments of the present invention relate to data processing and, more particularly, to techniques for extending indexing capabilities using a data cartridge.
Databases have traditionally been used in applications that require storage of data and querying capability on the stored data. Existing databases are thus best equipped to run queries over a finite stored data set. The traditional database model is, however, not well suited for a growing number of modern applications in which data is received as a stream of data events instead of being stored as a bounded data set. A data stream, also referred to as an event stream, is characterized by a real-time, potentially continuous, sequence of events. A data or event stream thus represents a potentially unbounded stream of data. Examples of sources of events include various sensors and probes (e.g., RFID sensors, temperature sensors, etc.) configured to send a sequence of sensor readings, financial tickers sending out pricing information, network monitoring and traffic management applications sending network status updates, click stream analysis tools sending events, global positioning systems (GPSs) sending GPS data, and others.
Oracle Corporation™ provides a system (referred to as a Complex Event Processing (CEP) system) for processing such event streams. A CEP system is quite different from a relational database management system (RDBMS), in which data is stored in a database and then processed using one or more queries. In a CEP system, a query is run continuously and query processing is performed in real time as events in a stream are received by the system.
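The difference between the two processing models can be illustrated with a minimal sketch. It is not the CEP system's actual query language or API; the function names, the threshold, and the three-event window are assumptions chosen only for illustration. The first function runs once over a finite stored data set, while the second is evaluated event by event and can therefore consume a potentially unbounded stream.

```python
from collections import deque
from typing import Iterable, Iterator, List


def stored_query(rows: List[float], threshold: float) -> List[float]:
    """Database-style query: the complete, finite data set exists before the query runs."""
    return [r for r in rows if r > threshold]


def continuous_query(
    events: Iterable[float], threshold: float, window: int = 3
) -> Iterator[float]:
    """Stream-style query: re-evaluated as each event arrives.

    Emits the average of the most recent `window` readings whenever that
    average exceeds `threshold`. The window size and the averaging rule are
    illustrative assumptions, not taken from the source text.
    """
    recent = deque(maxlen=window)  # only the last `window` events are retained
    for reading in events:
        recent.append(reading)
        average = sum(recent) / len(recent)
        if average > threshold:
            yield average  # a result is produced without waiting for the stream to end
```

Because `continuous_query` is a generator, results become available as soon as the triggering event has been received, and the input may be any iterable, including one that never terminates.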
A CEP system can receive data events from various sources for various applications. Accordingly, the data that is received may not follow a fixed format or schema but may be more heterogeneous in nature (e.g., binary data, XML data without an associated schema). For example, the data that is received may include streams of image data for an image processing application, streams of audio data for an audio processing application, streams of spatial or geographic or location data for a GPS application, streams of stock data for a financial application, and the like. As a result of the different data types and sources and their different data manipulation requirements, specialized functions or methods are usually needed to process the streaming data. While a CEP system provides support for some native data types and/or methods/functions for the native data types, these native data types or functions are often insufficient to cover the diverse types of processing needed by applications that use a CEP system. This in turn reduces the usefulness of the CEP system.
As a result, processing platforms, such as CEP systems, constantly have to be extended by application developers and service providers to support heterogeneous data formats and their data manipulation mechanisms in order to interact/interoperate with diverse sources of events and data. For example, consider a CEP system that processes localization events emitted by GPS devices. Such a CEP system would need to understand spatial data formats and functions related to the spatial data format.
In the past, the capabilities of a CEP system were extended exclusively through user defined functions (UDFs) or special code (e.g., customized Java beans). To achieve extensibility, an application developer for a specific application had to define customized UDFs to interact with the specialized application. The application developer had to design one function at a time and define the function's interface based upon predefined data types provided by the CEP system. This process, however, has several drawbacks and inefficiencies. The UDFs that are designed are narrowly scoped to a single application and are closely coupled or tied to the application defining them, and are thus hard to reuse among other applications of the CEP system. For example, a UDF defined for a video-processing application cannot be used in another application. Further, the UDFs are individually defined and cannot be grouped into domains (e.g., spatial), which makes their management difficult. Additionally, UDFs provide a poor programming experience, as the usage of the extension in the form of a UDF is not transparent to the user.
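The drawbacks described above can be made concrete with a small sketch. The `Application` class and its `register_udf` and `call` methods are hypothetical stand-ins and do not represent any actual CEP system interface; they only model a UDF that is registered with one application and whose signature is limited to predefined native types.

```python
import math
from typing import Callable, Dict


class Application:
    """Hypothetical application container; each one holds its own private UDFs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.udfs: Dict[str, Callable[..., float]] = {}

    def register_udf(self, name: str, fn: Callable[..., float]) -> None:
        # The UDF is tied to this application only.
        self.udfs[name] = fn

    def call(self, name: str, *args: float) -> float:
        return self.udfs[name](*args)


def distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Spatial logic flattened into native floats, since only predefined
    data types are assumed to be available for the UDF's interface."""
    return math.hypot(x2 - x1, y2 - y1)


gps_app = Application("gps")
gps_app.register_udf("distance", distance)

# A second application does not see the UDF and would have to define it again.
fleet_app = Application("fleet")
```

In this sketch, every additional spatial operation would need its own separately registered function in each application, and nothing groups those functions into a single spatial domain.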