Today, companies have to deal with an ever-increasing flood of business-relevant data. Indeed, because of technological advances, more and more data is being produced on a daily basis. Computer applications based on those streams of data often are time-sensitive or time-critical. The data oftentimes needs to be processed and analyzed as fast as possible in order to obtain a competitive edge. Applications that are able to work in this manner are of potential interest in a variety of industries, e.g., for algorithmic trading in the finance sector, network monitoring in Information Technology (IT) departments, delivery tracking for logistics purposes, monitoring of business processes, etc.
Stream processing typically follows the pattern of continuous queries, which may be thought of in some instances as being queries that execute for a potentially indefinite amount of time on data that is generated or changes very rapidly. Such data is referred to as a stream, and streams oftentimes comprise events. Such streams often exist in real-world scenarios, e.g., as temperature readings from sensors placed in warehouses or on trucks for logistics purposes, weather data, entrance control systems (where events are generated whenever a person enters or leaves, for instance), etc. Events may include attributes (also sometimes referred to as a payload) such as, for example, the value of a temperature reading, as well as metadata (sometimes referred to as a header or header data) such as, for example, creation date, validity period, and quality of the event. Possible events occurring in an environment typically are schematically described by so-called event types, which in some respects are somewhat comparable to table definitions in relational databases.
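By way of illustration only, the relationship between event types, payloads, headers, and channels might be sketched as follows. The class names (EventType, Event), the field names, and the sample temperature reading are assumptions made for this sketch and are not taken from any particular CEP product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EventType:
    """Schema for a kind of event, loosely comparable to a table definition."""
    name: str
    attributes: Dict[str, type]  # attribute name -> expected Python type

    def validate(self, payload: Dict[str, Any]) -> bool:
        """Return True if the payload has exactly the declared attributes and types."""
        if set(payload) != set(self.attributes):
            return False
        return all(isinstance(payload[k], t) for k, t in self.attributes.items())


@dataclass
class Event:
    """A single event: payload (attributes) plus header (metadata)."""
    event_type: EventType
    payload: Dict[str, Any]
    header: Dict[str, Any] = field(default_factory=dict)
    channel: str = "default"  # channels are independent of the event type


# Hypothetical event type for temperature sensors on trucks.
TEMPERATURE = EventType("TemperatureReading", {"sensor_id": str, "celsius": float})

reading = Event(
    event_type=TEMPERATURE,
    payload={"sensor_id": "truck-7", "celsius": 4.5},
    header={"created": "2024-01-01T00:00:00Z", "quality": "good"},
    channel="logistics",
)
```

In this sketch, an event of the same event type could equally be constructed with a different channel value, since the schema and the transport are kept separate.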
Streams may in certain scenarios be organized in channels that in turn are implemented by an event bus. Channels and event types in this sense may be considered orthogonal concepts, e.g., in the sense that events of the same event type might be communicated via different channels.
Complex Event Processing (CEP) is an approach to handling the challenges associated with processing and analyzing huge amounts of data arriving with high frequencies. As will be appreciated from the above, in this context, the arriving data is classified as an event stream. CEP systems are designed to receive multiple streams of events and analyze them in an incremental manner with very low (e.g., near-zero) latency. Events may be evaluated and aggregated to form derived (or complex) events (e.g., by an engine or so-called event processing agents). Event processing agents can be cascaded such that, for example, the output of one event processing agent can be the input of another event processing agent. In other words, while the data is streaming in, it may be analyzed on-the-fly, and corresponding analytical results may be forwarded to subsequent consumers. Therefore, a CEP system need not necessarily persist the data it is processing.
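The cascading of event processing agents described above can be illustrated with a minimal sketch based on Python generators, in which the output of one agent is the input of the next and nothing is persisted. The agent names, the event dictionaries, and the threshold values are hypothetical and serve only to show the principle.

```python
from typing import Dict, Iterable, Iterator


def threshold_agent(readings: Iterable[Dict], limit: float) -> Iterator[Dict]:
    """First agent: emit a derived 'TooWarm' event for each reading above the limit."""
    for r in readings:
        if r["celsius"] > limit:
            yield {"type": "TooWarm", "sensor_id": r["sensor_id"], "celsius": r["celsius"]}


def escalation_agent(warnings: Iterable[Dict], repeats: int) -> Iterator[Dict]:
    """Second agent: emit one complex 'CoolingFailure' event when a sensor
    reaches `repeats` warnings."""
    counts: Dict[str, int] = {}
    for w in warnings:
        sid = w["sensor_id"]
        counts[sid] = counts.get(sid, 0) + 1
        if counts[sid] == repeats:
            yield {"type": "CoolingFailure", "sensor_id": sid}


def run_pipeline(readings: Iterable[Dict], limit: float = 8.0, repeats: int = 2):
    """Cascade the two agents; events flow through incrementally, one at a time."""
    return list(escalation_agent(threshold_agent(readings, limit), repeats))
```

Because generators are evaluated lazily, each incoming reading passes through both agents before the next reading is consumed, which mirrors the on-the-fly character of the processing.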
Thus, CEP in general may be thought of as a processing paradigm that describes the incremental, on-the-fly processing of event streams, typically in connection with continuous queries that are continuously evaluated over event streams. Moreover, CEP analysis techniques may include, for example, the ability to perform continuous queries, identify time-based relations between events by applying windowing (e.g., through XQuery), etc., with the aid of processing resources such as at least one processor and a memory. See, for example, U.S. Pat. Nos. 8,640,089 and 8,266,351, as well as U.S. Publication Nos. 2014/0078163, 2014/0025700, and 2013/0046725, the entire contents of each of which are hereby incorporated herein by reference.
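As a simple illustration of a windowed continuous query, the following sketch computes an average over a sliding time window and updates the result incrementally with each arriving event. It is written in plain Python rather than in any CEP query language; the function name, the (timestamp, value) tuple layout, and the assumption that events arrive in timestamp order are choices made for this sketch only.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_average(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """For each (timestamp, value) event, yield (timestamp, average of the values
    whose timestamps lie within the last `window_seconds`).

    Assumes timestamps are non-decreasing and window_seconds is positive.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)
```

Only the events currently inside the window are held in memory, so the query can run for an indefinite amount of time without storing the full history of the stream.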
The development of a CEP application typically comprises several main steps. First, the user connects to a set of streams that continuously deliver events. Second, the user defines the business logic for analyzing the event streams. Third, the user defines how to deal with the results.
Unfortunately, it oftentimes is challenging to define the business logic. For instance, the analysis of data sources typically is not a straightforward process where the targets are already predefined. Instead, it oftentimes is more of an iterative process, with the analysis steps being aligned to the characteristics of the data source. A common first step is to derive some basic characteristics of the data source before successively zooming into the data and gaining deeper knowledge. In the CEP context, the analysis of event streams can be even more challenging. For example, the user oftentimes cannot easily examine an arbitrary history of the stream to obtain some starting points for further analysis or follow-up, e.g., because such data is not persisted or readily re-creatable. Nor is it straightforward for a user to traverse the data multiple times, e.g., for similar reasons. Although the user can connect to the stream and from that point on obtain the events, ad hoc analysis can be difficult (e.g., for a programmer who might not have a detailed sense of the business needs, requirements, potential tuning points, etc.). Post hoc analysis may not be possible either, as a connection to a stream generally will not provide access to the stream's previous segments; doing so would be tantamount to providing events that occurred in the past.
Thus, it will be appreciated that it would be desirable to overcome these and/or other problems. For instance, it will be appreciated that it would be desirable to address issues associated with CEP developers facing event streams with unknown characteristics, e.g., by providing tools that assist in the definition of business logic, stream analysis, and generation of output.
Certain example embodiments help address these and/or other needs. For instance, certain example embodiments assist a CEP developer by providing event stream profiles. And by providing the developer with a set of profiles of the available event streams, the operational CEP queries can be defined with potentially more reliable and deeper knowledge about a stream's behavior.
One aspect of certain example embodiments relates to enabling stream profiling based on streams and queries in CEP systems.
Another aspect of certain example embodiments relates to emitting query registration events from the CEP engine to the event bus so that a stream profiler component can assess and analyze which streams are involved in which types of queries. This, in turn, allows for an assessment and analysis of each stream's relevance.
Another aspect of certain example embodiments relates to stream-based and/or query-based profiling approaches that potentially provide the developer with a better understanding of the available event streams. Such profiles optionally may be visualized using the CEP engine's integrated development environment (IDE) or another software application that developers can use to develop CEP software.
In certain example embodiments, a method of profiling event streams received from an event bus is provided. Input events from one or more input event streams emitted to the event bus are received. Query registration-related events from a registration event stream emitted to the event bus are received, with the query registration-related events being associated with actions taken with respect to queries performed on the one or more input event streams. Event-based profiles are developed by subjecting the received input events to a profiling CEP engine, with the profiling CEP engine operating in connection with processing resources including at least one processor, and with the event-based profiles including data mining related characteristics and/or statistical characteristics for each said input event stream. Query-based profiles are developed by subjecting the received query registration-related events to the CEP engine, with the query-based profiles including data indicative of how relevant the queries performed on the one or more input event streams are and/or how those queries are relevant to the one or more input event streams on which they are performed. The event-based profiles and the query-based profiles are stored to a non-transitory computer readable storage medium. Query registration-related events are generated at least each time a query on the one or more input event streams is registered or deregistered.
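A highly simplified sketch of such a method is given below. It assumes that each input event carries a single numeric value, that registration events carry an action ("register" or "deregister") together with the names of the streams a query reads from, and that profiles are stored as a JSON file. The class and method names (StreamProfiler, on_input_event, on_registration_event, snapshot, store) and the chosen statistics are hypothetical; an actual embodiment could compute far richer data mining and statistical characteristics.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable


@dataclass
class EventProfile:
    """Event-based profile: incrementally maintained statistics for one stream."""
    count: int = 0
    mean: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        self.mean += (value - self.mean) / self.count  # running mean, no history kept
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


@dataclass
class QueryProfile:
    """Query-based profile: how heavily a stream is used by queries."""
    active_queries: int = 0
    total_registrations: int = 0


class StreamProfiler:
    def __init__(self) -> None:
        self.event_profiles: Dict[str, EventProfile] = {}
        self.query_profiles: Dict[str, QueryProfile] = {}

    def on_input_event(self, stream: str, value: float) -> None:
        """Handle an input event received from the event bus."""
        self.event_profiles.setdefault(stream, EventProfile()).update(value)

    def on_registration_event(self, action: str, streams: Iterable[str]) -> None:
        """Handle a query registration-related event for the given streams."""
        for stream in streams:
            profile = self.query_profiles.setdefault(stream, QueryProfile())
            if action == "register":
                profile.active_queries += 1
                profile.total_registrations += 1
            elif action == "deregister":
                profile.active_queries = max(0, profile.active_queries - 1)

    def snapshot(self) -> dict:
        """Return both kinds of profiles as plain dictionaries."""
        return {
            "event_profiles": {s: asdict(p) for s, p in self.event_profiles.items()},
            "query_profiles": {s: asdict(p) for s, p in self.query_profiles.items()},
        }

    def store(self, path: str) -> None:
        """Persist the profiles to a file (assumed storage format: JSON)."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.snapshot(), f, indent=2)
```

In this sketch, the number of active queries per stream serves as a crude indicator of a stream's relevance, while the event-based profile summarizes the stream's values without retaining the events themselves.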
In certain example embodiments, there is provided a stream profiler computer system comprising processing resources including at least one processor and an interface to an event bus over which events are receivable. The system further comprises a profiling CEP engine that, in cooperation with the processing resources, is configured to at least: receive input events from one or more input event streams emitted to the event bus; receive query registration-related events from a registration event stream emitted to the event bus, the query registration-related events being associated with actions taken with respect to queries performed on the one or more input event streams; develop event-based profiles from the received input events, the event-based profiles including data mining related characteristics and/or statistical characteristics for each said input event stream; develop query-based profiles from the received query registration-related events, the query-based profiles including data indicative of how relevant the queries performed on the one or more input event streams are and/or how those queries are relevant to the one or more input event streams on which they are performed; and store the event-based profiles and the query-based profiles to a non-transitory computer readable storage medium of the stream profiler computer system. Query registration-related events are generated at least each time a query on the one or more input event streams is registered or deregistered.
According to certain example embodiments, a CEP system comprising an event bus, a production CEP engine, a development environment, and the stream profiler computer system described herein may be provided.
Similarly, non-transitory computer readable storage media tangibly storing instructions for performing the above-summarized and/or other approaches also are provided by certain example embodiments, as well as corresponding computer programs.
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.