Today, companies have to deal with an ever-increasing flood of business-relevant data. Indeed, because of technological advances and high degrees of connectivity, more and more data is being produced on a daily basis. This phenomenon is spread across all industries including, for example, the financial sector (e.g., where stock tickers report trading activities), logistics (e.g., where the transport status of goods is continuously reported), health care systems (e.g., where a variety of sensors report various measurements), manufacturing (e.g., in connection with production lanes that are equipped with a multitude of status-tracking sensors), etc.
The newly-arising Internet of Things (IoT), with its millions of devices, will increase yet further the volumes of data being produced on a daily basis. The IoT refers generally to the interconnection of devices and services using the Internet. The number of connecting devices emitting information has increased rapidly and is expected to continue increasing significantly. The IoT thus involves the handling of huge, heterogeneous volumes of data.
The amount of data and the frequency with which it is produced are generally so high that the data oftentimes is referred to as being a data stream and/or an event stream. It will be appreciated that companies that are able to process and analyze such streams in a timely manner may be able to leverage such intelligence into competitive advantages. For instance, a delayed arrival time of goods can be communicated early, a production error can be quickly detected, an attempt at credit card fraud can be blocked in a timely manner, etc.
Stream processing typically follows the pattern of continuous queries, which may be thought of in some instances as being queries that execute for a potentially indefinite amount of time on data that is generated or changes very rapidly. Such data are called streams, and streams oftentimes comprise events. Such streams often exist in real-world scenarios, e.g., as temperature readings from sensors placed in warehouses or on trucks for logistics purposes, weather data, entrance control systems (where events are generated whenever a person enters or leaves, for instance), etc. Events may include attributes (also sometimes referred to as a payload) such as, for example, the value of temperature readings and metadata (sometimes referred to as a header or header data) such as, for example, creation date, validity period, and quality of the event. Some events may have a data portion and temporal information (e.g., plane LH123 has landed at 4:34 PM). Possible events occurring in an environment typically are schematically described by so-called event types, which in some respects are somewhat comparable to table definitions in relational databases.
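By way of illustration only, the relationship between event types, payloads, and headers described above might be sketched as follows. The class names (EventType, Event), the attribute names, and the sample values are hypothetical and are not taken from any particular CEP product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class EventType:
    """Hypothetical event schema, loosely comparable to a table definition."""
    name: str
    attributes: Dict[str, type]  # attribute name -> expected Python type

    def conforms(self, payload: Dict[str, Any]) -> bool:
        # A payload conforms when it carries exactly the declared attributes
        # and each value has the declared type.
        return (payload.keys() == self.attributes.keys()
                and all(isinstance(payload[k], t)
                        for k, t in self.attributes.items()))


@dataclass(frozen=True)
class Event:
    event_type: EventType
    payload: Dict[str, Any]                                  # the attributes
    header: Dict[str, Any] = field(default_factory=dict)     # the metadata


# Assumed example schema and event; the values are made up for illustration.
TEMPERATURE = EventType("TemperatureReading",
                        {"sensor_id": str, "celsius": float})

reading = Event(
    event_type=TEMPERATURE,
    payload={"sensor_id": "truck-7", "celsius": 4.5},
    header={"created": datetime(2024, 1, 1, 16, 34, tzinfo=timezone.utc),
            "quality": "good"},
)
```

In this sketch, a payload whose temperature attribute is not numeric would fail the conforms check, which is one simple way in which an event could be recognized as not matching its event type.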
Streams may in certain scenarios be organized in channels that in turn are implemented by an event bus. Channels and event types in this sense may be considered orthogonal concepts, e.g., in the sense that events of the same event type might be communicated via different channels. In some implementations an event bus may be thought of as a central bus for all event streams within an Event-Driven Architecture (EDA). An EDA generally is an architecture that captures the production and consumption of event streams and the reactions to those events. Components within an EDA may be designed to process events in an event-driven manner, e.g., directly when the event arrives. In this regard, in some scenarios, publishers can connect their streams to the bus so that the events are published on it, and subscribers can subscribe to the producer streams made available on the bus.
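The publish/subscribe arrangement described above might be sketched, in a deliberately simplified and in-process form, as follows. The EventBus class, its method names, and the channel names are assumptions made for illustration; a production event bus would typically be a separate messaging component.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class EventBus:
    """Hypothetical in-process bus that organizes streams into named channels."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = (
            defaultdict(list))

    def subscribe(self, channel: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, event: Any) -> None:
        # Deliver the event directly on arrival to every subscriber of the channel.
        for callback in list(self._subscribers.get(channel, ())):
            callback(event)


bus = EventBus()
received = []
bus.subscribe("warehouse", received.append)

# Events of the same event type can travel on different channels.
bus.publish("warehouse", {"type": "TemperatureReading", "celsius": 4.5})
bus.publish("trucks", {"type": "TemperatureReading", "celsius": 9.0})
```

Here the subscriber sees only the event published on the "warehouse" channel, even though both events share one event type, which reflects the orthogonality of channels and event types noted above.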
One aspect associated with the successful handling of event streams relates to adequate information technology (IT) support. Traditional database and data warehouse technology is not always powerful enough and is not necessarily designed to deal with these amounts of data. Thus, it may be necessary or desirable to extend the processing capabilities of companies so that their applications are able to support the real-time processing of event streams.
Complex Event Processing (CEP) is an approach to handling the challenges associated with processing and analyzing huge amounts of data arriving with high frequencies. As will be appreciated from the above, in this context, the arriving data is classified as an event stream. By processing the incoming events in main memory using sophisticated online algorithms, CEP systems can cope with very high data volumes (e.g., in the range of hundreds of thousands of events per second) being processed and analyzed appropriately. CEP systems are designed to receive multiple streams of events and analyze them in an incremental manner with very low (e.g., near-zero) latency. Events may be evaluated and aggregated to form derived (or complex) events (e.g., by an engine or so-called event processing agents). Event processing agents can be cascaded such that, for example, the output of one event processing agent can be the input of another event processing agent. In other words, while the data is streaming in, it may be analyzed on-the-fly, and corresponding analytical results may be forwarded to subsequent consumers. Therefore, a CEP system need not necessarily persist the data it is processing. This is advantageous, because an event stream oftentimes is characterized by high volumes and high rates and therefore cannot be persisted.
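The cascading of event processing agents described above might be sketched with Python generators, under the assumption that each agent consumes one stream and lazily emits another. The agent names, the event fields, and the threshold value are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def threshold_agent(events: Iterable[Event], limit: float) -> Iterator[Event]:
    """First agent: forwards only readings above an assumed limit."""
    for event in events:
        if event["celsius"] > limit:
            yield event


def count_agent(events: Iterable[Event]) -> Iterator[Event]:
    """Second agent: emits a derived event per input, keeping only a counter."""
    count = 0
    for event in events:
        count += 1
        yield {"type": "OverheatCount", "count": count,
               "last_sensor": event["sensor_id"]}


stream = [
    {"sensor_id": "a", "celsius": 7.0},
    {"sensor_id": "b", "celsius": 9.5},
    {"sensor_id": "a", "celsius": 10.0},
]

# The output of the first agent is the input of the second agent.
derived = list(count_agent(threshold_agent(stream, limit=8.0)))
```

Neither agent in this sketch stores the events it has seen; only a running counter is retained, which mirrors the point that the processed data need not be persisted.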
Thus, CEP in general may be thought of as a processing paradigm that describes the incremental, on-the-fly processing of event streams, typically in connection with continuous queries that are continuously evaluated over event streams. Moreover, CEP analysis techniques may include, for example, the ability to perform continuous queries, identify time-based relations between events by applying windowing (e.g., through XQuery or SQL), etc., with the aid of processing resources such as at least one processor and a memory. See, for example, U.S. Pat. Nos. 8,640,089 and 8,266,351, as well as U.S. Publication Nos. 2014/0078163, 2014/0025700, and 2013/0046725, the entire contents of each of which are hereby incorporated herein by reference.
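As a non-limiting illustration of windowing in a continuous query, a time-based sliding window average might be maintained incrementally as sketched below. The class name, the window length, and the choice of an average as the aggregate are assumptions; an actual CEP engine would typically express such a query declaratively (e.g., in an SQL-like language).

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindowAverage:
    """Hypothetical continuous query: average over the last window_seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._items: Deque[Tuple[float, float]] = deque()  # (timestamp, value)
        self._total = 0.0

    def on_event(self, timestamp: float, value: float) -> float:
        # Add the new event, then evict events that have left the window.
        self._items.append((timestamp, value))
        self._total += value
        while self._items[0][0] <= timestamp - self.window_seconds:
            _, old_value = self._items.popleft()
            self._total -= old_value
        return self._total / len(self._items)


query = SlidingWindowAverage(window_seconds=10)
first = query.on_event(0, 2.0)    # window holds the reading at t=0
second = query.on_event(5, 4.0)   # window holds the readings at t=0 and t=5
third = query.on_event(12, 6.0)   # the reading at t=0 has expired
```

The sketch assumes a positive window length and timestamps that arrive in non-decreasing order. Each arriving event updates the result in constant amortized time, so the query can be evaluated continuously without rescanning earlier data.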
With CEP technology, relevant data can be extracted in time so that business applications operating on top of that technology can present analysis results with minimum latency to the user. A CEP-supported application can be connected to several event sources that continuously produce events, and such events can be analyzed and condensed by CEP analysis logic. The analysis results can be rendered for the business user (i.e., a user from a business unit, as opposed to a user from the entity's IT department, who is able to leverage dedicated business user applications that present business-relevant metrics) in a report, graphical user interface, and/or other medium.
One issue that arises in CEP-based applications relates to erroneous events. An event source might produce an erroneous event for any number of reasons such as, for example, communication problems, defective sensors, invalid data ranges, etc. For example, a temperature sensor may be defective and, thus, one of its generated events may have a value of “N/A” for its temperature attribute (e.g., as opposed to an expected numeric value). Erroneous events such as these typically cannot be processed adequately. But such erroneous events still might comprise relevant information. For instance, even though the temperature attribute value is faulty, the humidity attribute of the event may be correct. Problems thus may occur on the source layer. However, it also will be appreciated that errors might be thrown during query processing. For instance, an error might be thrown during query processing in response to a number overflow, division by zero, etc.
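The temperature example above might be sketched as an input-layer check that turns a faulty reading into an error event while retaining the attributes that are still usable. The function name, the field names, and the structure of the resulting error event are hypothetical.

```python
from typing import Any, Dict


def validate(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return the parsed event, or an error event that keeps the valid attributes."""
    valid: Dict[str, float] = {}
    invalid: Dict[str, Any] = {}
    for name in ("temperature", "humidity"):
        try:
            valid[name] = float(event[name])
        except (KeyError, TypeError, ValueError):
            # Missing, None, or non-numeric values such as "N/A" end up here.
            invalid[name] = event.get(name)
    if invalid:
        return {"kind": "error", "layer": "input",
                "source": event.get("sensor_id"),
                "invalid": invalid, "valid": valid}
    return {"kind": "ok", "source": event.get("sensor_id"), **valid}


faulty = validate({"sensor_id": "s1", "temperature": "N/A", "humidity": 41.0})
clean = validate({"sensor_id": "s1", "temperature": 4.5, "humidity": 41.0})
```

In this sketch, the faulty reading yields an error event in which the humidity value remains available, so the relevant information carried by an otherwise erroneous event is not discarded.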
A question that arises relates to dealing with those errors, as the user consuming the results of the stream analysis oftentimes is not aware of them. Because the user is not necessarily aware of the errors, the user may base decisions on incomplete and/or inaccurate data. A resulting error might involve a business process being stopped, even though it might not be necessary or desirable to do so. As a result, the question might be thought of as follows: Given a CEP application whose underlying event sources produce erroneous events, how can the application be adapted so that the errors are properly handled and communicated to the business user?
Another question that arises relates to how the number of error events being produced can be reduced, or even completely avoided. Similar to the above, a potential complication is that the CEP administrator in charge of the CEP application might not be directly aware of the error events. And even if the administrator is aware of them, it could be difficult to find the root cause of the errors.
Because of the demanding requirements of Complex Event Processing, the proper handling of erroneous events can be even more challenging. The number of errors and the frequency with which they arrive can be very high. Given the oftentimes time-critical nature of CEP applications, it would be desirable (and possibly even necessary) to handle such errors in a timely manner, and traditional technologies for cleaning static data cannot always be applied directly.
The preprocessing of data is a well-established step in data analysis. It typically comprises steps like data cleaning, data integration, as well as data transformation. Many different techniques exist for improving the quality of data. For example, there are techniques for dealing with missing values, removing noise in the data, and normalizing data. These steps are used to preprocess the data before mining and knowledge discovery algorithms are applied. The data being analyzed is typically static and can be traversed multiple times. Unfortunately, however, data preprocessing is typically designed for static data sets, and not for high-volume event streams that are analyzed on-the-fly. Moreover, even if data is preprocessed, errors nonetheless may still appear.
A manual approach could be used for error handling in the CEP context. Indeed, a CEP engine typically logs erroneous events in a log file. The user can explore that log file for errors and manually try to derive the impact of those errors on the application and the decisions. The administrator additionally or alternatively can investigate the log file and try to derive the characteristics of the error events, e.g., to conduct a root cause analysis. Unfortunately, however, the manual approach is very time-consuming and error-prone. There is a high risk that the business user will not check the log file often enough and therefore might not be able to revise a decision that already has been made based on incomplete and/or inaccurate data. Similarly, an administrator trying to perform a root cause analysis of the error may have to be skilled in analytics and data mining in order to uncover the real issues. These activities might take too long and/or come too late.
Another possible solution relates to the data warehouse approach. When errors in Complex Event Processing applications occur, they can be captured and stored in a data warehouse. Data warehouses typically comprise standard cleaning algorithms. This functionality can be used to clean the error events, which afterwards can be republished into the CEP application. Unfortunately, however, the data warehouse approach is not a suitable alternative because of common performance restrictions. For instance, data warehouses are not designed to deal with high data volumes and running analysis on-the-fly. CEP applications typically have a time-critical nature and, therefore, errors that occur also may need to be processed in a timely manner. Additionally, this approach does not include a proper handling of the error events so that the business user and the administrator are aware of the consequences.
The functionality for error handling of some commercially available CEP engines also does not fully address the issues identified herein. In general, these engines establish a kind of channel or listener to which errors are forwarded. It then is up to the user to define and implement corresponding follow-up logic. In essence, this is merely the starting point for an elaborated handling of error events.
In view of the foregoing, it will be appreciated that it would be desirable to overcome these and/or other problems. For instance, it will be appreciated that it would be desirable to address in an intelligent way issues associated with erroneous events that are produced in CEP applications.
Certain example embodiments help address these and/or other needs. For instance, certain example embodiments help address in an intelligent way issues associated with erroneous events that are produced in CEP applications, e.g., in connection with an error handler for event sources. The error handler of certain example embodiments captures error events, processes them, and analyzes their impact on follow-up applications. Additionally, it derives the characteristics underlying the error. Using the results of the error handler, the business user is automatically provided with a notification of relevant errors, along with suggestions regarding how to deal with them. The CEP administrator is provided with a model of the error characteristics so that root cause analysis can be performed.
One aspect of certain example embodiments relates to techniques in which error events are captured during runtime, the stream of errors is processed and analyzed on-the-fly, and the results are forwarded to the business user and to the administrator. The administrator can use the results for a root cause analysis, and the business user can assess the impact of an erroneous event and rerun analysis tasks for the corrected event. In other words, in certain example embodiments, erroneous events of arbitrary streams are detected and analyzed appropriately, e.g., so that business users can assess the errors' impacts and so that administrators can reveal the sources of the occurring errors. Statistical methods and mining technologies may be used to derive a model of the error characteristics, which can be used for a sophisticated root cause analysis.
Another aspect of certain example embodiments relates to enabling impact analysis of erroneous events for business users. In this regard, the error handler of certain example embodiments continuously analyzes the error events with respect to their impacts on the information to which the business user has subscribed. If the business user is affected, the error is immediately reported to the user, along with details on the impact. The user can correct/adapt the event and rerun corresponding analysis logic. This allows the user to evaluate the impact of the error and run corrective actions based on the new insights. In order to let the user concentrate on the most important facts and limit the number of error corrections, the user can additionally define the priorities with which error events are presented.
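One simplified way to picture the impact analysis and prioritization described above is to match each error event against the event sources to which each business user has subscribed and to order the resulting notifications by a user-defined priority. The function name, the data shapes, and the convention that a lower number means a higher priority are assumptions made for this sketch.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def impact_notifications(
    error_events: Iterable[Dict[str, Any]],
    subscriptions: Dict[str, Set[str]],   # user -> subscribed source ids
    priorities: Dict[str, int],           # source id -> priority (lower first)
    default_priority: int = 100,
) -> List[Tuple[int, int, str, str]]:
    """Return (priority, arrival order, user, reason) tuples, most urgent first."""
    notifications = []
    for seq, err in enumerate(error_events):
        for user, sources in subscriptions.items():
            if err["source"] in sources:      # only affected users are notified
                prio = priorities.get(err["source"], default_priority)
                notifications.append((prio, seq, user, err["reason"]))
    return sorted(notifications)


errors = [
    {"source": "s2", "reason": "timeout"},
    {"source": "s1", "reason": "invalid temperature"},
    {"source": "s3", "reason": "overflow"},   # no user subscribes to s3
]
result = impact_notifications(
    errors,
    subscriptions={"alice": {"s1", "s2"}, "bob": {"s2"}},
    priorities={"s1": 1, "s2": 5},
)
```

In this sketch, the error from source s1 is presented first because of its priority, and the error from source s3 produces no notification because no business user is affected by it.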
Another aspect of certain example embodiments relates to generating models for root cause analysis of erroneous events for administrators. In this regard, the error handler of certain example embodiments automatically runs analysis tasks over the stream of error events. These analysis tasks are designed to detect the circumstances under which error events occur. The administrator can use this information to estimate future error events, as well as to examine the root cause of the errors. In order to enable the administrator to quickly resolve errors, the error handler of certain example embodiments continuously derives those error characteristics and reports them to the administrator. Again, the results can be prioritized so that the administrator can concentrate on the most important errors.
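As a deliberately simple stand-in for the statistical and mining techniques contemplated above, the circumstances under which errors occur might be summarized by incrementally maintained frequency counts. The ErrorProfile class, its fields, and the use of plain counting are assumptions of this sketch rather than the modeling approach of any particular embodiment.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


class ErrorProfile:
    """Hypothetical running model of where and why errors occur."""

    def __init__(self) -> None:
        self.by_source: Counter = Counter()
        self.by_reason: Counter = Counter()
        self.total = 0

    def observe(self, err: Dict[str, Any]) -> None:
        # Update the model incrementally as each error event arrives.
        self.by_source[err["source"]] += 1
        self.by_reason[err["reason"]] += 1
        self.total += 1

    def top_sources(self, n: int = 3) -> List[Tuple[str, float]]:
        # Share of all errors attributable to each of the n most frequent sources.
        return [(source, count / self.total)
                for source, count in self.by_source.most_common(n)]


profile = ErrorProfile()
for source, reason in [("s1", "invalid temperature"), ("s1", "invalid temperature"),
                       ("s2", "timeout"), ("s1", "overflow")]:
    profile.observe({"source": source, "reason": reason})
```

An administrator consulting such a profile could see at a glance that most errors in this made-up sample originate from a single source, which narrows the search for a root cause.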
Another aspect of certain example embodiments relates to enabling efficient analysis of erroneous events. The error stream can produce high volumes of erroneous events in a continuous manner. Therefore, processing and analyzing them may become challenging. Additionally, the business user and the administrator are to be informed immediately in case of serious errors. For that reason, the error handler of certain example embodiments internally leverages a CEP engine for analyzing the stream of erroneous events and reporting the analysis results.
In certain example embodiments, there is provided a computing system comprising processing resources including at least one processor and a memory. An event bus is configured to receive events from a plurality of external input event sources. An application includes input, processing, and output layers. The application is configured to process events received from the event bus, and to provide to the event bus (a) results obtained from processing received events, and (b) error events corresponding to errors detected at the input layer and/or the processing layer. An error handler, under control of the processing resources, is configured to: receive, via the event bus, events from the plurality of external input event sources; receive, via the event bus, error events from the application; generate, for a given error, an error analysis event and an error impact event by executing a CEP query on at least a corresponding received error event; and provide to the event bus generated error analysis events and generated error impact events. Generated error analysis events describe for an administrator detailed information analyzing the corresponding errors, and/or generated error impact events describe for a non-technical user impacts the corresponding errors have for a user application used by the non-technical user. The administrator and the non-technical user are different parties, and generated error analysis events and generated error impact events differ from one another in both structure and content.
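The interplay of the event bus, the application's error events, and the error handler might be sketched as follows. This is a minimal sketch under several assumptions: the Bus and ErrorHandler classes, the channel names, and the event fields are hypothetical; a running per-source count stands in for the CEP query; and the receipt of events from the external input event sources is omitted for brevity.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Bus:
    """Hypothetical in-process event bus with named channels."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[Any], None]) -> None:
        self._subs[channel].append(callback)

    def publish(self, channel: str, event: Any) -> None:
        for callback in list(self._subs.get(channel, ())):
            callback(event)


class ErrorHandler:
    def __init__(self, bus: Bus, affected_apps: Dict[str, List[str]]) -> None:
        self.bus = bus
        self.affected_apps = affected_apps      # source id -> user applications
        self.error_counts: Dict[str, int] = defaultdict(int)
        bus.subscribe("errors", self.on_error)  # error events from the application

    def on_error(self, err: Dict[str, Any]) -> None:
        self.error_counts[err["source"]] += 1
        # Error analysis event: technical detail intended for the administrator.
        self.bus.publish("error-analysis", {
            "source": err["source"], "layer": err["layer"],
            "reason": err["reason"],
            "errors_from_source": self.error_counts[err["source"]],
        })
        # Error impact event: a differently structured message for the business user.
        for app in self.affected_apps.get(err["source"], []):
            self.bus.publish("error-impact", {
                "application": app,
                "message": f"Results in '{app}' may be incomplete: "
                           f"input from {err['source']} was rejected.",
            })


bus = Bus()
analysis, impact = [], []
bus.subscribe("error-analysis", analysis.append)
bus.subscribe("error-impact", impact.append)
handler = ErrorHandler(bus, affected_apps={"s1": ["Cold-chain dashboard"]})
bus.publish("errors", {"source": "s1", "layer": "input",
                       "reason": "temperature is not numeric"})
```

The two generated events in this sketch differ in both structure and content: the analysis event carries technical fields for the administrator, whereas the impact event names the affected user application in plain language.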
In certain example embodiments, there is provided a method of handling errors in a computing system. The method comprises, at an error handler under control of processing resources including at least one processor and a memory: receiving, via an event bus, events from a plurality of input event sources external to the error handler; receiving, via the event bus, error events from an application that includes input, processing, and output layers, the application being configured to process events received from the event bus, and to provide to the event bus (a) results obtained from processing received events, and (b) error events corresponding to errors detected at the input layer and/or the processing layer; generating, for a given error, an error analysis event and an error impact event by executing a CEP query on at least a corresponding received error event; and providing to the event bus generated error analysis events and generated error impact events. Generated error analysis events describe for an administrator detailed information analyzing the corresponding errors, and/or generated error impact events describe for a non-technical user impacts the corresponding errors have for a user application used by the non-technical user. The administrator and the non-technical user are different parties, and generated error analysis events and generated error impact events differ from one another in both structure and content.
In certain example embodiments, an error handler is provided. It includes processing resources including at least one processor and a memory; and a CEP engine. The processing resources are configured to control the error handler to at least: receive, via an event bus, events from a plurality of input event sources external to the error handler; receive, via the event bus, error events from an application that includes input, processing, and output layers, the application being configured to process events received from the event bus, and to provide to the event bus (a) results obtained from processing received events, and (b) error events corresponding to errors detected at the input layer and/or the processing layer; generate, for a given error, using the CEP engine, an error analysis event and an error impact event by executing a CEP query on at least a corresponding received error event; and provide to the event bus generated error analysis events and generated error impact events. Generated error analysis events describe for an administrator detailed information analyzing the corresponding errors, and/or generated error impact events describe for a non-technical user impacts the corresponding errors have for a user application used by the non-technical user. The administrator and the non-technical user are different parties, and generated error analysis events and generated error impact events differ from one another in both structure and content.
Similarly, non-transitory computer readable storage mediums tangibly storing instructions for performing the above-summarized and/or other approaches also are provided by certain example embodiments, as are corresponding computer programs.
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.