Field
Embodiments of the present invention generally relate to data networks and, in particular, to apparatuses and methods for processing out-of-order events, i.e. subsequent events that are received out of their original temporal order.
Description of Related Art
Sensor networks, such as, for example, wireless sensor networks, have a wide range of applications. For example, wireless sensor networks of various technologies may be used for locating purposes, such as locating humans and/or other objects. Here, “locating” means the detection or determination of a geographical location or position. Some specialized locating or position tracking systems may be used for locating players and other objects (e.g. a ball) in sport events, such as, for example, soccer, American football, rugby, tennis, etc.
Using the gathered geographic location or positioning data of players and/or a ball, it is possible to derive statistical information related to the whole sports event, for example a soccer match, or related to individual teams or players. Such derived statistical information may be interesting for various reasons. On the one hand, there are various commercial interests, as certain statistics and their analysis may be of particular relevance for spectators in a stadium and/or in front of a television set at home. Hence, providing certain statistics may raise more interest in sport events. On the other hand, statistical data derived from the raw positioning data may also be used for training purposes. Here, an opponent and/or the behavior of one's own team may be analyzed, as well as the performance and/or health condition of individual players.
The aforementioned locating or position tracking systems may be based on various technologies. For example, location information may be determined based on the evaluation of wireless radio signals and/or magnetic fields. For this purpose transmitters and/or receivers, generally also denoted as sensors, may be placed at the individual objects (e.g. players, ball, etc.) to be located by the system. Corresponding reception and/or transmission devices may also be mounted at predetermined locations around a geographical area of interest, e.g. a soccer field. An evaluation of signal strengths, signal propagation times, and/or signal phases, just to name a few possible technical alternatives, may then lead to sensor data streams indicative of the geographic position of individual players or objects at different time instants. Typically, a geographic location data sample is associated with a timestamp indicating at which time an object was located at which geographic position. With this combined information, kinematic data such as velocity (speed), acceleration, etc. may be provided in addition to the location data comprising, for example, x-, y-, and z-coordinates. In the remainder of this specification the location and kinematic data delivered by the localization sensor system will also be referred to as (raw) sensor data.
In a particular example of a wireless tracking system people or objects may be equipped with tiny transmitters, which may be embedded in footwear, uniforms and balls and whose signals are picked up by a number of antennas placed around the area under observation. Receiver units process the collected signals and determine their Time of Arrival (ToA) values. Based on a calculation of the differences in propagation delay, each transmitter's position is then continuously determined. In addition, a computer network integrated with the wireless tracking system may analyze the position or sensor data so as to detect specific events. Operating in the 2.4 or 5 GHz band, the tracking system is globally license-free.
Based on the raw sensor data streams output by the locating or position tracking system, so-called “events” may be detected. Thereby an event or event type may be defined to be an instantaneous occurrence of interest at a point of time and may be defined by a unique event ID. In general, an event is associated with a change in the distribution of a related quantity that can be sensed. An event instance is an instantaneous occurrence of an event type at a distinct point in time. An event may be a primitive event, which is directly based on sensor data (kinematic data) of the tracking system, or a composite event, which is based on previously detected other events instead. That is to say, a composite event does not depend directly on raw sensor data but on other events. In ball game applications, an event may, for example, be “player X hits ball” or “player X is in possession of ball”. More complicated events may, for example, be “offside” or “foul”. Each event instance may have three timestamps: an occurrence, a detection, and an arrival timestamp. All timestamps are in the same discrete time domain. The occurrence timestamp ts is the time when the event has actually happened, the detection timestamp dts is the time when the event has been detected by an event detector, and the arrival timestamp ats is the time when the event was received by a particular Event Processing System (EPS) node. The occurrence and the detection timestamp are fixed for an event instance at any receiving node, whereas the arrival timestamp may vary at different nodes in the network.
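The three timestamps of an event instance can be illustrated with a small data structure. The following Python sketch is only an illustration: the class name `Event`, the field `event_id`, and the helper `received_at` are hypothetical names chosen here, and integer timestamps are assumed to model the discrete time domain.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    """Hypothetical event instance with the three timestamps described above."""
    event_id: str  # identifies the event type, e.g. "player A kicks ball"
    ts: int        # occurrence timestamp: when the event actually happened
    dts: int       # detection timestamp: when an event detector detected it
    ats: int       # arrival timestamp: when a particular EPS node received it


def received_at(event: Event, arrival: int) -> Event:
    """Return the event as seen by a node that received it at time `arrival`.

    Only ats changes; ts and dts are fixed for the event instance.
    """
    return replace(event, ats=arrival)


kick = Event("player A kicks ball", ts=100, dts=103, ats=105)
kick_at_other_node = received_at(kick, arrival=109)
```

Both copies describe the same event instance; they differ only in the arrival timestamp observed at the respective node.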
The detection of events (Complex Event Processing, CEP) based on underlying sensor data streams has attracted increased interest in the database and distributed systems communities in the past few years. Nowadays a wide and ever-growing range of applications, including network monitoring, e-business, health-care, financial analysis, security, and the aforementioned sport-event supervision, relies on the ability to process queries over data streams that ideally take the form of time-ordered series of events. Event detection denotes the fully automated processing of raw sensor data and/or events without the need for human intervention, as in many applications the vast quantity of supplied sensor data and/or events can no longer be captured or processed by a human. For example, if high speed variations of players or a sports object, e.g. a ball, are to be expected, the raw sensor (locating or position tracking) data has to be determined at a sufficiently high data rate by the underlying (wireless) sensor network. Additionally, if there is a high number of players and/or objects to be tracked (e.g. in soccer there are 22 players and a ball), the amount of overall geographic location and kinematic data samples per second can become prohibitively high, in particular with respect to real-time event processing requirements.
Hence, even if raw sensor and/or event data streams are analyzed and signaled in a fully automated manner, there may still be far too much information, which is possibly not even of interest in its entirety. In the future this problem will get even worse, as more and more devices will be equipped with sensors and with the possibility to provide their sensor data to public networks such as the Internet (e.g., weather or temperature data determined by wireless devices like smart phones). For this reason the amount of sensor data to be processed further into certain events of interest will grow rapidly. Automated event detection may provide a remedy by aggregating the raw sensor data piece by piece and determining more abstract and inter-dependent events, which may convey far more information than the raw sensor data itself. For example, besides the aforementioned soccer-related examples, such determined events could include “car X is located at crossing Y” or “traffic jam on route X”.
The problem that arises in automated event detection is the computing power required for performing event detection on possibly massively parallel sensor and/or event data streams, and all this under at least near real-time processing requirements. This problem may be solved by parallelization of event detectors, which may, for example, run on different (i.e. distributed) network nodes of a computer network, which may, for example, communicate via Ethernet. Thereby an event detector automatically extracts a certain event of interest from an event or sensor data stream according to a user's event specifications. Individual event detectors may be distributed over different network nodes of a data network, wherein the different event detectors communicate using events and/or sensor data travelling through the network using different network routes and branches. Thereby, raw sensor data and/or events may be transported in data packets according to some transport protocol, e.g., UDP (User Datagram Protocol), TCP (Transmission Control Protocol)/IP (Internet Protocol), etc. This concept, however, causes new problems with respect to possibly unbalanced computational load among different network nodes and with respect to the synchronization of event data streams within the network. Without suitable countermeasures the computational loads among different network nodes are unbalanced and individual sensor and/or event data streams in the network are not time-synchronized to each other, which means that individual events may reach an event detector out of their original temporal order and thereby lead to falsely detected results.
Let us look at an exemplary soccer-scenario, wherein a plurality of parallel automatically operating event detectors is supposed to detect a pass from player A to player B. In order to detect the “pass”-event, the following preceding event sequence is required:
1. “player A is in possession of ball”,
2. “player A kicks ball”,
3. “ball leaves player A”,
4. “ball comes near player B”,
5. “player B hits ball”
The event detection for event “player X kicks ball” may be based on the event sequence “player X near ball” and a detected acceleration peak of the ball. There are the following alternatives for setting up an automated event detector for said event “player X kicks ball”:
We may wait for the individual required events, one after the other. If we have seen all the required events in the correct (temporal) order (here, any abort criteria are disregarded for the sake of simplicity), we can say that we have seen or experienced a pass. However, for complex applications the detection of all the required events does not necessarily take place on a single network node or a single CPU (Central Processing Unit) due to the parallelization of event detectors. For this reason it is not guaranteed that the individual required events reach the event detector in the correct required order. This may, for example, be due to network jitter, varying and/or unbalanced CPU-load or increased network load. For example, consider an event stream comprising event instances e_1, e_2, ..., e_n, with e_k.ats < e_{k+1}.ats (1 ≤ k < n), i.e., the events in the event stream are sorted by their arrival time in ascending order. If any two events e_i and e_j with 1 ≤ i < j ≤ n exist such that e_i.ts > e_j.ts, then event e_j is denoted as an out-of-order event.
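The out-of-order definition above can be checked mechanically. The following Python sketch is an illustration under assumptions: the names `Event` and `out_of_order_events` and the sample timestamps are hypothetical. It uses the observation that an event e_j is out of order exactly when its occurrence timestamp is smaller than the largest occurrence timestamp among the events that arrived before it.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    name: str
    ts: int   # occurrence timestamp
    ats: int  # arrival timestamp at this node


def out_of_order_events(stream: List[Event]) -> List[Event]:
    """Return every e_j for which an earlier-arrived e_i has e_i.ts > e_j.ts."""
    ordered = sorted(stream, key=lambda e: e.ats)  # ascending arrival time
    late: List[Event] = []
    max_ts: Optional[int] = None  # largest occurrence timestamp seen so far
    for e in ordered:
        if max_ts is not None and e.ts < max_ts:
            late.append(e)
        else:
            max_ts = e.ts
    return late


# Hypothetical arrivals: the "kick" event is delayed and arrives last.
arrivals = [
    Event("player A is in possession of ball", ts=1, ats=2),
    Event("ball leaves player A", ts=3, ats=4),
    Event("player A kicks ball", ts=2, ats=6),
]
late = out_of_order_events(arrivals)
```

In this example only “player A kicks ball” is reported, because “ball leaves player A” occurred later but arrived earlier.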
Hence, we could try to buffer events and then search the buffer for the correct event pattern. But which buffer size should be used? If we say a pass has to happen within at most 5 time units (e.g. seconds), we would have to consider events within a time period of at most 5 time units after the first relevant event until we have either detected the pass or until we abort. However, it is also possible that the last relevant event is computationally quite complex, which requires a small additional buffer. But what is the size of this additional buffer? And what is the buffer size for composite event detectors that require the “pass”-event as an input event?
The K-slack algorithm of S. Babu, U. Srivastava, and J. Widom, “Exploiting k-constraints to reduce memory overhead in continuous queries over data streams,” ACM Trans. Database Systems, vol. 29, pp. 545-580, 2004, is a well-known solution to deal with out-of-order events in event detection. K-slack uses a buffer of length K to make sure that an event e_i can be delayed for at most K time units (K has to be known a priori). However, in a distributed system the event signaling delays depend on the entire system/network configuration, i.e., the distribution of the event detectors, as well as on the network and CPU load. Neither the final system configuration nor the load scenario can be foreseen at the time of compilation.
An approach by M. Li, M. Liu, L. Ding, E. A. Rundensteiner, and M. Mani, “Event stream processing with out-of-order data arrival,” in Proc. 27th Intl. Conf. Distributed Computing Systems Workshops, (Washington, D.C.), pp. 67-74, 2007, buffers an event e_i at least until e_i.ts + K ≤ clk holds. As there is no global clock in a distributed system, each node synchronizes its local clock by setting it to the largest occurrence timestamp seen so far.
An ordering unit that implements the K-slack approach applies a sliding window with a given K to the input stream, delays the events according to their timestamps, and produces an ordered output stream of events. However, a single fixed a priori K does not work for distributed, hierarchical event detectors. As K-slack takes K time units to generate a composite event, an event detector on a higher layer that also buffers for K units and waits for the composite event misses said event. Waiting times add up along the event detector hierarchy.
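A K-slack ordering unit of the kind described above may be sketched as follows. This is a minimal Python illustration, not the implementation of any of the cited works: the function name `k_slack`, the tuple representation of events, and the flush at the end of the stream are assumptions. The local clock clk is set to the largest occurrence timestamp seen so far, and an event is released once e.ts + K ≤ clk.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Event = Tuple[int, str]  # (occurrence timestamp ts, event name)


def k_slack(stream: Iterable[Event], k: int) -> Iterator[Event]:
    """Reorder events that are iterated in arrival order, using slack k."""
    buffer: List[Event] = []    # min-heap ordered by occurrence timestamp
    clk: Optional[int] = None   # local clock: largest ts seen so far
    for event in stream:
        ts = event[0]
        clk = ts if clk is None else max(clk, ts)
        heapq.heappush(buffer, event)
        # Release every buffered event whose waiting time has elapsed.
        while buffer and buffer[0][0] + k <= clk:
            yield heapq.heappop(buffer)
    # Assumption of this sketch: flush the remaining events at end of stream.
    while buffer:
        yield heapq.heappop(buffer)


# Hypothetical arrival order; "b" and "d" arrive late by 1 and 2 time units.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "e"), (4, "d"), (9, "f")]
ordered = list(k_slack(arrivals, k=2))
unbuffered = list(k_slack(arrivals, k=0))
```

With k=2 the output is sorted by occurrence timestamp, whereas with k=0 the events pass through in arrival order. Every emitted event has been withheld until the clock advanced by k, which is the latency cost discussed above.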
M. Liu, M. Li, D. Golovnya, E. Rundensteiner, and K. Claypool, “Sequence pattern query processing over out-of-order event streams,” in Proc. 25th Intl. Conf. Data Engineering, (Shanghai, China), pp. 784-795, 2009, avoid such problems by specifying an individual K for each event detector. Each K_n (n denoting the hierarchy level) must be set to a value larger than max(K_{n−1}), i.e., larger than the maximum delay of all subscribed events. Thereby a subscribed event is an event of interest for the respective event detector. The event detector of hierarchy level n subscribes to an event of a lower hierarchy level in order to use it as an input to detect a higher hierarchy event. Although this sounds good at first glance, choosing proper values for all K_j is difficult, application- and topology-specific, and can only be done after careful measurements. Conservative and overly large K_j result in large buffers with high memory demands and in long delays for hierarchical CEP (as delays add up), so overly large K_j must be avoided. In theory, for a general purpose system the smallest/best K_j can only be found by means of runtime measurements, as the latencies depend on the distribution of event detectors and on the concrete underlying network topology. Moreover, the best K_j values change at runtime when detectors migrate.
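How individual K values grow along a detector hierarchy can be illustrated with a small calculation. The following Python sketch rests on a simplified delay model that is an assumption of this illustration and not taken from the cited work: each raw source has a known worst-case delay, each detector adds a fixed delay of its own after its buffer, the subscription graph is acyclic, and K is chosen exactly one time unit larger than the largest delay of the subscribed events. All names and numbers are hypothetical.

```python
from typing import Dict, List


def assign_k(subscriptions: Dict[str, List[str]],
             source_delay: Dict[str, int],
             own_delay: Dict[str, int],
             margin: int = 1) -> Dict[str, int]:
    """Choose K per detector so that it exceeds the delay of all its inputs.

    subscriptions maps a detector to the events it subscribes to, which are
    either raw sources or the output events of other detectors.
    """
    k: Dict[str, int] = {}
    out_delay: Dict[str, int] = dict(source_delay)  # worst-case delay per event

    def resolve(name: str) -> int:
        if name not in out_delay:
            # K must be larger than the maximum delay of all subscribed events.
            k[name] = max(resolve(s) for s in subscriptions[name]) + margin
            # The detector's own output is delayed by its buffer plus its work.
            out_delay[name] = k[name] + own_delay[name]
        return out_delay[name]

    for detector in subscriptions:
        resolve(detector)
    return k


k_values = assign_k(
    subscriptions={
        "near": ["ball_pos", "player_pos"],
        "kick": ["near", "ball_pos"],
        "pass": ["kick", "near"],
    },
    source_delay={"ball_pos": 2, "player_pos": 3},
    own_delay={"near": 1, "kick": 2, "pass": 1},
)
```

In this example the buffer lengths grow from 4 at the lowest detector to 9 at the “pass” detector, which reflects the accumulation of waiting times along the hierarchy.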
As has been explained, Event-Based Systems (EBS) may be used as the method of choice for near-real-time, reactive analysis of data streams in many fields of application, such as surveillance, sports, stock trading, RFID-systems, and fraud detection in various areas. EBS may turn the high data load into events and filter, aggregate and transform them into higher level events until they reach a level of granularity that is appropriate for an end user application or to trigger some action. Often, the performance requirements are so high that event processing needs to be distributed over several computing nodes of a distributed computing system. Many applications also demand event detection with minimal event delays. For instance, a distributed EBS may detect events to steer an autonomous camera control system to points of interest. Such a system is schematically illustrated in FIG. 1.
FIG. 1 shows an EBS 100 which is coupled to a tracking system 110 (e.g. RTLS) comprising radio transmitters 112 which may be attached to one or more objects of interest. Radio signals emitted by the transmitters 112 and carrying raw sensor data may be received via antennas 114 and forwarded to a distributed computing network 120. Computing nodes of the computing network 120 may extract primitive events from the sensor data delivered by the tracking system 110. These primitive events may be processed by one or more event detectors 130 running on one or more computing nodes of the EBS 100. Thereby, the event detectors 130 may form an event detector hierarchy, wherein event detectors 130-1, 130-2 of the lowest hierarchy level may consume sensor data and/or primitive events derived therefrom and wherein event detectors 130-3, 130-4, 130-5 of higher hierarchy levels may consume composite events, which are based on previously detected lower level events. If a certain event of interest (for example, player hitting a ball) has been detected by the EBS 100, a camera 140 may be automatically steered to capture video footage of the detected event of interest. This obviously requires low detection latency.
To process high rate event streams, the EBS 100 may split the computation over several event detectors 130-1 to 130-5, e.g. linked by publish-subscribe to build an event detection hierarchy. These event detectors 130 may be distributed over the available machines comprised by the computing network 120. Ignoring a wrong temporal order caused by different event propagation delays at the event detectors 130 may cause misdetection. The event detectors 130 themselves cannot reorder the events with low latency because in general event delays are unknown before runtime. Moreover, as there may also be dynamically changing application-specific delay types (for instance a detection delay), there is no a priori optimal assignment of event detectors to available computing nodes. Hence, in a distributed EBS, middleware may deal with out-of-order events, typically without any a priori knowledge of the event detectors, their distribution, and their subscribed events. Thereby middleware commonly denotes a software layer that provides services to (distributed) software applications beyond those available from the operating system.
Buffering middleware approaches may withhold the events for some time, sort them and emit them to the detector in order. The main issue is the size of the ordering buffer. If it is too small, detection fails. If it is too large, it wastes time and causes high detection latency. Note that waiting times may add up along the detection hierarchy. The best buffer size is unknown and may depend on some dynamic, unpredictable behavior. In addition, there is no need to buffer events that cannot be out of order or that can be processed out of order without any problems. Buffering middleware may be the basis of reliable event detection, but it is too costly for many types of events and does not benefit from faster CPUs, as it is bound by the waiting times.
Speculative middleware, another approach to cope with out-of-order event arrivals, works speculatively on the raw event stream. As there is no buffering, this is faster. Whenever an out-of-order event is received, falsely emitted events may be retracted and the event stream may be replayed. The effort for event retraction and stream replay grows with the number of out-of-order events and with the depth of the event detection hierarchy. This is a non-trivial challenge for memory management, may exhaust the CPU and may cause high detection latencies or even system failures. In contrast to the aforementioned buffer-based approaches, a stronger CPU may help, but the risk of high detection latencies still remains.
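The retract-and-replay behavior of a speculative approach may be sketched as follows. This Python illustration is deliberately naive and rests on assumptions: the class `SpeculativeProcessor`, the toy detector `detect` (a “kick” is emitted when a “peak” event directly follows a “near” event in occurrence order), and the full replay of the event log on every arrival are hypothetical simplifications, not the design of any particular middleware.

```python
import bisect
from typing import List, Set, Tuple

Event = Tuple[int, str]  # (occurrence timestamp ts, event name)


def detect(ordered: List[Event]) -> Set[Event]:
    """Toy detector: a 'kick' at the ts of a 'peak' directly after a 'near'."""
    return {(b[0], "kick") for a, b in zip(ordered, ordered[1:])
            if a[1] == "near" and b[1] == "peak"}


class SpeculativeProcessor:
    def __init__(self) -> None:
        self.log: List[Event] = []      # all received events, sorted by ts
        self.emitted: Set[Event] = set()
        self.retractions = 0            # number of falsely emitted events

    def receive(self, event: Event) -> None:
        # No buffering: the event is processed as soon as it arrives.
        bisect.insort(self.log, event)  # a late event lands inside the log
        current = detect(self.log)      # naive replay of the whole stream
        self.retractions += len(self.emitted - current)
        self.emitted = current


proc = SpeculativeProcessor()
proc.receive((1, "near"))
proc.receive((3, "peak"))
emitted_early = set(proc.emitted)  # speculative result before the late event
proc.receive((2, "far"))           # out-of-order event invalidates the kick
```

The “kick” is available immediately after the second event, but it has to be retracted once the delayed “far” event arrives. The replay cost of this sketch grows with the length of the log, which mirrors the effort described above.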
Badrish Chandramouli, Jonathan Goldstein, and David Maier, “High-Performance Dynamic Pattern Matching over Disordered Streams”, in Proceedings of the VLDB Endowment, volume 3, pages 220-231, Singapore, 2010, permit stream revisions by using punctuations. They give an insertion algorithm for out-of-order events that removes invalidated sequences. However, removing invalidated sequences is not possible for highly distributed systems. Events that need to be invalidated may already be consumed/processed on other nodes. Chandramouli et al. limit speculation either by sequence numbers or by cleanse. The receiver can use the former to deduce disorder information in the rare cases when particular events are generated at stable rates. The latter only works for a punctuation-based environment, which must incorporate the event definition to limit query windows by setting the punctuation to the latest event time stamps of the event detector. However, this information cannot be used as a generic buffering extension when the middleware technique cannot access said information.
Hence, it is desirable to provide an improved approach to cope with out-of-order event arrivals.