We live in a connected world where seemingly billions of devices are deployed and connected or interconnected for myriad different purposes. Such purposes oftentimes include a wide range of uses in industrial, residential, and/or consumer contexts. For example, on the industrial side, sensors may be deployed to wind power generators and may report on turbine operational data, e.g., with data used to perform predictive analytics relevant to maintenance-related issues. Another example that involves industrial, residential, and consumer contexts involves utility companies collecting power usage data at customer households (e.g., as is done using SmartMeters provided by PG&E), and suggesting to customers ways to save on electricity bills.
The amounts of data that may be involved could be quite large. The two examples provided above, for instance, could be said to involve “Big Data” operations, based on the tremendous amounts of data involved in each use case. In the Big Data world, data may be ingested into a data center where the analytics are performed. It will be appreciated that the volume of data could be overwhelming for one site to process, especially when some of the data needs to be processed in real time. In other words, although some data is time-sensitive and would benefit from real-time processing, other data is less time-sensitive and in essence might be allowed to “sit around” until such time as it could be handled by idle resources.
Unfortunately, many current messaging technologies address only a subset of the requirements associated with Big Data ingestion. For example, many current messaging technologies typically focus on distributing the data to some end-point(s) where the analytics power resides and thus underutilize the resources along the way. The ability to provide real-time ingestion and processing using current models can be diminished, at least compared to a situation where such processing power is not wasted. In other words, some current messaging technologies take a post-collection analytics approach. Some filtering capabilities may be provided via SQL select queries, some aggregation capabilities may be provided by combining multiple streams, and some routing capabilities may be provided according to a predefined rigid topology. Yet because the majority of the analytics are to be performed only after all of the data is collected, the post-collection analytics work can impose a large processing burden at the end of the process and can slow down the whole analytics effort. In a somewhat related vein, some current technologies have uniformly (in)capable nodes that offer the same messaging functionalities across the network and do not take into account the computing power available in a certain device or data center.
Some current messaging technologies also provide only minimal filtering, aggregation, and routing capabilities, and are neither flexible nor dynamic. Indeed, there are some current technologies that are limited to ingesting data in a proprietary format. Rigid and proprietary network composition may limit implementations such that users are forced to use proprietary protocols and transports, resulting in rigid and inflexible use cases. For example, Twitter Storm is limited to Tweets collection or data ingestion into Hadoop, and Apache Flume is provided only for data ingestion into Hadoop.
It would be desirable to handle large amounts of data in an efficient manner. In this regard, the inventors have recognized that it would be desirable to provide an intelligent message grid, overlaid on geographically-distributed sites, that assists in the efficient utilization of processing resources (e.g., network processing resources such as, for example, bandwidth, computing resources, etc.) in the grid as a whole. The inventors have further recognized that it would be desirable to distribute data to the right resources at the right time with an intelligent and flexible messaging layer. Some data may need to be ingested and processed very close to the origin, some may need ultra-fast processing while other data may be needed for offline historical analysis, etc. It would, for example, be advantageous to configure and use the messaging grid based on the needs of the analytics, e.g., so that data can be classified and routed accordingly. Furthermore, with a tightly integrated stream processing layer (e.g., a complex event processing (CEP) layer), such a messaging grid would be able to provide suitable analytics along the way with automated and smart switches and filtering, instead of having to depend on the analytics power at the end-point-sites.
Certain example embodiments provide for such features. For instance, certain example embodiments provide a messaging grid that:
- Supports data classification and routing according to different data characteristics;
- May be integrated with analytics engines (e.g., CEP engines) to provide analytics capabilities on each and every node/site;
- May be flexible and dynamic to adapt to changing analytics needs;
- Has lightweight nodes that can perform very efficient and fast data routing (e.g., at sub-millisecond speeds), filtering, and aggregation, as well as sophisticated nodes that can perform sophisticated data routing, distribution, filtering, aggregation, and analytics;
- Supports real-time data ingestion (e.g., with throughput at a millisecond or less speed);
- Supports multiple channels so that multiple data flows can carry on simultaneously;
- Supports geographically distributed node layouts;
- May be data format agnostic so that structured or unstructured data can be ingested;
- May be transport agnostic so that data can flow through sockets (e.g., TCP/IP), shared memory (SHM), remote direct memory access (RDMA) requests, etc.;
- May be protocol agnostic so that data can be packaged under HTTP/HTTPS, SSL, Google Protobuf, etc.;
- May be language agnostic so that data can be ingested by clients and/or peers that are written in different programming languages such as, for example, C/C++, .NET platform languages, Java, Python, JavaScript, etc.; and/or
- May be messaging paradigm agnostic so that data can be sent via a distribution policy (e.g., unicast, multicast, round-robin, etc.), group policy, etc.
Stream processing typically follows the pattern of continuous queries, which may be thought of in some instances as being queries that execute for a potentially indefinite amount of time on data that is generated or changes very rapidly. Such data are called streams, and streams oftentimes comprise events. Such streams often exist in real-world scenarios, e.g., as temperature readings from sensors placed in warehouses or on trucks, weather data, entrance control systems (where events are generated whenever a person enters or leaves, for instance), etc. Events may include attributes (also sometimes referred to as a payload) such as, for example, the value of temperature readings and metadata (sometimes referred to as a header or header data) such as, for example, creation date, validity period, and quality of the event. Possible events occurring in an environment typically are schematically described by so-called event types, which in some respects are somewhat comparable to table definitions in relational databases. Streams may in certain scenarios be organized in channels that in turn are implemented by an event bus. Channels and event types in this sense may be considered orthogonal concepts, e.g., in the sense that channels may comprise events of several event types, and events of the same event type might be communicated via different channels. In a CEP system, events may be evaluated and aggregated to form derived (or complex) events (e.g., by an engine or so-called event processing agents). Event processing agents can be cascaded such that, for example, the output of one event processing agent can be the input of another event processing agent. Thus, CEP may be thought of as a processing paradigm that describes the incremental, on-the-fly processing of event streams, typically in connection with continuous queries that are continuously evaluated over event streams. 
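By way of illustration only, the relationship between events, event types, and cascaded event processing agents might be sketched as follows. The sketch is not taken from any particular CEP engine: the class name Event, the event type names ("TemperatureReading", "HighTemperature", "OverheatAlert"), the payload fields, and the helper functions are all hypothetical and chosen solely for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical event: a type name, a payload, and header metadata."""
    event_type: str                                       # loosely comparable to a table definition
    payload: Dict[str, Any]                               # attributes, e.g., a temperature value
    header: Dict[str, Any] = field(default_factory=dict)  # metadata, e.g., creation date


# An event processing agent consumes a stream and produces a derived stream.
Agent = Callable[[Iterable[Event]], Iterator[Event]]


def high_temperature(threshold: float) -> Agent:
    """Agent that derives a HighTemperature event from each hot reading."""
    def agent(stream: Iterable[Event]) -> Iterator[Event]:
        for ev in stream:
            if ev.event_type == "TemperatureReading" and ev.payload["celsius"] > threshold:
                yield Event("HighTemperature", dict(ev.payload), dict(ev.header))
    return agent


def consecutive_alert(count: int) -> Agent:
    """Agent that emits one OverheatAlert per run of `count` input events
    from the same sensor (runs are counted on the agent's own input stream)."""
    def agent(stream: Iterable[Event]) -> Iterator[Event]:
        run_sensor, run = None, 0
        for ev in stream:
            sensor = ev.payload["sensor"]
            if sensor == run_sensor:
                run += 1
            else:
                run_sensor, run = sensor, 1
            if run == count:
                yield Event("OverheatAlert", {"sensor": sensor, "readings": count}, dict(ev.header))
                run = 0  # start counting a new run
    return agent


def cascade(*agents: Agent) -> Agent:
    """Chain agents so the output of one is the input of the next."""
    def pipeline(stream: Iterable[Event]) -> Iterator[Event]:
        for agent in agents:
            stream = agent(stream)
        return iter(stream)
    return pipeline
```

In this sketch, cascade(high_temperature(30.0), consecutive_alert(2)) yields a single agent whose output stream contains only the derived (complex) events, and each stage evaluates its input incrementally rather than waiting for the whole stream to be collected.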
Moreover, CEP analysis techniques may include, for example, the ability to perform continuous queries, identify time-based relations between events by applying windowing (e.g., through XQuery), etc., with the aid of processing resources such as at least one processor and a memory. See, for example, U.S. Pat. Nos. 8,640,089 and 8,266,351, as well as U.S. Publication Nos. 2014/0078163, 2014/0025700, and 2013/0046725, the entire contents of each of which are hereby incorporated herein by reference. As indicated above, certain example embodiments make use of CEP engines and/or the like.
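As a rough illustration of windowing in a continuous query, the following sketch maintains a running average over a sliding time window and updates it as each event arrives. It is a minimal sketch under stated assumptions, not the approach of any of the documents cited above: the class name SlidingWindowAverage is hypothetical, timestamps are assumed to be numeric seconds arriving in non-decreasing order, and the window is assumed to cover the half-open interval (t - window, t].

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindowAverage:
    """Hypothetical continuous query: average of values in a sliding time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.items: Deque[Tuple[float, float]] = deque()  # (timestamp, value) pairs
        self.total = 0.0

    def on_event(self, timestamp: float, value: float) -> float:
        """Incorporate one event and return the current windowed average.

        Assumes timestamps never decrease from one call to the next.
        """
        self.items.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window (t - window, t].
        while self.items[0][0] <= timestamp - self.window:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)
```

Because the newest event always lies inside the window, the deque is never empty when the average is computed. Each call does work proportional to the number of evicted events, so the query result is maintained incrementally rather than recomputed over the whole stream.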
One aspect of certain example embodiments relates to a grid of acting components, where multiple CEP engines are provided for handling different classes of data, and where multiple messaging systems are provided for communicating among and/or between the nodes depending on the particular data involved.
Another aspect of certain example embodiments relates to the definition of a switch, e.g., within the message itself, to route complex events to the appropriate CEP engine in a messaging system that includes multiple different CEP engines. The use of such a switch may be advantageous as compared to implementing a switch in a messaging realm server or broker, as the latter would be problematic for brokerless connections. Instead, in certain example embodiments, the filtering/routing may be performed inside of the CEP engine, and possibly regardless of the CEP engine type.
Another aspect of certain example embodiments relates to a dynamic data classification and routing capability. For instance, in certain example embodiments, routing can occur depending on specific content of the complex event message, or it can be derived from other indirect means (e.g., address of sender/receiver, frequency of events, combination of multiple field values, etc.).
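The kinds of classification criteria mentioned above might be sketched, purely as an example, with a rule list that can be extended while the system is running. Everything named here is an assumption made for illustration: the Classifier class, the message fields "priority" and "sender", the class labels, and the use of a simple per-sender event count as a stand-in for event frequency.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
# A predicate sees the message and the number of events seen so far from its sender.
Predicate = Callable[[Message, int], bool]


class Classifier:
    """Hypothetical classifier: the first matching rule decides the data class."""

    def __init__(self, default_class: str = "batch") -> None:
        self.rules: List[Tuple[str, Predicate]] = []
        self.default_class = default_class
        self.seen: Dict[str, int] = {}  # events observed per sender

    def add_rule(self, data_class: str, predicate: Predicate) -> None:
        """Rules may be added at any time, so classification can change dynamically."""
        self.rules.append((data_class, predicate))

    def classify(self, msg: Message) -> str:
        sender = msg.get("sender", "")
        self.seen[sender] = self.seen.get(sender, 0) + 1
        for data_class, predicate in self.rules:
            if predicate(msg, self.seen[sender]):
                return data_class
        return self.default_class


classifier = Classifier()
# Direct criterion: specific content of the message.
classifier.add_rule("realtime", lambda m, n: m.get("priority") == "high")
# Indirect criterion: address of the sender.
classifier.add_rule("edge", lambda m, n: m.get("sender", "").startswith("10.0."))
# Indirect criterion: how many events this sender has produced (a crude frequency proxy).
classifier.add_rule("throttled", lambda m, n: n > 3)
```

A production system would more likely measure frequency as a rate over a time window; the running count is used here only to keep the sketch short.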
Another aspect of certain example embodiments relates to the ability to connect to and interface with one or more back-end containers (e.g., a container stored in the Hadoop Distributed File System) for persisting historical data that can be processed at a later time (such as, for example, when just-in-time processing is not required).
In certain example embodiments, a computer system comprising a plurality of computing nodes connected in a network is provided. Each said node includes processing resources including at least one processor and an interface to the network. Each said node is dynamically configurable to send and/or receive messages over the network via its respective interface using one of brokered and brokerless communication models, with at least one said node being configured to send and/or receive messages using the brokered communication model and with at least one other said node being configured to send and/or receive messages using the brokerless communication model. At least a subset of the nodes have a complex event processing (CEP) engine deployed thereto, with the CEP engines being configured to cooperate with the processing resources of the respective nodes to which they are deployed in order to operate on messages received by the respective nodes. The CEP engines are classified as one of at least two different types of CEP engines, with at least one said node having a first type of CEP engine deployed thereto and with at least one other node having a second type of CEP engine deployed thereto. For each message received by a given node that is to be forwarded to a further node along one of plural possible paths, the given node is configured to use its processing resources and interface to the network to route the message to be forwarded to an intermediate node in one of the possible paths. The intermediate node is selected by the CEP engine of the given node based on metadata associated with the message to be forwarded.
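One way to picture the selection of an intermediate node based on message metadata is sketched below. This is an illustrative sketch only and does not reflect any specific embodiment: the names Node, CommModel, EngineType, and select_intermediate are hypothetical, as is the policy that messages whose metadata carries "class" equal to "analytics" prefer a first hop running the more capable engine type while all other messages prefer a lightweight first hop.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class CommModel(Enum):
    BROKERED = "brokered"
    BROKERLESS = "brokerless"


class EngineType(Enum):
    LIGHTWEIGHT = "lightweight"
    SOPHISTICATED = "sophisticated"


@dataclass
class Node:
    name: str
    comm_model: CommModel                 # how this node sends/receives messages
    engine: Optional[EngineType] = None   # None if no CEP engine is deployed


def select_intermediate(paths: List[List[Node]], metadata: Dict[str, str]) -> Node:
    """Return the first hop of the path chosen for a message.

    Assumed policy: prefer a path whose first hop runs the engine type
    suggested by the metadata; otherwise fall back to the first usable path.
    """
    candidates = [path for path in paths if path]  # ignore empty paths
    if not candidates:
        raise ValueError("no path available for forwarding")
    if metadata.get("class") == "analytics":
        wanted = EngineType.SOPHISTICATED
    else:
        wanted = EngineType.LIGHTWEIGHT
    for path in candidates:
        if path[0].engine is wanted:
            return path[0]
    return candidates[0][0]
```

Because the decision depends only on the metadata presented with each message, changing that metadata changes the route without any change on the part of the message generator.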
In certain example embodiments, a computing node in a network comprising a plurality of different computing nodes is provided. The computing node comprises at least one processor; an interface to the network; and a complex event processing (CEP) engine that, with the aid of the at least one processor, is configured to operate on received messages. The CEP engine is classified as one of at least two different types of CEP engines, with a first type of CEP engine having processing capabilities greater than those of a second type of CEP engine. The computing node is dynamically configurable to send and/or receive messages over the network via the interface using one of brokered and brokerless communication models. For each message received by the computing node that is to be forwarded to a further node along one of plural possible paths through the network, the computing node is configured to use its processing resources and interface to the network to route the message to be forwarded to an intermediate node in one of the possible paths. The intermediate node is selected by the CEP engine of the computing node based on metadata associated with the message to be forwarded. Routing selections made by the computing node are dynamically changeable in response to changing metadata, and routing selections are transparent to message generators on the different computing nodes in the network.
In certain example embodiments, there is provided a method of routing messages in a computer system comprising a plurality of computing nodes connected in a network. Each said node includes processing resources including at least one processor and an interface to the network. Each said node is dynamically configurable to send and/or receive messages over the network via its respective interface using one of brokered and brokerless communication models, with at least one said node being configured to send and/or receive messages using the brokered communication model and with at least one other said node being configured to send and/or receive messages using the brokerless communication model. At least a subset of the nodes have a complex event processing (CEP) engine deployed thereto, with the CEP engines being configured to cooperate with the processing resources of the respective nodes to which they are deployed in order to operate on messages received by the respective nodes. The CEP engines are classified as one of at least two different types of CEP engines, at least one said node having a first type of CEP engine deployed thereto and at least one other node having a second type of CEP engine deployed thereto. The method comprises, for each message received by a given node that is to be forwarded to a further node along one of plural possible paths, using the processing resources and the interface to the network of the given node to route the message to be forwarded to an intermediate node in one of the possible paths, with the intermediate node being selected by the CEP engine of the given node based on metadata associated with the message to be forwarded. At least some of the nodes are geographically dispersed from one another.
In certain example embodiments, there is provided a method of configuring a computer system that routes messages. The computer system comprises a plurality of computing nodes connected in a network, wherein each said node includes processing resources including at least one processor and an interface to the network. The method comprises: dynamically configuring the nodes to send and/or receive messages over the network via their respective interfaces using one of brokered and brokerless communication models, at least one said node being configured to send and/or receive messages using the brokered communication model and at least one other said node being configured to send and/or receive messages using the brokerless communication model; and deploying to at least a subset of the nodes a complex event processing (CEP) engine, the CEP engines being configured to cooperate with the processing resources of the respective nodes to which they are deployed in order to operate on messages received by the respective nodes, the CEP engines being classified as one of at least two different types of CEP engines, at least one said node having a first type of CEP engine deployed thereto and at least one other node having a second type of CEP engine deployed thereto. For each message received by a given node that is to be forwarded to a further node along one of plural possible paths, the given node is configured to use its processing resources and interface to the network to route the message to be forwarded to an intermediate node in one of the possible paths, the intermediate node being selected by the CEP engine of the given node based on metadata associated with the message to be forwarded. At least some of the nodes are geographically dispersed from one another.
Non-transitory computer-readable storage media tangibly storing instructions for performing the above-summarized and/or other approaches also are provided by certain example embodiments, as well as corresponding computer programs.
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.