Inter-stream messaging systems, such as those implemented by a Disruptor message router, route messages from one or more input streams to one or more output streams. “Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads” by Thompson, et al., published May 2011 provides information about the Disruptor router, and is incorporated by reference as if fully set forth herein.
Often, the messages being routed by an inter-stream messaging system represent events that have been produced by one or more producers. The event-representing messages are added to the input streams by the producers, and are to be processed by one or more consumers that process messages from the output streams. The inter-stream router of such a system routes any given message from a given input stream to one or more of the output streams based on routing criteria for the system. In the case of a Disruptor-implemented inter-stream messaging system, input streams are implemented as partitions of a ring buffer used by the Disruptor router to store messages, and consumers read the messages from the ring buffer partitions.
FIG. 1 depicts an example inter-stream messaging system 100 in which a router application 140 routes messages from a set of input streams 110, 120, and 130 to a set of output streams 150, 160, and 170 based on known routing/mixing criteria. In this example system, each input stream is associated with a message producer, and stores event-representing messages from the associated producer. The input streams may be populated by their respective producers at different rates, depending on the functionality of the producers populating those streams.
According to the routing criteria for the example system, router 140 sorts incoming messages by type, and assigns all messages with a given type to an output stream that is associated with that type. In the depiction of FIG. 1, input streams 110-130 include a set of messages, and output streams 150-170 show the same set of messages having been routed to the various output streams by router 140. For example, input stream 110 includes two withdrawal-type messages 112 and 114, and two deposit-type messages 116 and 118, which were created in response to operations performed by a particular user of the system with which input stream 110 is associated. The messages in input stream 110 are ordered from least-recently generated (message 112) to most-recently generated (message 118).
Output stream 150 is associated with withdrawal-type messages and output stream 170 is associated with deposit-type messages. When routing the messages from input stream 110, router 140 sends messages 112 and 114, in that order, to output stream 150 for processing (see corresponding messages 152 and 156). Router 140 also sends messages 116 and 118, in that order, to output stream 170 for processing (see corresponding messages 174 and 178). In this way, FIG. 1 depicts the functionality of router 140 for a given set of messages.
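To illustrate the type-based routing described above, the following is a minimal sketch and not the implementation of router 140. The names `ROUTES` and `route_by_type` and the representation of a message as a (type, reference numeral) pair are assumptions introduced for illustration only. The type associated with output stream 160 is not stated above and is therefore omitted.

```python
from collections import defaultdict

# Assumed mapping from message type to output stream, taken from the
# description of FIG. 1: withdrawals go to stream 150, deposits to stream 170.
ROUTES = {"withdrawal": 150, "deposit": 170}

def route_by_type(input_stream):
    """Append each message to the output stream associated with its type.

    Messages are visited from head to tail, so the relative order of the
    input stream is preserved within every output stream.
    """
    outputs = defaultdict(list)
    for msg_type, msg_id in input_stream:
        outputs[ROUTES[msg_type]].append(msg_id)
    return outputs

# Input stream 110, ordered from least-recently to most-recently generated.
stream_110 = [
    ("withdrawal", 112),
    ("withdrawal", 114),
    ("deposit", 116),
    ("deposit", 118),
]

outputs = route_by_type(stream_110)
```

In this sketch only one input stream is routed. A router handling several input streams would interleave their messages in the output streams, as FIG. 1 depicts, while still keeping each input stream's relative order.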
Some systems require strict ordering of routed messages. To illustrate, it shall be assumed that system 100 is part of a financial system, in which the relative ordering of messages in the input streams must be preserved in the output streams. A financial system is merely one of a virtually unlimited number of types of systems in which the techniques described herein may be used, and the invention is not limited to any particular type of system.
Preserving the input ordering of the messages in the output streams ensures that messages can be processed from the output streams in order of message generation, such that the withdrawals, deposits, balance checks, etc., represented by the messages, are processed in order. Thus, the ordering of messages in the input streams of FIG. 1 is preserved in the output streams. For example, message 112 (“W1”) was input into stream 110 before message 114 (“W2”). Accordingly, in output stream 150, message 152 that corresponds to input message 112 is ordered before message 156 that corresponds to input message 114.
A first message is located “before” a second message in a given stream when the first message is closer to a head of the stream than the second message. Conversely, the second message is located “after” the first message in the stream because the second message is closer to a tail of the stream than the first message. The head of a message stream (or the head of a structure in which messages from the stream are stored) is the portion of the stream with the least-recently inserted messages, and the tail of the stream (or structure) is the portion of the stream with the most-recently inserted messages. To illustrate in the case of output stream 150 in FIG. 1, message 152 is at the head of the stream and message 158 is at the tail of the stream.
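The head/tail terminology above can be made concrete with a short sketch. It assumes, for illustration only, that a stream is held in a Python `collections.deque` to which messages are appended at the tail; the helper `is_before` is hypothetical. Only messages of output stream 150 that are named above are included.

```python
from collections import deque

# Output stream 150, with messages inserted in order: 152 first, 158 last.
stream_150 = deque()
for msg_id in (152, 156, 158):
    stream_150.append(msg_id)  # new messages are added at the tail

head = stream_150[0]   # least-recently inserted message
tail = stream_150[-1]  # most-recently inserted message

def is_before(stream, first, second):
    """Return True when `first` is closer to the head than `second`."""
    return stream.index(first) < stream.index(second)
```

Under these assumptions, `is_before(stream_150, 152, 156)` holds, and `is_before(stream_150, 158, 152)` does not.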
Many times, an inter-stream messaging system processes routed messages in batches. Processing messages from an output stream may comprise any kind of processing that logically removes the messages from the stream, such as publishing the messages to a consumer service that takes action based on the messages. When an inter-stream router processes messages in batches, there are times when one or more messages, having been routed to output streams, are not yet processed.
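The window in which messages are routed but not yet processed can be sketched as follows. This is an illustrative sketch under stated assumptions: the function `process_batch`, the batch size, and the `publish` callback are hypothetical and stand in for whatever processing logically removes messages from an output stream.

```python
from collections import deque

def process_batch(output_stream, batch_size, publish):
    """Remove up to `batch_size` messages from the head and publish them."""
    count = min(batch_size, len(output_stream))
    batch = [output_stream.popleft() for _ in range(count)]
    for msg_id in batch:
        publish(msg_id)
    return batch

# Messages already routed to output stream 150 (head first).
stream_150 = deque([152, 156, 158])
published = []

# One batch of two messages is processed; message 158 has been routed
# but remains unprocessed until the next batch.
process_batch(stream_150, 2, published.append)
pending = list(stream_150)
```

If the router failed at this point, the messages in `pending` are the ones that would need to be identified during recovery.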
Recovering from a router failure requires determining which messages had been routed but had not yet been processed prior to the failure. Recovering efficiently can be difficult. The difficulty is compounded by the fact that, in an inter-stream messaging system, any message can flow from any input stream to any output stream. Thus, there is a possibility of losing messages when the inter-stream router goes down.
One way to facilitate recovery from a router failure is to require the inter-stream messaging system to persist, to a persistent store, the entire state of the system, including records of messages that have been routed to output streams. However, this solution is inefficient because it requires a significant amount of storage capacity to store the state information, and it takes time and resources to keep the persisted system state up-to-date. Furthermore, a recovery procedure that is based on the persisted state requires expensively querying all of the stored message records, and then finding the messages with the minimum/maximum timestamp.
Specifically, in order to perform recovery, generally all stored messages are read from the persisted system state, and the system identifies a minimum timestamp for a particular attribute (such as creation time) across all persisted messages. Messages at the identified minimum timestamp become the starting place for recovery of reading/publishing messages. As discussed above, this process is expensive both in the amount of storage needed and the amount of processing power needed to accomplish recovery. Thus, recovering from an inter-stream router failure based on a stored state of the system is both costly and inefficient.
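The cost of this persisted-state recovery can be seen in a minimal sketch. The record layout `PersistedMessage`, the function `find_recovery_start`, and the sample timestamps are assumptions made for illustration; the sketch shows only the general shape of a full scan followed by a minimum-timestamp selection.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PersistedMessage:
    message_id: int
    created_at: float  # assumed attribute used for the minimum timestamp

def find_recovery_start(persisted_records):
    """Read every persisted record and return those at the minimum timestamp.

    Every record must be read before the minimum is known, so the work
    grows with the total number of persisted messages.
    """
    records = list(persisted_records)  # full read of the persisted state
    if not records:
        return None, []
    min_ts = min(r.created_at for r in records)
    return min_ts, [r for r in records if r.created_at == min_ts]

# Hypothetical persisted state; the timestamps are invented for the example.
store = [
    PersistedMessage(156, 20.0),
    PersistedMessage(152, 10.0),
    PersistedMessage(158, 30.0),
]

min_ts, start_messages = find_recovery_start(store)
```

Here recovery would begin from message 152, but only after all three records have been read and compared.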
As such, it would be beneficial to facilitate efficient recovery of an inter-stream messaging system suffering from a router failure, which recovers lost messages while requiring little storage space and low maintenance costs.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Further, it should not be assumed that any of the approaches described in this section are well-understood, routine, or conventional merely by virtue of their inclusion in this section.