The present application relates to the technical field of monitoring systems and in particular to tracking messages from multiple message sources.
One example type of monitoring system is a market surveillance system. Market surveillance systems typically obtain data from a number of disparate data sources. Typically, incoming data message feeds from those sources are not synchronized relative to each other. In order to build an accurate picture of the sequence of actions leading up to an event of interest, messages on each of the incoming data feeds must be ordered in the correct sequence to produce a consolidated data feed.
Messages can be readily re-sequenced correctly at the end of the day when there is no additional data expected and all the messages for the day have been completely received. But it is more complex to re-sequence the messages for close to real-time processing because it is not certain that all the messages that carry a particular timestamp have been received since there are still incoming messages.
There are a number of issues commonly seen with incoming market data feeds that may affect the complexity of data message sequencing, including:
(i) Missing time stamps: some messages may not have time stamps.
(ii) Inconsistent time stamp granularity: some messages may have time stamps with a different granularity than others, e.g., order and trade time stamps have microsecond precision, while trade cancellations have only second precision.
(iii) Incorrect chronological order: some data feeds are not chronologically sorted, and messages can arrive in seemingly random time order. In some cases, some messages may be “late” by minutes.
(iv) Incorrect logical order: messages can arrive out of logical sequence, e.g., a trade message arrives prior to both the order messages that should precede the trade.
(v) Data feed latency: latencies in each data feed may be introduced from various sources such as:
    (a) Transmission latencies (from trading engine to processing destination);
    (b) Excessive system load, particularly during peak periods;
    (c) Data pre-processing or conversion activities, particularly those that involve data re-sequencing; and
    (d) Interruptions to one or more data feeds that cause the feed(s) to be unavailable while other data feeds may remain available.
Therefore, there is a trade-off between the correctness of the message sequence and message processing latency, i.e., how close to real-time the data message feed is processed.
Some market surveillance systems identify patterns of interest in the trading data by evaluating the transaction stream against business rules. Typically, the rules are event-driven—transaction events (e.g., order entry, order amendment, trade, etc.) are examined in the context of the pre-existing market picture to determine if they are of interest. The incoming transactions must be processed in correct chronological and logical order to ensure that the snapshot of the state of the market as of a particular event is accurate.
For instance, unusual price movements or trade volumes in a security that occur just before a price sensitive news announcement can be indicative of insider trading. The trigger event for an insider trading alert might be the appearance of the news announcement, at which point transactions in the security for a period leading up to the news announcement will be examined for unusual patterns. But if a transaction that occurred before the news announcement is incorrectly sequenced in the data feed such that it appears after the news announcement, it will not generate an alert. In other words, that transaction which should trigger an alert was not present when the alert rules were evaluated at the time the news announcement trigger event occurred.
Market surveillance systems do not “back track” and re-evaluate trigger conditions when out of sequence transactions are received. One reason is that it is difficult to identify which of the trigger conditions would be affected by the out of sequence transactions. Moreover, it is usually prohibitively computationally expensive to go back and re-evaluate rules every time an out of sequence transaction is received if the data is generally not in correct chronological order.
It would be advantageous if a market surveillance system could improve determinism and orderly processing of messages in markets that contain multiple data feeds from different data sources. One approach might be to assume that all constituent data feeds are already correctly chronologically sequenced within each data feed. If the upstream data feed is not in chronological order, then an intermediate process could be inserted to ensure that the data feed gets re-sequenced before the monitoring system reads the data feed and interleaves the messages from the different data feeds.
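The interleaving step described above can be sketched as a k-way merge. This is only an illustration under the stated assumption that each feed is already sorted by time stamp; the `Message` type, its field names, and the `interleave` function are hypothetical and do not come from the application.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Message(NamedTuple):
    timestamp_us: int  # event time in microseconds (hypothetical field)
    feed: str          # name of the originating data feed
    payload: str       # message content, simplified to a string


def interleave(feeds: Iterable[Iterable[Message]]) -> Iterator[Message]:
    """Lazily merge feeds that are each already sorted by time stamp."""
    return heapq.merge(*feeds, key=lambda m: m.timestamp_us)


# Two example feeds, each in chronological order on its own.
orders = [Message(100, "orders", "enter A"), Message(300, "orders", "enter B")]
trades = [Message(200, "trades", "trade A"), Message(400, "trades", "trade B")]

consolidated = list(interleave([orders, trades]))
for m in consolidated:
    print(m.timestamp_us, m.feed, m.payload)
```

Note that `heapq.merge` relies on each input already being sorted. If a feed is not in chronological order, the merged output is silently mis-ordered, which is why the intermediate re-sequencing process would be needed for such a feed.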
For example, a regularly-produced (e.g., periodic) timing signal (sometimes referred to as a “metronome feed” in non-limiting example embodiments in this application) could be used to introduce a predetermined amount of latency into the processing of messages from the data feeds in order to produce more accurate data message monitoring or tracking. The regularly-produced timing interval period or frequency determines the interval between timing messages (sometimes referred to as “heart beat” messages in example embodiments in the application) in the regularly-produced timing feed. The regularly-produced timing feed might include a feed of incrementing timestamps that lag behind real time by a configurable time period. Message tracking only processes messages up to the metronome time, which delays the feed.
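A minimal sketch of the gating behavior described above follows. It assumes that each feed is buffered in arrival order and is already sorted by time stamp; the `MetronomeGate` class, the `metronome_time` helper, and the tuple layout are hypothetical names chosen for illustration only.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

Message = Tuple[int, str, str]  # (timestamp_ms, feed name, payload), assumed layout


def metronome_time(wall_clock_ms: int, lag_ms: int) -> int:
    """Time stamp carried by a heart beat: real time minus the configured lag."""
    return wall_clock_ms - lag_ms


class MetronomeGate:
    """Buffers messages per feed and releases them only up to the metronome time."""

    def __init__(self, feed_names: Iterable[str]) -> None:
        self.buffers: Dict[str, Deque[Message]] = {n: deque() for n in feed_names}

    def receive(self, msg: Message) -> None:
        # Assumes each feed delivers its own messages in time stamp order.
        self.buffers[msg[1]].append(msg)

    def on_heartbeat(self, metronome_ms: int) -> List[Message]:
        ready: List[Message] = []
        for buf in self.buffers.values():
            while buf and buf[0][0] <= metronome_ms:
                ready.append(buf.popleft())
        ready.sort(key=lambda m: m[0])  # interleave the released messages by time
        return ready


gate = MetronomeGate(["orders", "trades"])
gate.receive((1000, "orders", "enter A"))
gate.receive((1500, "trades", "trade A"))
gate.receive((2500, "orders", "enter B"))

first = gate.on_heartbeat(metronome_time(wall_clock_ms=4000, lag_ms=2000))
second = gate.on_heartbeat(metronome_time(wall_clock_ms=5000, lag_ms=2000))
print(first)   # messages stamped at or before 2000 ms
print(second)  # messages stamped at or before 3000 ms
```

In this sketch the message stamped 2500 ms is held back at the first heart beat even though it has already arrived, which is the fixed latency that the lag introduces.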
This is illustrated in the example shown in FIG. 1 where messages are read in sequence with a regularly-produced timing feed. The regularly-produced timing lag or delay time period may be configured based on the expected lags in the data feeds used in the market. The tracking accuracy increases with larger lag periods. The longer the upstream processes have to transmit the required messages, the higher the probability that the messages will not end up being read out of sequence.
FIG. 2 shows an example where messages are read out of sequence under lower latency configurations. With a shorter regularly-produced timing lag period, there is an increased risk that a delay on one feed, for whatever reason, will cause messages to be processed out of sequence. Thus, it is not usually advisable to make the configured timing feed lag arbitrarily low because an arbitrarily low lag period may prevent the surveillance system from detecting patterns related to events and thus from alerting surveillance analysts or other monitoring personnel. Therefore, a lag period may be set (configured) in accordance with what is considered to be a justifiable trade-off between correct sequencing (e.g., so as to detect patterns accurately) and maximum processing latency. But as mentioned above, configuring such a lag introduces a fixed processing latency for all data message feeds.
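The trade-off can be illustrated with a simplified calculation. The sketch below assumes that a message is at risk of being read out of sequence whenever its delivery delay (arrival time minus event time) exceeds the configured lag, and it ignores the granularity of the timing interval. The delay values are invented for illustration and are not measurements.

```python
from typing import Iterable


def late_count(delays_ms: Iterable[int], lag_ms: int) -> int:
    """Count messages whose delivery delay exceeds the configured lag."""
    return sum(1 for d in delays_ms if d > lag_ms)


# Hypothetical delivery delays in milliseconds for six messages.
delays_ms = [5, 20, 40, 150, 900, 3000]

results = {lag: late_count(delays_ms, lag) for lag in (10, 100, 1000, 5000)}
for lag, late in results.items():
    print(f"lag={lag} ms -> {late} message(s) at risk of out-of-sequence processing")
```

Under these assumed delays, raising the lag from 10 ms to 5000 ms removes every at-risk message, but every message on every feed then waits the full 5000 ms before it can be processed.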
Processing in a market with a timing feed-based approach will be “jerky” because there will be unavoidable pauses and subsequent processing bursts as the monitoring apparatus waits for the next available message on the regularly-produced timing feed, processes all the messages that it is allowed to read, then waits for the next available message on the regularly-produced timing feed. The regularly-produced timing interval period determines the extent of the jerkiness of processing within the market. The regularly-produced timing interval period can also be used to determine the maximum granularity of time movement within the market. But short regularly-produced timing interval periods (e.g., 1 ms) may result in an unnecessarily large load on the storage subsystem due to the number of input/output operations consumed by memory disk write operations. In systems where the data to be monitored is streamed without being written to storage memory, a shorter regularly-produced timing interval period may cause problems with network overload.
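To give a rough sense of scale for the load concern, the following arithmetic sketch counts the timing messages generated over a session for several interval periods. The 8-hour session length and the `heartbeats_per_session` helper are assumptions made for illustration; the actual cost per timing message depends on the system.

```python
def heartbeats_per_session(session_ms: int, interval_ms: int) -> int:
    """Number of timing messages emitted during a session at a given interval."""
    return session_ms // interval_ms


SESSION_MS = 8 * 60 * 60 * 1000  # assumed 8-hour session, in milliseconds

counts = {i: heartbeats_per_session(SESSION_MS, i) for i in (1, 100, 1000)}
for interval_ms, n in counts.items():
    print(f"interval={interval_ms} ms -> {n} timing messages per session")
```

If each timing message costs at least one write or one network message (an assumption), a 1 ms interval generates a thousand times more of them than a 1 s interval over the same session.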