The present invention relates to distributed computing systems and, more particularly, to event routing in distributed computing systems with distributed data management.
A distributed computing system comprises a network of multiple computing nodes, often hundreds or even thousands, that communicate with each other. The computing nodes of a distributed computing system may comprise autonomous computers or one or more autonomous virtual computers or virtual machines (VMs) that operate on one or more computers. Distributed data management enables the system's data and the processes performed by the distributed computing system to be distributed throughout the many nodes of the system.
A data fabric, such as GemFire Enterprise® data fabric from Gemstone Systems Inc., provides a communication network interconnecting the computing nodes of a distributed computing system. The data fabric provides a data infrastructure that distributes and replicates data enabling data storage to be distributed throughout the nodes of the system and enabling the nodes to exchange and utilize data in the performance of the process(es) executing on the local node. The data fabric enables frequent updating of the data used by a plurality of processes executing on one or more of the distributed computing nodes and enables utilization of the data at high rates with low latency and high availability.
Referring to FIG. 1, typically, in a distributed computer system (20) a datum, for example, datum X (22), is stored by a primary server process (24) operating on one node of the system. In addition, a distributed computing system commonly includes one or more backup server processes, for example, server process B (26), providing redundant storage of the datum and enabling communication of events to interested applications in the event of failure of the primary server process. The primary and backup server processes are in communication with a plurality of other server processes operating on other system nodes and are aware of which server processes are serving an application process. Each of the server processes may be in communication with one or more application processes which may or may not utilize the datum to provide an output to a user. Each application process utilizing the datum has an interest in certain events affecting the datum, although the application processes may utilize the datum in different ways and, therefore, may be interested in different events affecting the datum. For example, a datum, the stock price of XYZ Corp., may be stored on a primary server process and used by several remote application processes displaying stock prices to users, and it may also be used by another process that calculates and displays the values of businesses included in an industry sector. Interest in an event may be expressed in terms of a regular expression, a list of data keys, a structured query language (SQL) statement or in some other way. If an application has an interest in an event, the application registers its interest in the event with its associated server process. For example, application process 4 (38) has registered an interest in event 003 (40) with server process C (28).
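By way of illustration only, the registration of interest described above might be sketched as follows. The sketch assumes that an interest is reduced to a predicate over an event, built either from a list of data keys or from a regular expression; the names Event, key_list_interest, regex_interest, ServerProcess and register_interest, and the keys "XYZ" and "ABC", are hypothetical and are not taken from any particular data fabric product.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: an identifier, the key of the affected datum,
    and the kind of change (create, update or delete)."""
    event_id: str
    key: str
    kind: str


def key_list_interest(keys):
    """Interest expressed as a list of data keys."""
    wanted = frozenset(keys)
    return lambda event: event.key in wanted


def regex_interest(pattern):
    """Interest expressed as a regular expression over the data key."""
    compiled = re.compile(pattern)
    return lambda event: compiled.fullmatch(event.key) is not None


class ServerProcess:
    """Holds the interests registered by the application processes it serves."""

    def __init__(self, name):
        self.name = name
        self.interests = {}  # application name -> list of interest predicates

    def register_interest(self, app, predicate):
        self.interests.setdefault(app, []).append(predicate)

    def interested_apps(self, event):
        """Filter an event against every registered interest."""
        return sorted(
            app
            for app, predicates in self.interests.items()
            if any(predicate(event) for predicate in predicates)
        )


# Assumed example: two application processes register with server process C.
server_c = ServerProcess("C")
server_c.register_interest("application process 3", key_list_interest(["XYZ"]))
server_c.register_interest("application process 4", regex_interest(r"ABC.*"))

price_change = Event(event_id="001", key="XYZ", kind="update")
print(server_c.interested_apps(price_change))
```

An interest expressed as an SQL statement could be wrapped as a predicate in the same way, although that case is not shown here.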
When an event affecting a datum, for example, datum X, occurs in a distributed data management system, the data fabric communicates the event to the various server processes of the system for communication to the application processes that have expressed an interest in the event. An event may include a change in the value of the datum, deletion of the datum or creation of the datum. When an event (42) affecting datum X occurs, it is communicated to the primary server process A providing storage for that datum and, if the primary server process is unavailable, to the server process, for example, server process B (26), providing backup storage for the datum. The primary server process filters the event to determine if an application process in communication with process A has registered an interest in the event. In addition, the primary server process communicates the event to the backup server process(es) and any other server processes which are serving an application process. When a server process receives the event, it filters the event to determine which application process associated with the server process has registered an interest in the event. For example, the occurrence of event 001 affecting datum X is communicated to the primary server process A (24) where application process 1 (32) has registered an interest in the event (46). The primary server process filters the event and communicates the event to application process 1. The primary server process also transmits the event (45) to server process B because it is the backup server process and to server processes C and D because they are serving respective application processes. Server process C filters the event to determine that application process 4 (38) is not interested in the event and that application process 3 (36) has registered an interest in the event. Server process C will then communicate event 001 to application process 3.
Likewise, server process D filters the event and determines that application process 5 (44) has no interest in the event. Similarly, the primary server process will filter events 002 and 003 and transmit the events to server processes B, C, and D. If the primary server process should fail, the backup server process would communicate events (48) in the same manner to any additional backup server processes and any other server process serving one or more application processes.
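The conventional flow described above, in which the primary server process forwards every event to the backup and to every server process serving an application, can be sketched as follows. This is a minimal illustration under stated assumptions: interests are simplified to sets of event identifiers, message passing is replaced by direct method calls, and the class and method names (ServerProcess, PrimaryServerProcess, serve, receive, publish) are hypothetical. Application process 5 is modeled with an empty interest set because the description states only that it has no interest in event 001.

```python
class ServerProcess:
    """A server process that filters each received event against local interests."""

    def __init__(self, name):
        self.name = name
        self.interests = {}   # application name -> set of event identifiers
        self.delivered = []   # (application name, event identifier) pairs
        self.filter_runs = 0  # number of times the interest filter was computed

    def serve(self, app, event_ids=()):
        self.interests[app] = set(event_ids)

    def receive(self, event_id):
        # Every received event is filtered, even when no local application cares.
        self.filter_runs += 1
        for app in sorted(self.interests):
            if event_id in self.interests[app]:
                self.delivered.append((app, event_id))


class PrimaryServerProcess(ServerProcess):
    """Primary server process: filters locally, then forwards to every peer."""

    def __init__(self, name, peers):
        super().__init__(name)
        self.peers = list(peers)

    def publish(self, event_id):
        self.receive(event_id)
        for peer in self.peers:  # backup and all servers serving applications
            peer.receive(event_id)


# Arrangement loosely following FIG. 1.
server_b = ServerProcess("B")  # backup, no application processes attached here
server_c = ServerProcess("C")
server_d = ServerProcess("D")
server_c.serve("application process 3", {"001"})
server_c.serve("application process 4", {"003"})
server_d.serve("application process 5")  # no registered interest (assumed)

primary = PrimaryServerProcess("A", [server_b, server_c, server_d])
primary.serve("application process 1", {"001"})

primary.publish("001")
print(primary.delivered, server_c.delivered, server_d.delivered)
```

In this sketch server process D receives and filters event 001 although none of its application processes is interested in it.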
In a distributed computing system with large numbers of server processes and, typically, even larger numbers of application processes, there are likely many server processes that are not in communication with an application process that has an interest in a particular event. Communicating each event to each server process serving an application, without regard to whether an application process associated with the particular server process has an interest in that event, increases the communication burden of the data fabric and slows the distribution of events. Moreover, when an event is communicated to a server process, the server process must compute the interest of its associated applications to determine whether one of the applications with which it communicates has registered an interest in the event. Calculating interest in an event when none of the application processes associated with a particular server process is interested in the event is wasteful of server process resources, increases latency and lowers the throughput of the distributed computing system.
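The burden described above can be made concrete with a small count. The sketch below assumes, purely for illustration, that server process C serves applications interested in events 001 and 003, that server process D serves no interested application, and that no application served by C or D is interested in event 002; the function name count_forwarded is hypothetical. The backup server process is omitted because it receives events for redundancy in any case.

```python
def count_forwarded(interests_by_server, event_ids):
    """Count events forwarded to non-backup servers and how many were of no use.

    interests_by_server maps a server name to the set of event identifiers in
    which at least one of its application processes has registered an interest.
    """
    forwarded = 0
    useful = 0
    for event_id in event_ids:
        for interested in interests_by_server.values():
            forwarded += 1  # each event goes to each server, interested or not
            if event_id in interested:
                useful += 1
    return forwarded, useful, forwarded - useful


# Assumed interests, for illustration only.
interests_by_server = {"C": {"001", "003"}, "D": set()}
forwarded, useful, wasted = count_forwarded(
    interests_by_server, ["001", "002", "003"]
)
print(forwarded, useful, wasted)  # 6 2 4
```

Under these assumptions four of the six forwarded events are transmitted and filtered without reaching any application process, and the proportion grows with the number of server processes that serve no interested application.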
What is desired, therefore, is a method and apparatus for event routing in a distributed computing system with distributed data management that reduces duplication of system processes when an event occurs.