The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Many of today's large scale computing systems comprise a plurality of computing nodes and a request handler. The request handler receives processing requests from external entities (such as client computers), determines which services the requests are asking for, determines which computing nodes provide those services, and then routes the requests to the appropriate computing nodes for processing. In performing this request routing function, the request handler may also perform other functions, such as load balancing to ensure that the load is spread evenly across the various computing nodes.
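By way of illustration only, the routing function described above might be sketched as follows. The class name `RequestHandler`, the `route` method, the dictionary-based request format, and the node identifiers are all hypothetical, and round-robin rotation is merely one assumed way of spreading load; no particular implementation is implied.

```python
from itertools import cycle


class RequestHandler:
    """Hypothetical request handler that routes requests by service name."""

    def __init__(self, nodes_by_service):
        # nodes_by_service maps a service name to the list of node
        # identifiers that provide that service (assumed data layout).
        # Services with no nodes are skipped so that rotation never runs dry.
        self._rotations = {
            service: cycle(nodes)
            for service, nodes in nodes_by_service.items()
            if nodes
        }

    def route(self, request):
        """Return the node that should process the given request."""
        service = request["service"]  # which service is being requested
        if service not in self._rotations:
            raise LookupError(f"no node provides service {service!r}")
        # Round-robin over the providing nodes as a simple form of load balancing.
        return next(self._rotations[service])


handler = RequestHandler({"search": ["node-a", "node-b"], "billing": ["node-c"]})
targets = [handler.route({"service": "search"}) for _ in range(3)]
```

In this sketch, three consecutive requests for the same service are spread across the two nodes that provide it, with the rotation wrapping around on the third request.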
A request handler often maintains a list of computing nodes. This list includes all of the nodes that have been registered with the request handler; hence, this list represents the nodes that the request handler is aware of. The list of nodes may include active and passive nodes. An active node is a node that the request handler considers to be actively participating in the processing of requests; thus, an active node is a node to which the request handler may forward service requests. A passive node is a node that the request handler considers to not be actively participating in the processing of requests. A passive node may, for example, be a node that is currently unhealthy or is partially or completely malfunctioning. The request handler will not forward requests to a passive node. During operation, an active node may become a passive node, and a passive node may become an active node. For example, if an active node malfunctions or becomes unhealthy, it may be changed to a passive node. Conversely, if a previously malfunctioning or unhealthy node becomes healthy, it may be changed to an active node. Thus, the status of a node may change.
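The list of nodes and the status changes described above might, purely as an illustrative sketch, be modeled as follows. The names `Status`, `NodeList`, `mark_unhealthy`, `mark_healthy`, and `forwardable` are hypothetical and are introduced only for this example.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class NodeList:
    """Hypothetical list of the nodes registered with a request handler."""

    def __init__(self):
        self._status = {}  # node identifier -> Status

    def register(self, node_id, status=Status.ACTIVE):
        # A registered node is one that the request handler is aware of.
        self._status[node_id] = status

    def mark_unhealthy(self, node_id):
        # An active node that malfunctions may be changed to a passive node.
        self._status[node_id] = Status.PASSIVE

    def mark_healthy(self, node_id):
        # A node that becomes healthy again may be changed to an active node.
        self._status[node_id] = Status.ACTIVE

    def forwardable(self):
        # Requests may be forwarded to active nodes only.
        return [n for n, s in self._status.items() if s is Status.ACTIVE]


nodes = NodeList()
nodes.register("node-a")
nodes.register("node-b")
nodes.mark_unhealthy("node-b")        # node-b becomes passive
after_failure = nodes.forwardable()
nodes.mark_healthy("node-b")          # node-b becomes active again
after_recovery = nodes.forwardable()
```

Here a passive node remains on the list, so the request handler is still aware of it, but it is excluded from the set of nodes to which requests may be forwarded.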
In addition to the nodes that the request handler is aware of, a large scale computing system may further comprise additional nodes. Additional nodes may be included in the large scale computing system for a number of reasons. For example, the additional nodes may serve as rollback nodes. That is, the nodes that the request handler is currently aware of and currently forwarding requests to may run a current version of software while the rollback nodes may run a previous version of the software. Should problems be experienced with the current version of the software, the rollback nodes may be substituted for the currently used nodes to roll back to the previous version of the software. This rollback may be achieved, for example, by replacing the list of nodes currently used by the request handler with a new list of nodes that includes the rollback nodes.
Another reason additional nodes may be included in a large scale system is to prepare to launch a new version of software. For example, a new version of software may be installed and executed on a set of upgrade nodes. The software may be configured, tested, etc., on the upgrade nodes until it is ready to go live. At that point, the upgrade nodes may be substituted for the nodes currently used by the request handler. This may be achieved, for example, by replacing the list of nodes currently used by the request handler with a new list of nodes that includes the upgrade nodes. For these and other reasons, a large scale system may include nodes in addition to the nodes that the request handler is currently aware of. Since the additional nodes are not actively being used by the request handler to process requests, they are considered to be passive nodes. Thus, a passive node may be a node that the request handler is aware of, or an additional node that the request handler is not aware of.
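The list replacement described above, whether used to roll back to a previous version or to go live with a new version, might be sketched as follows. This is a minimal illustration under assumed names: `ListBackedHandler`, `replace_node_list`, and the node identifiers are hypothetical.

```python
class ListBackedHandler:
    """Hypothetical request handler that forwards only to nodes on its current list."""

    def __init__(self, node_list):
        self.node_list = list(node_list)

    def replace_node_list(self, new_list):
        """Substitute a new list of nodes and return the list that was replaced."""
        previous = self.node_list
        self.node_list = list(new_list)
        return previous


current_nodes = ["v1-node-1", "v1-node-2"]   # nodes running the current version
upgrade_nodes = ["v2-node-1", "v2-node-2"]   # nodes prepared with the new version

handler = ListBackedHandler(current_nodes)

# Go live: the upgrade nodes are substituted for the currently used nodes.
replaced = handler.replace_node_list(upgrade_nodes)

# The replaced nodes are no longer on the handler's list; they can serve as
# rollback nodes should problems be experienced with the new version.
rollback_nodes = replaced
```

After the substitution, the nodes on the replaced list are no longer known to the request handler, which is one way in which a passive node that the request handler is not aware of can arise.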
A passive node may have a plurality of processes executing thereon. These processes may be of various types, including a request-processing type and a self-initiated type. A request-processing type of process is one that is invoked when a request is received from the request handler. Since the request handler will not forward requests to a passive node, this type of process on a passive node will most likely not perform any processing. A self-initiated type of process is one that performs processing even when no request is received from the request handler. Examples of this type of process include a process that wakes up periodically to perform some processing and a process that periodically polls a message queue for messages and processes those messages. This type of process may perform processing even if it is running on a passive node and even when the passive node is not receiving any requests from the request handler.
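The distinction between the two types of process might be illustrated with the following sketch. The function names `handle_request` and `poll_once`, and the use of an in-memory queue to stand in for a message queue, are assumptions made only for this example.

```python
import queue


def handle_request(request):
    """Request-processing type: runs only when the request handler forwards a request."""
    return f"handled {request}"


def poll_once(message_queue, processed):
    """Self-initiated type: one periodic wake-up that drains the message queue.

    No request from the request handler is needed to trigger this work.
    """
    while True:
        try:
            message = message_queue.get_nowait()
        except queue.Empty:
            return  # nothing left; sleep until the next wake-up
        processed.append(message)


# On a passive node, handle_request is never invoked because no requests
# arrive. The polling process, however, still finds and processes messages.
shared_queue = queue.Queue()
shared_queue.put("message-1")
shared_queue.put("message-2")

processed_on_passive_node = []
poll_once(shared_queue, processed_on_passive_node)
```

In this sketch both messages end up processed on the passive node even though the request handler forwarded nothing to it.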
Generally, a process on a passive node should not perform any processing that may affect the operation of the active nodes or the transactions being processed by the active nodes as that may lead to system inconsistency and corruption. For example, as noted above, a passive node may be running a different version of software than the active nodes that are currently being used by the request handler. Also, a passive node may be unhealthy or malfunctioning. Furthermore, a passive node may not be fully and properly configured. That being the case, if a process on a passive node performs processing that affects the active nodes or the transactions being processed by the active nodes, it may very well lead to incorrect or inconsistent results, or even worse, to system corruption or failure.
As noted above, a self-initiated type of process may perform processing even if it is running on a passive node and even when the passive node is not receiving any requests from the request handler. As a result, this type of process (and perhaps other types of processes) on a passive node may give rise to adverse and potentially grave consequences. Hence, a mechanism is needed to control the operation of this type (and perhaps other types) of process.