The communications industry is rapidly changing to adjust to emerging technologies and ever-increasing customer demand. This demand for new applications and for increased performance of existing applications is driving communications network and system providers to employ networks and systems having greater speed and capacity (e.g., greater bandwidth). In trying to achieve these goals, a common approach taken by many communications providers is to use packet switching technology. Increasingly, public and private communications networks are being built and expanded using various packet technologies, such as Internet Protocol (IP).
In loosely-coupled, asynchronous, distributed systems, there are scenarios that require the simultaneous scheduling of associated processes running on different nodes of the networked system. Because the nodes are loosely coupled, they are not synchronized and each runs an independent scheduler (i.e., the overall system is asynchronous), which makes such simultaneous scheduling a challenge. In such a system, there is neither a centralized operating system nor a master scheduler.
This presents a problem if there are system-wide events that require the simultaneous cooperation of processes on different asynchronous nodes. Because the nodes are asynchronous, it is not inherently possible to simultaneously schedule processes on different nodes. Therefore, the system needs a scheduling component that can guarantee that cooperating processes (i.e., a gang) are simultaneously dispatched on different nodes in a timely manner.
An example of a gang is a set of threads, processes, or activities that execute at the same time on different processors, typically in different systems. In gang scheduling, all the threads of a gang are scheduled to execute at the same time. Gang scheduling among asynchronous systems is a difficult problem.
Known implementations of gang scheduling face numerous problems. For example, a message that affects the scheduling of the gang (i.e., a message that dispatches or activates the gang members) typically must be delivered reliably to the nodes containing gang members and acted upon at the same time. Because each node runs an independent scheduler (i.e., the system is asynchronous), there is an unavoidable indeterminism in when and how the scheduling message is processed on different nodes. This is an impediment to the goal of dispatching processes "simultaneously" across different nodes, and it emphasizes the fact that there is no true notion of simultaneity in such systems. Also, if the scheduling message cannot be delivered to all the nodes of the gang at the same time, then the scheduling skew is further increased by the differences in message arrival times at the different nodes.
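To make the notion of scheduling skew concrete, the following is a minimal sketch, assuming a simple model in which each node dispatches its gang member at the message send time plus its network delivery delay plus its local scheduler latency, and in which skew is the spread between the earliest and latest dispatch. The function names and all timing values are hypothetical and chosen only for illustration.

```python
from typing import List


def dispatch_times(send_time: int,
                   network_delays: List[int],
                   local_latencies: List[int]) -> List[int]:
    """Hypothetical model: time at which each node actually dispatches its
    gang member = send time + network delivery delay + the delay before that
    node's independent scheduler acts on the message."""
    return [send_time + net + local
            for net, local in zip(network_delays, local_latencies)]


def scheduling_skew(times: List[int]) -> int:
    """Skew is the spread between the earliest and the latest dispatch."""
    return max(times) - min(times)


# Three nodes, times in microseconds (illustrative values only).
local = [500, 3000, 8000]  # independent scheduler latency on each node
net = [200, 1500, 900]     # per-node message delivery delay

# Skew from scheduler indeterminism alone (message arrives everywhere at once).
print(scheduling_skew(dispatch_times(0, [0, 0, 0], local)))
# Skew when the message also arrives at different times on different nodes.
print(scheduling_skew(dispatch_times(0, net, local)))
```

In this model, nonzero skew remains even when the message reaches every node at the same instant, because each node's scheduler acts on its own timeline; unequal arrival times can then widen the spread further, as in the second case.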
Communication among gang members, or among the members of other such sets, is also problematic. Multicast messages can be used to communicate information among such members. However, these schemes are typically either unreliable, because the message is multicast without any acknowledgement from the receivers, or inefficient and unscalable, because every member must acknowledge each and every message.
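The two shortcomings can be quantified with a rough sketch, assuming (hypothetically) that each receiver independently loses a multicast message with a fixed probability. Under that assumption, the chance that an unacknowledged multicast reaches every member shrinks geometrically with group size. Under the acknowledge-every-message alternative, the acknowledgement traffic grows as the product of group size and message count. The function names and parameter values below are illustrative assumptions, not part of any described system.

```python
def prob_all_delivered(num_receivers: int, loss_prob: float) -> float:
    """Probability that an unacknowledged multicast reaches every receiver,
    assuming each receiver independently loses it with probability loss_prob."""
    return (1.0 - loss_prob) ** num_receivers


def acks_for_stream(num_receivers: int, num_messages: int) -> int:
    """Acknowledgements generated when every receiver acks every message."""
    return num_receivers * num_messages


# Illustrative values: 0.1% per-receiver loss, 10,000 messages.
for n in (10, 100, 1000):
    print(n,
          round(prob_all_delivered(n, 0.001), 3),
          acks_for_stream(n, 10_000))
```

Even a small per-receiver loss rate makes complete delivery unlikely for large gangs, while full acknowledgement multiplies the control traffic by the number of members. This is the reliability versus scalability trade-off described above.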