Clusters and datacenters play an increasingly important role in the contemporary computing spectrum, providing back-end computing and storage for a wide range of applications. The modern datacenter is typically composed of hundreds to thousands of inexpensive commodity blade-servers, networked via fast, dedicated interconnects. The software stack running on a single blade-server is a combination of off-the-shelf software: commercial operating systems, proprietary middleware, managed run-time environments and virtual machines, all standardized to reduce complexity and mitigate maintenance costs. For many purposes, rapid response to events is critical. Computer programs with this property are commonly referred to as time-critical computing applications. A time-critical computing application is said to be scalable if it is able to execute on clusters of commodity servers, so that when load rises, additional computing nodes can easily be assigned to the application. Applications in domains ranging from computational finance to air-traffic control and military communication are under growing pressure to migrate from traditional single-node computer platforms to commodity clusters to take advantage of scalability. However, when requirements of timely responsiveness, massive scalability and “multiple nines” of availability are combined, they result in extremely complex application designs. What is needed is a time-critical communications paradigm that greatly simplifies the development of scalable, fault-tolerant, time-critical applications: applications that can scale to hundreds of nodes, can support fault-tolerance and high availability, and can exploit modern “distributed object architectures” that provide programmers with easy-to-use abstractions.
What is needed is a multicast protocol based on a realistic datacenter loss model, one that reflects the significant frequency of short bursts of packets dropped at the end-host receivers, with specific loss rates that can be measured for a given target cluster. The protocol must also make use of the fact that the critical dimension of scalability in time-critical fault-tolerant settings is the number of groups in the system. The resulting reliable multicast protocol should also be designed to perform well even when each node belongs to a great many low-rate multicast groups. Finally, the protocol should be one that can be implemented at the application level, requiring no router modification or operating system changes, so that applications can run on standard datacenter hardware and execute on any mix of existing commodity routers and operating system software. Given a protocol that has these properties, a further goal is to achieve a packet loss recovery latency that depends on the aggregate rate of data arriving at a node across all groups, i.e., recovery of packets should occur as quickly in many groups as in a single group, allowing applications to divide node bandwidth among many multicast groups while maintaining time-critical packet recovery.
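The last goal can be illustrated with a small back-of-the-envelope sketch. It is not the protocol itself. It assumes, purely for illustration, that a recovery step can only be triggered once a fixed number of data packets (`window`) has been observed, that traffic arrives at a steady rate, and that a node's bandwidth is split equally among its groups. The function names and parameters are hypothetical. Under those assumptions, a scheme that counts packets per group slows down as the number of groups grows, while a scheme that counts packets across all groups does not.

```python
def fill_time_seconds(packets_needed: int, rate_pps: float) -> float:
    """Time to observe `packets_needed` packets at a steady rate (packets/second)."""
    if rate_pps <= 0:
        raise ValueError("rate must be positive")
    return packets_needed / rate_pps


def per_group_latency(total_rate_pps: float, num_groups: int, window: int) -> float:
    """Wait time when the window is filled from a single group's traffic only.

    Assumption: the node's incoming rate is divided equally among its groups.
    """
    return fill_time_seconds(window, total_rate_pps / num_groups)


def aggregate_latency(total_rate_pps: float, num_groups: int, window: int) -> float:
    """Wait time when the window is filled from traffic across all groups.

    The result is independent of `num_groups`; the parameter is kept only so
    the two functions can be compared side by side.
    """
    return fill_time_seconds(window, total_rate_pps)


if __name__ == "__main__":
    total_rate = 1000.0  # hypothetical: packets/second arriving at the node
    window = 8           # hypothetical: packets needed before recovery can start
    for groups in (1, 10, 100):
        print(
            groups,
            per_group_latency(total_rate, groups, window),
            aggregate_latency(total_rate, groups, window),
        )
```

With these hypothetical numbers, the per-group wait grows in proportion to the number of groups, whereas the aggregate wait stays constant, which is the behavior the stated goal asks for.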