1. Field of the Invention
This invention relates generally to the field of data processing systems. More particularly, the invention relates to an improved system and method for reliable, distributed communication with guaranteed service levels.
2. Description of the Related Art
Internet users today may communicate using a variety of different client applications including real-time messaging applications (e.g., instant messaging or “chat” applications) and voice/video applications. Some client messaging applications such as the Skype™ client designed by the assignee of the present application provide integrated chat and voice/video in a single software platform. While the discussion below will focus on chat messaging, the underlying principles of the invention may be implemented using any type of messaging/communication technology.
Chat messages can be delivered via several mechanisms: (1) a single centralized server (or a single cluster of servers); (2) a fully distributed peer-to-peer (P2P) system, in which each client locates the endpoints of the other users in the conversation and sends messages directly to them; and (3) a system of cooperating servers that relay messages to each other. System (3) may resemble the USENET approach, in which each server maintains a list of the messages it holds and periodically exchanges messages with the other servers to which it is connected. Approaches (2) and (3) are similar, except that in approach (3) each client retrieves messages from a single server, which relays messages on its behalf, whereas in approach (2) each client is connected to multiple other clients and is responsible for relaying messages itself.
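For illustration, the periodic server-to-server exchange of approach (3) can be sketched as below. This is a minimal Python sketch under stated assumptions, not any particular USENET implementation: the `RelayServer` class, its `post` and `sync_with_peers` methods, and the chain topology are hypothetical. Each round copies only the messages a directly connected peer lacks, so a message needs one round per hop to reach a distant server.

```python
from dataclasses import dataclass, field


@dataclass
class RelayServer:
    """Hypothetical server in approach (3); stores messages keyed by message ID."""
    name: str
    messages: dict[str, str] = field(default_factory=dict)
    peers: list["RelayServer"] = field(default_factory=list)

    def post(self, msg_id: str, body: str) -> None:
        # A client connected to this server submits a new message.
        self.messages[msg_id] = body

    def sync_with_peers(self) -> None:
        # One periodic exchange round: compare message-ID lists with each
        # directly connected peer and copy whatever either side is missing.
        for peer in self.peers:
            for msg_id in self.messages.keys() - peer.messages.keys():
                peer.messages[msg_id] = self.messages[msg_id]
            for msg_id in peer.messages.keys() - self.messages.keys():
                self.messages[msg_id] = peer.messages[msg_id]


# Chain topology A - B - C: a message posted on A needs two rounds to reach C.
a, b, c = RelayServer("A"), RelayServer("B"), RelayServer("C")
a.peers, b.peers = [b], [c]
a.post("m1", "hello")
a.sync_with_peers()
reached_c_after_one_round = "m1" in c.messages   # False: only B has it so far
b.sync_with_peers()
reached_c_after_two_rounds = "m1" in c.messages  # True
```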
While each of these approaches is capable of meeting the requirements of providing a shared chat experience, each has drawbacks that make it difficult or impossible to meet the Service Level Agreement (SLA) requirements of demanding customers. In particular:
(1) A centralized chat system can be made scalable, but it imposes significant latency on many customers, because communication latency is bounded by the speed of data transfer between each participant and the central system. For example, if the central system is located in the USA and two chat participants are in Australia, data transfer latency alone will add 500 ms to the transaction. Furthermore, a centralized system does not allow a provider to guarantee service levels to particular customers while maintaining a general-purpose system for other customers. For example, under the “freemium” business model, providers frequently wish to support large numbers of users with free services while providing higher-quality service to paying customers. Because of surges in load, designing a centralized system that supports premium-quality services without any impact from free customers is very difficult, if not impossible.
(2) While a P2P system does not suffer from the central load-management problems of (1), a provider wishing to provide guaranteed services, and in particular to provide suitable auditing of such guarantees, will have difficulty meeting these requirements. P2P software notoriously struggles with offline users (two peers cannot exchange messages unless both are online simultaneously) and with connectivity problems (e.g., NAT and firewall traversal) that prevent messages from being exchanged. Furthermore, documenting that SLAs were met in a distributed P2P system is very difficult, as providers (and customers) typically desire more deterministic logging to demonstrate that the SLA has been met.
(3) A system where servers periodically relay messages to each other is generally incapable of meeting high-performance SLAs because customers connected to different servers will experience high, indeterminate latency for message delivery.
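The indeterminate latency of approach (3) follows from the sync timers: a message waits at each forwarding server until that server's next exchange round. Below is a minimal sketch, assuming every server syncs on a fixed interval with its own timer phase; `next_tick`, `delivery_time`, and all numeric values are hypothetical illustrations, not measurements.

```python
import math


def next_tick(t: float, phase: float, interval: float) -> float:
    """Earliest sync time >= t for a timer that fires at phase + k * interval."""
    return phase + math.ceil((t - phase) / interval) * interval


def delivery_time(t0: float, phases: list[float], interval: float, hop_delay: float) -> float:
    """Time at which a message posted at t0 reaches the destination server.

    phases[i] is the timer phase of the i-th server that must forward the message;
    each hop waits for that server's next sync round, then spends hop_delay in transit.
    """
    t = t0
    for phase in phases:
        t = next_tick(t, phase, interval) + hop_delay
    return t


# Same path, same 30 s interval; only the (hypothetical) timer phases differ.
slow = delivery_time(1.0, [0.0, 0.0], interval=30.0, hop_delay=0.05)   # about 60.05
fast = delivery_time(1.0, [0.0, 0.1], interval=30.0, hop_delay=0.05)   # about 30.15
```

With two forwarding hops, the end-to-end delay ranges from nearly zero up to about twice the sync interval plus transit time, depending only on how the timers happen to align, which is why such a system is hard to bound tightly enough for a high-performance SLA.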
A conversation system must also remain reliable in the event of node or server failure. This is conventionally achieved using a disk-backed database in conjunction with replication. In the centralized solution (1), a single database, perhaps with local replication, can easily meet this need. In (2) and (3), there is no single master record of conversations; instead, each node maintains its own history and compares it with the histories of other nodes to determine whether it is missing events. Such a scheme is typically reliable, but it can be slow to synchronize and, for performance reasons, typically uses a horizon, such as a number of days or months, beyond which it no longer exchanges information about the messages known.
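The horizon trade-off can be illustrated with a minimal Python sketch of one node comparing its history against a peer's. The function name `missing_within_horizon`, the ID-to-timestamp maps, and the 7-day horizon are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def missing_within_horizon(local: dict[str, datetime], remote: dict[str, datetime],
                           now: datetime, horizon: timedelta) -> set[str]:
    """Message IDs held by the remote node but not the local one, limited to the horizon.

    local and remote map message ID -> timestamp; anything older than now - horizon
    is never compared, so a gap that old is never repaired.
    """
    cutoff = now - horizon
    return {msg_id for msg_id, ts in remote.items()
            if ts >= cutoff and msg_id not in local}


remote = {"old": datetime(2024, 1, 1), "new": datetime(2024, 1, 30)}
gaps = missing_within_horizon({}, remote, now=datetime(2024, 1, 31), horizon=timedelta(days=7))
# gaps == {"new"}: the 30-day-old message falls outside the 7-day horizon
```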
Conventional databases are single-master: only one node can be the writer at a time. This is why options (1), (2), and (3) almost always consist of one or more databases, one per node, with a separate scheme for copying messages between the nodes (and thus between the databases). In an alternative approach, the database itself is distributed and handles synchronization of messages between nodes. This form of distributed database often uses “vector clocks” to maintain consistency between data. In essence, each participant in a chat might have a counter (the clock) attached to their chat messages. Each time the participant adds a new message, the clock is incremented. Nodes can then determine whether they have all of the messages in a chat by comparing their set of clocks with those of other nodes, without needing to compare all of the messages.
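The per-participant counter comparison described above might look like the following minimal sketch. It assumes, hypothetically, that each node keeps, for every participant, a count of that participant's messages it holds, and that those messages form a contiguous sequence; `missing_ranges` is an illustrative name, not an API of any database.

```python
def missing_ranges(local_clock: dict[str, int], remote_clock: dict[str, int]) -> dict[str, range]:
    """Per participant, the message sequence numbers the remote node has but the local node lacks.

    Each clock maps participant -> number of that participant's messages held; this
    sketch assumes each node holds a contiguous prefix 1..n of every participant's messages.
    """
    return {
        participant: range(local_clock.get(participant, 0) + 1, count + 1)
        for participant, count in remote_clock.items()
        if count > local_clock.get(participant, 0)
    }


local = {"alice": 3, "bob": 5}
remote = {"alice": 4, "bob": 5, "carol": 2}
needed = missing_ranges(local, remote)
# needed == {"alice": range(4, 5), "carol": range(1, 3)}: fetch alice #4, carol #1-2
```

Only the small clock maps need to cross the network; message bodies are fetched just for the ranges that differ.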
The Dynamo storage system (designed by Amazon™) and the Riak™ NoSQL database are examples of storage systems based on vector clocks. Both provide techniques for reliably updating a single distributed database from multiple locations without requiring a single writer. In essence, these approaches are of type (3), with the exchange between nodes performed at the database level rather than at the messaging-application level. While such an approach makes the messaging system highly reliable, it does not address the challenge of making approach (3) meet the desired high-performance SLAs.
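The comparison rule underlying vector-clock stores of this kind can be sketched generically as follows. This is the textbook happened-before comparison, not the API of Dynamo or Riak; the `Order` enum, `compare`, and `merge` are hypothetical names.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # a happened before b: b supersedes a
    AFTER = "after"            # b happened before a: a supersedes b
    CONCURRENT = "concurrent"  # neither descends from the other: conflicting versions


def compare(a: dict[str, int], b: dict[str, int]) -> Order:
    """Compare two vector clocks (writer node -> counter); a missing entry counts as 0."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return Order.EQUAL
    if a_le_b:
        return Order.BEFORE
    if b_le_a:
        return Order.AFTER
    return Order.CONCURRENT


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element-wise maximum: the clock for a version that reconciles both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


v1 = {"x": 2, "y": 1}
v2 = {"x": 2, "y": 2}   # descends from v1
v3 = {"x": 3, "y": 1}   # also descends from v1, written independently of v2
# compare(v1, v2) -> Order.BEFORE; compare(v2, v3) -> Order.CONCURRENT
# merge(v2, v3) -> {"x": 3, "y": 2}
```

A CONCURRENT result identifies versions written independently at different locations; Dynamo-style systems keep such versions until they are reconciled, and the merged clock then supersedes both.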