In a large-scale distributed system, traffic may be distributed across a cluster of servers by using a hash function to transform an identifier in the traffic (sometimes referred to as a ‘hash key’) into a hash value. The hash value serves as an index into a hash table. It identifies a particular element of the table (sometimes referred to as a ‘slot’ or a ‘bucket’), and that element identifies the server in the cluster with which the hash value is associated. By processing the traffic identifier of all received traffic in this manner, each item of traffic can be assigned to the server associated with its hash value. If the hash function produces reasonably uniform output across the space of all possible hash values, traffic is distributed more or less evenly across the servers, and the load is substantially balanced across the cluster.
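The chain from hash key to hash value to bucket to server can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the server names, the table size `NUM_BUCKETS`, the round-robin assignment of buckets to servers, and the use of SHA-256 as the hash function are all hypothetical choices made for the example.

```python
import hashlib
from collections import Counter

# Hypothetical cluster and hash-table size, chosen only for illustration.
SERVERS = ["server-a", "server-b", "server-c", "server-d"]
NUM_BUCKETS = 16

# Hash table: each bucket (slot) records the server it is associated with.
# Buckets are assigned round-robin, so each server owns an equal share.
HASH_TABLE = [SERVERS[i % len(SERVERS)] for i in range(NUM_BUCKETS)]


def hash_value(hash_key: str) -> int:
    """Transform a traffic identifier (the hash key) into a hash value.

    SHA-256 is used because its output is close to uniform and, unlike
    Python's built-in hash() for strings, is stable across processes.
    """
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(hash_key: str) -> str:
    """Return the server associated with the bucket this key hashes to."""
    bucket = hash_value(hash_key) % NUM_BUCKETS
    return HASH_TABLE[bucket]


if __name__ == "__main__":
    counts = Counter(route(f"client-{n}") for n in range(10_000))
    for server in SERVERS:
        print(server, counts[server])
```

Running the module prints how many of 10,000 synthetic keys land on each server. Because the hash is close to uniform and each server owns four of the sixteen buckets, each count should be near one quarter of the keys.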
However, in some situations, the servers in the cluster can change. Servers may be removed from the cluster, for example for planned maintenance or because of an unplanned server failure, or new servers may be added to the cluster to meet increased demand. With cloud computing techniques, such removals and additions may be made many times a day in response to fluctuations in traffic.
Following a change to the servers in the cluster, the set of all possible hash values is reallocated amongst the servers in the cluster, so that the load can be shared with any new servers or redistributed away from any removed servers. In other words, a hash value that was associated with a particular server may be associated with a different server following the reallocation. Since traffic is distributed based on hash values, traffic that was being routed to one server could be directed to a different server in the new cluster as a result of the reallocation. This can happen even when neither of those two servers has itself been removed from or added to the cluster.
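The reallocation problem can be illustrated with naive modulo placement, in which a key's server is chosen as its hash value modulo the current number of servers. This is one common way of spreading hash values over a cluster, swapped in here purely for illustration; the text above does not say which reallocation scheme is used. The function names and server names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple


def hash_value(hash_key: str) -> int:
    """Stable, near-uniform 64-bit hash value for a traffic identifier."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(hash_key: str, servers: List[str]) -> str:
    """Naive modulo placement: hash value modulo the current cluster size."""
    return servers[hash_value(hash_key) % len(servers)]


def reallocation_stats(
    keys: Iterable[str], before: List[str], after: List[str]
) -> Tuple[int, int]:
    """Return (keys whose server changed, keys that moved between two
    servers present in both the old and the new cluster)."""
    survivors = set(before) & set(after)
    changed = between_survivors = 0
    for key in keys:
        old, new = assign(key, before), assign(key, after)
        if old != new:
            changed += 1
            if old in survivors and new in survivors:
                between_survivors += 1
    return changed, between_survivors


if __name__ == "__main__":
    keys = [f"session-{n}" for n in range(10_000)]
    before = ["server-a", "server-b", "server-c", "server-d"]
    after = before + ["server-e"]  # scale out by one server
    changed, between = reallocation_stats(keys, before, after)
    print(f"changed server: {changed / len(keys):.1%}")
    print(f"moved between unchanged servers: {between / len(keys):.1%}")
```

When four servers grow to five, a key keeps its server only if its hash value leaves the same remainder modulo 4 and modulo 5, which holds for 4 of every 20 residues. With a uniform hash, roughly 80% of keys therefore change server, and roughly 60% move between the four original servers even though none of them was added or removed.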
In some systems, this is not particularly problematic. However, such reallocation is undesirable in many systems, particularly where a client device establishes a communications session with a server and the server stores information about the session. If traffic from that client device is then directed to another server, that server may not have access to the session information, so the session may be interrupted while the session data is retrieved, or it may be dropped altogether. For example, when the traffic from a particular source is reallocated to a new server, any requests that were ‘in-flight’ to the original server around the time of the reallocation may be lost. Significant additional traffic may be required to re-establish the session on the new server, and this additional traffic could significantly affect performance across the entire system whenever reallocation occurs.
Even for systems which have relatively short-lived sessions, a significant proportion of sessions may experience server reallocation and hence sub-optimal performance or user experience.
One known system provides techniques for load balancing in networks, such as networks handling telephony applications. In a system comprising a network that routes calls between a plurality of callers and at least one receiver, a load balancer sends requests associated with calls to a plurality of servers. Depending on the particular load-balancing technique, a request associated with a call, a caller, or a receiver is received, and a server is selected to receive the request. When a subsequent request is received, it is determined whether that request is associated with the same call, caller, or receiver; if it is, the subsequent request is sent to the same server.
This known system maintains mappings between calls and servers in one or more tables that map call IDs to server IDs. When the load balancer receives a request associated with a call, it consults the table to determine whether a server is already associated with the call ID. If so, the request is routed to that server. If not, the system determines an appropriate server to handle the request and any subsequent requests for the same call.
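The table-based affinity scheme can be sketched as a small class. The class and method names are hypothetical, and so is the round-robin policy used to pick a server for a new call: the description only says that an ‘appropriate server’ is determined. The sketch also never removes table entries, which a real system would need to do when calls end.

```python
from typing import Dict, List


class CallAffinityBalancer:
    """Load balancer keeping a table that maps call IDs to server IDs."""

    def __init__(self, servers: List[str]) -> None:
        self.servers = list(servers)
        self.call_table: Dict[str, str] = {}  # call ID -> server ID
        self._cursor = 0  # round-robin position (assumed selection policy)

    def _select_server(self) -> str:
        # Hypothetical policy for a new call: plain round-robin.
        server = self.servers[self._cursor % len(self.servers)]
        self._cursor += 1
        return server

    def route(self, call_id: str) -> str:
        # The table is consulted for every request, new call or not.
        server = self.call_table.get(call_id)
        if server is None:
            # No server yet for this call: choose one and remember it so
            # subsequent requests for the same call go to the same server.
            server = self._select_server()
            self.call_table[call_id] = server
        return server


if __name__ == "__main__":
    lb = CallAffinityBalancer(["server-a", "server-b"])
    print(lb.route("call-1"))  # server-a
    print(lb.route("call-2"))  # server-b
    print(lb.route("call-1"))  # server-a again (from the table)
```

Every call to `route()` performs a table lookup, even for calls that are already mapped, and the table grows with the number of calls being tracked.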
Although this known system may provide some degree of session ‘affinity’ or ‘persistence’ to a particular server, the load balancer must consult the table every time a request is received, which can create significant processing overhead.
It would be desirable to provide improved methods and apparatus for processing traffic in a communications system.