1. Field of the Invention
This invention relates to computer networking and, more particularly, to methods for distributing rate limits and tracking rate consumption across members of a network cluster having a plurality of cluster members.
2. Description of the Related Art
The following descriptions and examples are given as background only.
In the context of computer networking, the term “availability” usually refers to a network node's ability to respond to requests no matter what the circumstances. For example, a continuously available node may be characterized as having essentially no downtime within a given timeframe (e.g., one year). However, since most network nodes experience at least some amount of downtime, they are typically characterized as having a certain level of availability. For instance, a “high availability” node may be described as having approximately 99.9% uptime, which may translate into a few hours of planned or unplanned downtime per year.
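As a rough illustration of the arithmetic behind such availability figures, the following minimal Python sketch converts an uptime percentage into the corresponding downtime per year. The function name `downtime_hours_per_year` and the assumption of a 365-day year are illustrative choices only and do not come from any standard.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year (8760 hours)

def downtime_hours_per_year(uptime_percent):
    """Return the hours of downtime per year implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR

# 99.9% uptime leaves 0.1% of 8760 hours, i.e. roughly 8.76 hours per year.
hours = downtime_hours_per_year(99.9)
```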
More specifically, the term “high availability” usually refers to the maintenance of high levels of access to network processes and associated data without compromising the quality of the user experience. Network processes that benefit from or strive to maintain high availability include, but are not limited to, administrative processes, firewall processes, load balancing processes, operating system processes and various types of server processes (e.g., HTTP server, application server and database server processes). High availability may also be desired for various types of network data, such as application data used by application servers, persistent session data, security data and transaction log files, among others.
Clustering is one approach for accomplishing high availability. In many cases, a “cluster” may be described as two or more machines (referred to as “cluster members”), which are coupled together across a local high-speed network (i.e., a high-speed Local Area Network, or LAN). Cluster members may be connected to the LAN via any network topology (e.g., via a bus, star, ring, or mesh configuration). Although not typically the case, cluster members residing at different geographical locations may be coupled across a Wide Area Network, or WAN (one example of which is the Internet). A generic depiction of a network cluster 100 coupled to a LAN 110 is illustrated in FIG. 1. The dotted line in FIG. 1 denotes the possibility of cluster members being coupled across a WAN 120.
High availability clusters improve the availability of services by providing redundant nodes, each configured for running one or more common applications. This configuration enables the nodes (i.e., cluster members) to share the workload and to assume additional load should one of the nodes fail. High availability clusters are commonly used to implement key databases, file sharing on a network, business applications and consumer services, such as electronic commerce (e-commerce) websites.
In some cases, multiple cluster members may be defined on the same physical machine (i.e., vertically scaled clusters) to allocate the processing power available to that machine in a more efficient manner. In other cases, cluster members may be created across multiple machines (i.e., horizontally scaled clusters). The latter enables a single application to run on several different machines, while presenting a single system image. This allows client requests, which would otherwise overwhelm a single machine, to be distributed across several different machines. In some cases, a combination of vertical and horizontal scaling may be used when creating a cluster to reap the benefits of both techniques.
The term “network traffic control” typically refers to the process of managing, prioritizing, controlling or reducing network traffic to reduce congestion, latency and packet loss. In addition to other features, network traffic control includes bandwidth management and admission control procedures.
“Bandwidth management” is usually described as the process of measuring and controlling the amount of traffic on a network link to avoid: i) filling the link to capacity, or ii) overfilling the link, which would result in network congestion and poor performance. Two common bandwidth management techniques are rate limiting and traffic shaping.
“Rate limiting” controls the rate at which traffic is sent or received on a network interface. Traffic that is less than or equal to the specified rate is sent, whereas traffic that exceeds the rate is dropped or delayed. Rate limiting is typically performed by policing (i.e., discarding excess packets), queuing (i.e., delaying packets in transit) or controlling congestion (i.e., manipulating the protocol's congestion mechanism). A device that performs rate limiting is referred to as a “rate limiter.”
“Traffic shaping” is often described as an attempt to control network traffic in order to optimize or guarantee performance, low latency and/or bandwidth. Traffic shaping algorithms usually deal with concepts of classification, queue disciplines, policy enforcement, congestion management, quality of service (QoS) and fairness. The most common traffic shaping algorithms are the Token Bucket and Leaky Bucket algorithms.
The Token Bucket algorithm dictates when traffic can be transmitted based on the presence of “tokens” in the bucket. For example, a “token bucket” may contain at most b tokens (usually representing a particular number of bytes). A “token” is added to the bucket every 1/r seconds, where r is referred to as the token regeneration rate. If the bucket is full when a token arrives, the token is discarded. When a packet of n bytes arrives, n tokens are removed from the bucket and the packet is sent to the network. However, if fewer than n tokens are available, no tokens are removed from the bucket and the packet is considered to be non-conformant. Non-conformant packets may be: i) dropped, ii) queued for subsequent transmission when sufficient tokens have accumulated in the bucket, or iii) transmitted but marked as non-conformant, so that they can be subsequently dropped if the network becomes overloaded. The Token Bucket algorithm, therefore, controls the amount of data that is injected into a network by imposing a limit on the average data transmission rate. At the same time, the Token Bucket algorithm allows “bursts” of data to be sent (up to its peak burst rate) if there are adequate tokens in the bucket and the burst threshold is configured properly.
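The Token Bucket behavior described above may be sketched as follows. This is a minimal illustration only, not a definitive implementation: the class name `TokenBucket`, the method name `allow`, the choice to start with a full bucket, and the use of caller-supplied timestamps (rather than a system clock) are all assumptions made for the example. Tokens are credited continuously at r tokens per second, which is equivalent to adding one token every 1/r seconds, and non-conformant packets are simply reported to the caller, which may then drop, queue or mark them.

```python
class TokenBucket:
    """Token bucket holding at most `capacity` tokens (b), refilled at `rate` tokens/second (r)."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity        # b: maximum number of tokens
        self.rate = rate                # r: token regeneration rate
        self.tokens = float(capacity)   # assumption: the bucket starts full
        self.last = now                 # time of the most recent refill

    def _refill(self, now):
        # Credit tokens for the elapsed time; tokens beyond capacity are discarded.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def allow(self, n, now):
        """Return True and remove n tokens if a packet of n bytes conforms; else remove nothing."""
        self._refill(now)
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False  # non-conformant: the caller decides whether to drop, queue or mark

bucket = TokenBucket(capacity=10, rate=2)
burst_ok = bucket.allow(10, now=0.0)    # a burst up to the bucket size is allowed
starved = bucket.allow(1, now=0.0)      # bucket is now empty, so the packet does not conform
recovered = bucket.allow(1, now=0.5)    # one token has regenerated after 0.5 seconds
```

Because unused tokens accumulate only up to the bucket size, a long idle period permits a burst of at most b bytes, while the long-run average rate remains bounded by r.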
The Leaky Bucket algorithm differs from the Token Bucket algorithm by imposing a hard limit on the data transmission rate. For example, imagine that incoming packets are placed into a bucket with a “hole” in the bottom. As before, the bucket may hold up to b bytes. If a packet arrives when the bucket is full, the packet is discarded. Unlike in the Token Bucket algorithm, packets are allowed to filter out of the “leaky” bucket only at a constant rate of l bytes per second. Such filtering imposes a hard limit on the data transmission rate (by enforcing space between packets) and produces the effect of smoothing out bursty data.
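The Leaky Bucket behavior may be sketched in a similar way. The following is a simplified illustration under stated assumptions: the names `LeakyBucket`, `offer` and `sent` are hypothetical, timestamps are supplied by the caller, and a packet is released only as a whole once enough drain budget has accrued, so space in the bucket is freed one packet at a time rather than byte by byte.

```python
from collections import deque

class LeakyBucket:
    """Queue of at most `capacity` bytes (b) that drains at `leak_rate` bytes/second (l)."""

    def __init__(self, capacity, leak_rate, now=0.0):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.queue = deque()     # sizes of packets waiting in the bucket
        self.queued_bytes = 0
        self.credit = 0.0        # bytes that may leave, accrued at the leak rate
        self.last = now
        self.sent = []           # sizes of packets released so far, in order

    def leak(self, now):
        """Release every queued packet whose bytes have leaked out by time `now`."""
        self.credit += (now - self.last) * self.leak_rate
        self.last = now
        while self.queue and self.queue[0] <= self.credit:
            size = self.queue.popleft()
            self.credit -= size
            self.queued_bytes -= size
            self.sent.append(size)
        if not self.queue:
            self.credit = 0.0    # an idle bucket banks no credit, so bursts cannot form

    def offer(self, size, now):
        """Queue a packet of `size` bytes; return False (discard) if it does not fit."""
        self.leak(now)
        if self.queued_bytes + size > self.capacity:
            return False
        self.queue.append(size)
        self.queued_bytes += size
        return True

lb = LeakyBucket(capacity=100, leak_rate=10)
first = lb.offer(60, now=0.0)      # accepted
overflow = lb.offer(60, now=0.0)   # would exceed 100 bytes, so it is discarded
second = lb.offer(40, now=0.0)     # accepted; the bucket is now full
lb.leak(now=10.0)                  # 100 bytes have drained after 10 seconds
```

In contrast to the token bucket sketch, idle time earns no credit here, which is what enforces spacing between packets and smooths bursty input.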
In general, the term “admission control” refers to the ability to monitor, control and enforce the use of network resources and services based on certain criteria. For example, in networks that strive to provide Quality of Service (QoS), admission control procedures may be used to accept or reject user sessions (or individual flows) based on various priority settings, policies and/or available bandwidth. Service Level Agreements (SLAs) represent one manner in which admission control concepts may be enforced, for example, by service and by requester (i.e., user or client) to provide the requester with guaranteed levels of service (e.g., specific guarantees on uptime, latency, restoral time per failure, packet loss, etc.). Other types of admission control exist.
Most approaches to admission control provide rate limit enforcement at the packet level or at the transport request level. For example, a common admission control algorithm can be pictured as a bucket that imposes a limit on the rate of messages entering a protected network node. A token is added to the bucket each time a new message is processed. The bucket contents are cleared after each interval (e.g., every second) to provide rate limiting without enforcement of space between messages. This sliding window method is often referred to as a “rate limiter bucket,” and is only used during periods of active traffic.
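One way to picture the “rate limiter bucket” described above is the following minimal sketch. The class name `RateLimiterBucket`, the method name `admit`, the caller-supplied timestamps and the choice to clear the bucket lazily when the next message arrives (so that no work is done while traffic is idle) are assumptions made for illustration.

```python
class RateLimiterBucket:
    """Admit at most `limit` messages per `interval` seconds, with no spacing between messages."""

    def __init__(self, limit, interval=1.0, now=0.0):
        self.limit = limit            # maximum messages admitted per interval
        self.interval = interval      # interval length in seconds
        self.count = 0                # tokens added during the current interval
        self.window_start = now       # start time of the current interval

    def admit(self, now):
        """Return True if the message arriving at time `now` is within the rate limit."""
        # Clear the bucket once the current interval has elapsed. Because this
        # check runs only when a message arrives, idle periods cost nothing.
        if now - self.window_start >= self.interval:
            self.count = 0
            self.window_start = now
        if self.count < self.limit:
            self.count += 1           # one token per processed message
            return True
        return False                  # limit reached for this interval

limiter = RateLimiterBucket(limit=3, interval=1.0)
decisions = [limiter.admit(t) for t in (0.0, 0.1, 0.2, 0.3, 1.0)]
```

In this usage, the first three messages are admitted back to back, the fourth is rejected because the interval's limit is exhausted, and the message arriving one second after the interval began is admitted into a freshly cleared bucket.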
Although appropriate for some networks, conventional bandwidth management and admission control procedures are not well-suited to networks that include clusters. Therefore, a need remains for improved procedures that can be used to protect network resources, services and applications running in a clustered environment.