1. Field of the Invention
The present invention relates to relieving processing load at hosts of server processes that provide services for real-time communications in a network; and, in particular, to obtaining differentiated relief at an overloaded server host from different types of service requests, from different sources of service requests, or both.
2. Description of the Related Art
Networks of general-purpose computer systems and other devices connected by external communication links are well known. The networks often include one or more network devices that facilitate the passage of information between the computer systems and devices. A network node is a network device or computer system connected by the communication links. As used herein, an end node is a network node that is configured to originate or terminate communications over the network. In contrast, an intermediate network node facilitates the passage of data between end nodes.
Information is exchanged between network nodes according to one or more of many well known, new or still developing protocols. In this context, a protocol consists of a set of rules defining how the nodes interact with each other based on information sent over the communication links. The protocols are effective at different layers of operation within each node, from generating and receiving physical signals of various types, to selecting a link for transferring those signals, to the format of information indicated by those signals, to identifying which software application executing on a computer system sends or receives the information. The conceptually different layers of protocols for exchanging information over a network are described in the Open Systems Interconnection (OSI) Reference Model. The OSI Reference Model is generally described in more detail in Section 1.1 of the reference book entitled Interconnections Second Edition, by Radia Perlman, published September 1999, which is hereby incorporated by reference as though fully set forth herein.
Communications between nodes on a packet-switched network are typically effected by exchanging discrete packets of data. Each packet typically comprises 1] header information associated with a particular protocol, and 2] payload information that follows the header information and contains information that may be processed independently of that particular protocol. In some protocols, the packet includes 3] trailer information following the payload and indicating the end of the payload information. The header includes information such as the source of the packet, its destination, the length of the payload, and other properties used by the protocol. Often, the data in the payload for the particular protocol includes a header and payload for a different protocol associated with a different, typically higher layer of the OSI Reference Model. The header for a particular protocol typically indicates a type for the next protocol contained in its payload. The higher layer protocol is said to be encapsulated in the lower layer protocol. The headers included in a packet traversing multiple heterogeneous networks, such as the Internet, typically include a physical (layer 1) header, a data-link (layer 2) header, an internetwork (layer 3) header, and some combination of a transport (layer 4) header, a session (layer 5) header, a presentation (layer 6) header and an application (layer 7) header as defined by the Open Systems Interconnection (OSI) Reference Model. In networking parlance, a tunnel for data is simply a protocol that encapsulates that data.
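Encapsulation, and the use of a header field to name the next protocol, can be made concrete with a short example. The Python sketch below is an illustration only: it wraps a payload in a UDP (layer 4) header and then an IPv4 (layer 3) header, and unwraps it by following the IPv4 protocol field (17 for UDP) to the next header. Link-layer and physical headers are omitted, the IPv4 header checksum is left at zero (so the bytes are not suitable for transmission), and the port 5060, the 192.0.2.x documentation addresses, and the function names are assumptions chosen for the example.

```python
import struct

IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20-byte IPv4 header without options
UDP_HEADER = struct.Struct("!HHHH")            # source port, destination port, length, checksum
PROTO_UDP = 17                                 # IPv4 protocol-field value naming UDP as the next header


def encapsulate(payload: bytes, src_ip: bytes, dst_ip: bytes, port: int = 5060) -> bytes:
    """Wrap an application payload in a UDP header (layer 4), then an IPv4 header (layer 3)."""
    udp = UDP_HEADER.pack(port, port, UDP_HEADER.size + len(payload), 0) + payload  # UDP checksum 0: none
    ip = IPV4_HEADER.pack(
        0x45,                         # version 4, header length of 5 32-bit words
        0,                            # type of service
        IPV4_HEADER.size + len(udp),  # total length of the IPv4 packet
        0, 0,                         # identification, flags/fragment offset
        64,                           # time to live
        PROTO_UDP,                    # type of the encapsulated (next) protocol
        0,                            # header checksum, not computed in this sketch
        src_ip, dst_ip,
    )
    return ip + udp


def decapsulate(packet: bytes) -> tuple[int, int, bytes]:
    """Strip the IPv4 header, follow its protocol field to the UDP header, return the payload."""
    fields = IPV4_HEADER.unpack_from(packet)
    version_ihl, total_length, protocol = fields[0], fields[2], fields[6]
    ip_payload = packet[(version_ihl & 0x0F) * 4:total_length]
    if protocol != PROTO_UDP:
        raise ValueError(f"unsupported next protocol {protocol}")
    _, dst_port, udp_length, _ = UDP_HEADER.unpack_from(ip_payload)
    return protocol, dst_port, ip_payload[UDP_HEADER.size:udp_length]


sip_text = b"OPTIONS sip:bob@example.com SIP/2.0\r\n"
packet = encapsulate(sip_text, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
print(decapsulate(packet))
```

Each layer only needs to understand its own header plus the field naming the next protocol, which is why a higher-layer protocol can be carried unchanged inside a lower-layer one.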
The popularity and good performance of packet-switched networks have led to the expanded use of such networks for real-time communications to support such applications as telephony, multimedia conferencing, gaming, and other real-time shared data applications. Such applications involve the setting up of sessions (often termed “calls” in reference to traditional telephony) between two or more users of end nodes, and terminating those sessions when the communications end. To initiate (or “set up”), maintain and terminate (or “tear down”) those sessions, one or more intermediate network nodes send signaling data to end nodes and other intermediate nodes.
The Session Initiation Protocol (SIP) is the Internet Engineering Task Force's (IETF's) standard layer 5 protocol for multimedia conferencing over the Internet Protocol (IP), a widely used layer 3 protocol. SIP is a character-based, application-layer control protocol that can be used to establish, maintain, and terminate calls between two or more end points. IETF publishes an adopted standard for Internet technologies as a Request for Comments (RFC), available on the public Internet at the IETF domain ietf in the class org in the directory named rfc. SIP is described in RFC 3261 in that directory in a file named rfc3261.txt, the entire contents of which are hereby incorporated by reference as if fully set forth herein. SIP is designed to address the functions of signaling and session management within a packet-switched telephony network. Signaling allows call information to be carried across network boundaries. Session management provides the ability to control the attributes of an end node-to-end node call.
SIP provides the capability to determine the location of the target end point, through support for address resolution, name mapping, and call redirection. SIP also provides the capability to determine the media capabilities of the target end point via the Session Description Protocol (SDP). SIP determines the “lowest level” of common services between the end points, so conferences are established using only the media capabilities that can be supported by all end points. SIP provides the capability to determine the availability of the target end point. SIP also supports mid-call changes, such as the addition of another end point to the conference or the changing of a media characteristic or codec. SIP supports the transfer of calls from one end point to another and terminates the session between the transferee and the transferring party. At the end of a call, SIP terminates the sessions between all parties. SIP supports setting up conference calls of two or more users, which can be established using multicast or multiple unicast sessions.
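The idea of using only the media capabilities supported by all end points can be sketched as an ordered set intersection. The Python function below is a simplification and not the SDP offer/answer procedure itself: it assumes each end point's capabilities are just a list of codec names, compares names case-insensitively, and keeps only the codecs every end point supports, in the first end point's order of preference. The function name and the codec lists are illustrative assumptions.

```python
def common_media(endpoint_codecs: list[list[str]]) -> list[str]:
    """Return codecs supported by every end point, in the first end point's preference order."""
    if not endpoint_codecs:
        return []
    first, *others = endpoint_codecs
    # Case-insensitive lookup sets for every other end point.
    other_sets = [{codec.casefold() for codec in codecs} for codecs in others]
    return [codec for codec in first if all(codec.casefold() in s for s in other_sets)]


print(common_media([["PCMU", "G722", "opus"], ["opus", "pcmu"], ["PCMU", "opus", "G729"]]))
# → ['PCMU', 'opus']
```

Adding a third participant with fewer codecs can only shrink the result, which mirrors why a conference settles on the lowest common level of service.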
SIP-based networks provide many call services, such as call session control, proxies for call session control, user location, interactive voice response (IVR), authentication, encryption, compression, translation, call forwarding and gateways to other networks, among others. Call services also include the Integrated Services Digital Network (ISDN) Supplementary Services defined by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), such as Call Forwarding (CF) Busy, CF-No Answer, CF-Unconditional, Automatic Call Back, Automatic Recall, Call Waiting (CW), Call Transfer, 3-way calling, and Conference Calling, among others. Many of the call services that SIP supports are provided by servers. The client-server model of computer process interaction is widely known and used in commerce.
According to the client-server model, a client process sends a message including a request to a server process, and the server process responds by providing a service. The server process usually returns a message with a response to the client process. Often the client process and server process execute on different devices, called hosts, and communicate via a network using one or more protocols for network communications. The term “server” is conventionally used to refer to the process that provides the service, or the host computer on which the process operates. Similarly, the term “client” is conventionally used to refer to the process that makes the request, or the host computer on which the process operates. As used herein, the terms “client” and “server” refer to the processes, rather than the host computers, unless otherwise clear from the context. In addition, the process performed by a server can be broken up to run as multiple servers on multiple hosts (sometimes called tiers) for reasons that include reliability, scalability, and redundancy.
In a peer-to-peer (P2P) model, the two processes are equal and symmetric; that is, either side may play the client or server role. SIP uses the P2P model at least for the end nodes. The combined client/server is called a User Agent (UA). Thus a UA may either initiate or respond to calls. The primary intermediate node, called a proxy, also plays both client and server roles and acts as a peer to a UA on an end node.
As the number of calls handled by a network increases, the signaling to initiate, maintain and terminate those calls increases. Requests for call services to support those calls also increase at the nodes that host servers for those services. Such nodes tend to aggregate signaling traffic. At some level of use, one or more of the server hosts become so fully loaded with processing requests for services that they cannot accept additional requests. Once they stop responding to requests, the host and server fail to provide the service to at least some calls. Some such failures can lead to failure of the network and result in many dissatisfied and even angry subscribers. For example, as a server host fully exerts its computational resources of central processing unit (CPU) cycles and memory to handle existing requests, the number of call completions is observed to decline. Clients that do not get responses back in a timely manner will retransmit the request. A proxy can find itself devoting more of its computational resources to rejecting requests and less to getting real work done, such as completing call setups. A typical progression runs from devoting all computational resources to completing calls; to devoting less than 100% to completing calls and the remainder to processing retransmissions and rejecting new calls; to devoting all computational resources to processing retransmissions and rejecting new calls; and finally to crashing and performing no call processing.
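How retransmissions compound overload can be quantified for SIP over an unreliable transport such as UDP. Under RFC 3261, an INVITE client transaction retransmits the request each time Timer A fires; Timer A starts at T1 (500 ms by default) and doubles after each retransmission, and the transaction gives up when Timer B fires at 64*T1. Receiving a provisional or final response stops the retransmissions. The Python sketch below lists the send times of an INVITE that receives no response at all; the function name is an illustrative assumption.

```python
def unanswered_invite_send_times(t1: float = 0.5) -> list[float]:
    """Seconds after the first send at which an unanswered INVITE is (re)sent over UDP."""
    timer_b = 64 * t1       # transaction timeout (Timer B)
    send_times = [0.0]      # the original transmission
    interval = t1           # initial Timer A value
    next_send = t1
    while next_send < timer_b:
        send_times.append(next_send)
        interval *= 2       # Timer A doubles after each retransmission
        next_send += interval
    return send_times


times = unanswered_invite_send_times()
print(times)       # [0.0, 0.5, 1.5, 3.5, 7.5, 15.5, 31.5]
print(len(times))  # 7
```

Under these default timers, a server that stops answering new INVITEs can receive up to seven copies of each one within about 32 seconds, so the offered message rate grows just as the server's capacity to process messages shrinks.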
The goal is to optimize the servicing of call service requests (“goodput”) rather than throughput, which also includes processing retransmissions and rejections. Commercial devices are typically designed to complete 100% of calls at design capacity, and to keep completing calls at 95% of design capacity even when offered 150% of design capacity. For example, if a node can handle 100 calls per second (CPS) at design capacity, then as the offered load rises to 150 CPS, the node completes 95 CPS and rejects 55 CPS. Five percent of its capacity is devoted to rejecting those 55 CPS.
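The figures in this example imply a relative cost for each rejection. Under a simple linear model, assumed here for illustration only, capacity C is measured in completed calls per second and each rejection consumes a fixed fraction r of the work of one completion. Spending 5 CPS of capacity on 55 rejected CPS gives r = 5/55 = 1/11, and at an offered load L above capacity, the completion rate x satisfies x + r(L - x) = C. The Python sketch below computes both quantities with exact fractions; the function names are hypothetical, and the model ignores retransmissions.

```python
from fractions import Fraction


def rejection_cost_ratio(capacity: int, offered: int, completed: int) -> Fraction:
    """Cost of one rejection relative to one completion, inferred from one overload data point."""
    rejected = offered - completed
    return Fraction(capacity - completed, rejected)


def completions_at(capacity: int, offered: int, r: Fraction) -> Fraction:
    """Completed CPS under the linear model x + r*(offered - x) = capacity (assumes r < 1)."""
    if offered <= capacity:
        return Fraction(offered)   # below capacity, everything completes
    return max(Fraction(0), (capacity - r * offered) / (1 - r))


r = rejection_cost_ratio(100, 150, 95)
print(r)                            # 1/11
print(completions_at(100, 150, r))  # 95
print(completions_at(100, 300, r))  # 80
```

In this idealized model, goodput would not reach zero until C/r = 1,100 offered CPS; the much earlier collapse described above comes from effects the model leaves out, such as retransmissions.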
In one approach, a host for a heavily loaded server (loaded to some percentage above its design capacity) is programmed to reject all service requests. While this is suitable for some purposes, it often does not cure the problem. For example, rejecting SIP INVITE messages from calls in progress causes a client node making the request to retransmit the request. The server still expends host resources in dealing with the retransmissions. Furthermore, this approach can cause the host to fail (“crash”) as the server starts to spend more CPU cycles and memory on detecting and rejecting the additional service requests.
In another approach, a heavily loaded server host is programmed to reject service requests for new calls, but to accept requests for calls in progress. While suitable for some circumstances, this approach suffers some disadvantages. A significant disadvantage is that the heavily loaded host must expend computational resources to examine and distinguish a new call setup message (e.g., a SIP INVITE message) from a mid-call modification message (e.g., also a SIP INVITE message). Furthermore, this approach can cause the host to fail (“crash”) as the server starts to spend more computational resources on detecting and distinguishing among types of call requests.
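The processing cost of telling a new call setup from a mid-call modification comes from the fact that both arrive as INVITE requests. Per RFC 3261, a request sent within an existing dialog carries a tag parameter in its To header field, while an initial INVITE that creates a dialog does not. The Python sketch below is a simplified, assumed parser (it recognizes the compact header form "t" but ignores header line folding and other edge cases); the function name is hypothetical. It shows that the loaded host must scan the headers of every INVITE to make this decision.

```python
def is_new_call_invite(message: str) -> bool:
    """True for an initial INVITE (no To tag), False for a mid-dialog re-INVITE."""
    header_block = message.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = header_block.split("\r\n")
    if request_line.split(" ", 1)[0] != "INVITE":
        raise ValueError("not an INVITE request")
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in ("to", "t"):  # "t" is the compact form of To
            # Skip URI parameters inside <...>; header parameters follow the closing ">".
            params_part = value.rsplit(">", 1)[-1] if ">" in value else value
            params = params_part.split(";")[1:]
            has_tag = any(p.strip().lower().startswith("tag=") for p in params)
            return not has_tag
    raise ValueError("INVITE has no To header field")


initial = ("INVITE sip:bob@example.com SIP/2.0\r\n"
           "To: Bob <sip:bob@example.com>\r\n"
           "From: Alice <sip:alice@example.com>;tag=1928301774\r\n"
           "CSeq: 1 INVITE\r\n\r\n")
reinvite = initial.replace("<sip:bob@example.com>\r\n", "<sip:bob@example.com>;tag=a6c85cf\r\n")
print(is_new_call_invite(initial), is_new_call_invite(reinvite))  # True False
```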
In yet another approach, multiple nodes are included in the network to host the same server, and a heavily loaded host refuses further call requests in the expectation that another host can absorb the extra work until the first host's load drops. For example, a SIP 503 response message with a RETRY AFTER header (field) is sent to the neighboring network node that sent the offending request. This message is a signal to stop sending all further SIP messages (regardless of SIP message type) to the host until a time indicated by the data following the “RETRY AFTER” characters in the RETRY AFTER field. While this delays the onset of system failure, the approach suffers some deficiencies. For example, if two equivalent hosts are each at 90% capacity, the refusal of all requests by one server host might result in all traffic for that service being directed to the other server host, which would then be offered 180% of its capacity. The new traffic quickly fully loads the other server host, and the other host also refuses new requests. This can generate a cascade of failing server hosts, a failure of the system to provide the call service, and the premature termination or refusal of many calls on the network. Again, dissatisfied and even angry subscribers result.
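The cascade can be illustrated with a toy model, assumed here for illustration only: each host's load is a percentage of its capacity, a host that starts refusing requests sheds all of its load, and that load is spread evenly over the hosts still accepting traffic. Any host pushed above 100% then refuses in turn. The Python sketch below ignores retransmissions and timing; the function name is hypothetical.

```python
def simulate_cascade(loads: list[float], first_refusing: int, capacity: float = 100.0):
    """Return (hosts that end up refusing, final loads of hosts still accepting traffic)."""
    accepting = dict(enumerate(loads))
    refusing = []
    pending = [first_refusing]
    while pending:
        host = pending.pop(0)
        if host not in accepting:
            continue                    # this host is already refusing
        shed = accepting.pop(host)      # all of its traffic is diverted elsewhere
        refusing.append(host)
        if not accepting:
            break                       # nowhere left to divert traffic: the service fails
        share = shed / len(accepting)
        for other in accepting:
            accepting[other] += share
        # Any host now above capacity will also start refusing.
        pending.extend(h for h, load in accepting.items() if load > capacity and h not in pending)
    return refusing, accepting


print(simulate_cascade([90, 90], first_refusing=0))      # ([0, 1], {})
print(simulate_cascade([90, 40, 30], first_refusing=0))  # ([0], {1: 85.0, 2: 75.0})
```

With two hosts at 90%, refusing all traffic at one host takes down both; with enough spare capacity elsewhere, the diverted load is absorbed, which is why shedding only a portion of the traffic matters.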
Another disadvantage of this approach is that some services are hosted by nodes of the network that are connected to a large number of other nodes, such as a provider edge node that aggregates calls from a large number of SIP phones. When one SIP phone sends a request that causes that host to become heavily loaded, the SIP 503 RETRY AFTER message is sent to the one phone. The host is still inundated with traffic from the many remaining SIP phones that are unaware of the load burdening the edge node. Thus the message is ineffective because of the large fan-out of devices generating the traffic. Another SIP 503 RETRY AFTER message must be sent to each new SIP phone making a call.
Based on the foregoing, there is a clear need for techniques to respond to heavy computational load conditions at a call server host that consume fewer computational resources on the heavily loaded host than prior art approaches do.
There is also a clear need for techniques that, rather than stopping all traffic to a server host (which can lead to cascading failures), determine a portion of the traffic to delay or divert.
There is also a clear need for techniques that easily notify all the network nodes in a fan-out that are sending traffic to a heavily loaded server host, without adding more work to that host.