The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
The development of residential broadband network access lines and equipment has caused massive deployments of Internet Protocol (IP) based devices in homes and offices. These devices support services such as high-speed data, routing, switching, voice, video and gaming, and other services are planned. In certain circumstances, these devices can and will operate autonomously, as clients of various network servers. This interaction takes place, among other reasons, to allow for initial configuration of a device or application, for control, and for IP-based service registration. Interactions of client devices with network servers include Presence, Call, File Transfer Protocol (FTP), Kerberos, Domain Name System (DNS), Provisioning, Dynamic Host Configuration Protocol (DHCP), etc. Cable modem deployments are known in which 300,000,000 devices are serviced through a multi-step interaction or provisioning flow in one hour, or about 83,000 devices serviced per second.
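The per-second figure cited above follows directly from the arithmetic, as this trivial check illustrates:

```python
devices = 300_000_000         # devices provisioned through the flow
window_seconds = 60 * 60      # one hour
rate = devices // window_seconds
print(rate)                   # 83333, i.e. about 83,000 devices per second
```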
Many of these client-server interactions conform to protocols that are ordered in the same way. For example, several protocols involve an ordered, step-by-step interchange of request/response pairs communicated between single clients and multiple servers. At each point in this multi-step interaction, a client waits for a specified time limit; if a response has not been received, the client retries the request. After a specified number of retries, the client reinitializes the interaction to a previous step, often starting over at the first step. The servers typically queue incoming requests and service each request with some latency. A protocol that operates as described herein may be termed a “Multi-Step Retry Reinitialization Protocol Flow.”
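The client-side behavior described above can be sketched as follows. This is a minimal, hypothetical illustration: the function name, retry limit, and reinitialization cap are assumptions for clarity, not part of any particular protocol.

```python
def run_flow(steps, max_retries=3, max_reinits=5):
    """Drive an ordered, multi-step request/response flow.

    `steps` is an ordered list of callables, one per protocol step; each
    sends a request and returns True if a response arrived within the
    step's time limit, or False on timeout. When a step exhausts its
    retries, the whole flow reinitializes back to the first step.
    """
    reinits = 0
    i = 0
    while i < len(steps):
        for _attempt in range(1 + max_retries):
            if steps[i]():
                break                 # response received; advance
        else:
            reinits += 1              # retries exhausted at this step
            if reinits > max_reinits:
                return False          # safety cap: give up entirely
            i = 0                     # reinitialize to the first step
            continue
        i += 1
    return True                       # all steps completed
```

Note that a single slow or silent server late in the flow forces the client to repeat every earlier step on each reinitialization, which is what multiplies traffic under failure conditions.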
One characteristic of some massive device deployments is that large groups of devices depend on common, shared resources. Examples of these shared resources include the electrical grid, shared communications nodes, and electromagnetic spectrum. This mutual dependency on shared resources causes the IP devices, under certain circumstances, to lose their operational and behavioral independence. The loss of independence can happen at unfavorable times, such as during a disaster.
A consequence of such a lack of behavioral independence can be an “avalanche” of initial messages of Multi-Step Retry Reinitialization Protocol Flows into the service infrastructure, initiated by the client devices. The avalanche occurs as each device, in a group of thousands or millions of devices, seeks to recover from a disaster by reestablishing itself in the network and by reattaching to each of the services provided by that network. For example, assume that under normal conditions a group of two million cable modem devices receives IP network address service from one DHCP server. After a natural disaster such as severe weather or an earthquake results in all devices losing network connectivity, all the devices will attempt to obtain network addresses from the DHCP server at approximately the same time. The resulting avalanche of DHCP service requests can rapidly overwhelm the DHCP server, which otherwise can readily handle typical requests under normal operating conditions.
In some protocols, the arrival rates of client requests vastly exceed service rates. Further, individual servers within the flow can have highly mismatched throughput characteristics. As a result, client devices generate large numbers of retry messages and reinitialization messages in attempting to obtain service. As a consequence, a positive feedback loop is created, further increasing arrival rates and worsening the problem. The condition creates a situation in which few if any client devices receive service, even while associated network traffic and server workload significantly increase. More loading and more traffic occur, but nothing is accomplished, as clients cannot get back online.
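A toy discrete-time simulation illustrates the feedback loop. The numbers here are purely hypothetical (1,000 clients, a server draining 10 requests per tick, a 5-tick retry timeout): retries from unserved clients multiply the offered load while the number of clients actually served remains bounded by the service rate.

```python
from collections import deque

def simulate_avalanche(clients, service_rate, timeout, ticks):
    """Toy model: every client sends its first request at tick 0.

    Each tick the server drains at most `service_rate` queued requests;
    any client still unserved `timeout` ticks after its last send
    retries, so the offered load grows even as service lags behind.
    """
    queue = deque(range(clients))             # initial burst: one request each
    last_sent = {c: 0 for c in range(clients)}
    served = set()
    sent = clients
    for t in range(ticks):
        for _ in range(service_rate):         # server drains the queue
            if not queue:
                break
            served.add(queue.popleft())       # duplicates still cost capacity
        for c, t0 in last_sent.items():       # timed-out clients retry
            if c not in served and t - t0 >= timeout:
                queue.append(c)
                last_sent[c] = t
                sent += 1
    return sent, len(served)
```

With these assumed parameters, 50 ticks of service can complete at most 500 requests, yet several thousand additional retry messages are generated in the same interval, mirroring the positive feedback described above.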
Presently known solutions to these problems either do not control or restrict access to the full flow, in which case they fail under avalanche conditions, or they indiscriminately limit requests or traffic by packet type and rate. As a result, past approaches typically provide only limited control of client retries and reinitializations, or they fail completely or exacerbate the problem.