The invention pertains generally to computer servers. More specifically, the invention relates to managing network congestion and server load in a system having a plurality of client devices, such as a hospitality media system.
In a typical hotel media system, set-top boxes (STBs) are installed in guest rooms and are in communication with a central server. The communication is usually performed over an Internet Protocol (IP) network available in the hotel, and part of the boot-up process at each STB typically involves requesting configuration information from the server. A problem with such a configuration is that the central server may become overloaded with simultaneous requests from many STBs. One example of a particularly problematic event is when power is restored in the hotel after an unexpected outage. In this situation, the STBs in each room will boot up at approximately the same time, and, when the hotel includes hundreds or even thousands of STBs, the resulting sudden surge in network requests and server load can be overwhelming. The entire system may be severely delayed as a result.
A common solution to this problem is to introduce a random delay at each STB before attempting to request information from the server. Random delays tend to spread out the requests from STBs and allow the server more time to process each request. In the event that the server still becomes overloaded and unable to service all the requests, affected STBs may wait another random delay as a back-off delay before attempting the request again. The back-off delays may exponentially increase in duration to further spread out the requests and allow the server more time to recover.
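The random-delay and back-off scheme described above can be illustrated with a minimal Python sketch. This is one possible illustration, not an implementation taken from any particular STB. The names `request_with_backoff`, `backoff_upper_bound`, and `send_request`, as well as the base delay, cap, and attempt limit, are assumptions chosen for the example.

```python
import random
import time


def backoff_upper_bound(failures, base_s=1.0, cap_s=300.0):
    """Upper bound (seconds) of the random delay after `failures` failed attempts.

    The bound doubles with each failure and is clamped to `cap_s`.
    All values here are illustrative assumptions.
    """
    return min(cap_s, base_s * (2 ** failures))


def request_with_backoff(send_request, max_attempts=6, base_s=1.0, cap_s=300.0,
                         rng=None, sleep=time.sleep):
    """Try `send_request` until it succeeds or `max_attempts` is reached.

    `send_request` is a hypothetical callable that returns True when the
    server accepted the request. Returns the number of attempts used, or
    None if every attempt failed.
    """
    rng = rng or random.Random()
    for failures in range(max_attempts):
        # Before the first attempt this is the initial random delay; after
        # each failure it is a back-off delay drawn from a doubled range.
        sleep(rng.uniform(0.0, backoff_upper_bound(failures, base_s, cap_s)))
        if send_request():
            return failures + 1
    return None
```

In this sketch the delay before the first attempt is drawn from 0 to `base_s` seconds, and each failure doubles the range from which the next delay is drawn, up to `cap_s`.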
Although this solution works well in a system with a relatively small number of STBs, it does not scale to hotels having thousands of STBs. One reason is that the random numbers generated by the STBs may not actually be random, and therefore a large number of STBs may “randomly” choose exactly the same delay. Even with truly random delays, there is still a chance that hundreds of STBs will all choose the same (or similar) random numbers. When this happens, the server remains overloaded and must reject some requests. These rejected requests are then spread out further by increasing the upper bound on the back-off delay at each STB. In a large hotel, the upper bound of the random delay needs to grow quickly in order to sufficiently space out the requests in the event that all STBs are requesting data at the same time. Minutes of back-off delay may be encountered when thousands of STBs are all rebooting at the same time. However, when there is actually no long-term server load, exponentially increasing delays are very undesirable. For example, when only a single STB is rebooted but happens to be unlucky on its first few attempts at requesting configuration data from the server, the STB may randomly choose a very long back-off delay. If the server was only momentarily busy, the long back-off delay at the single STB is unnecessary and wastes the user's time.
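The collision problem described above can be illustrated with a small Python simulation. It is a sketch under stated assumptions: the STB count of 2000, the 60-second delay window, the one-second slots, and the seeding strategies are hypothetical and chosen only for illustration, as is the function name `peak_requests_per_slot`.

```python
import random
from collections import Counter


def peak_requests_per_slot(seeds, window_s=60):
    """Return the largest number of STBs that pick the same one-second slot.

    Each entry in `seeds` stands for one STB that seeds its own generator
    and then draws a whole-second delay in the range [0, window_s).
    """
    slots = Counter(random.Random(seed).randrange(window_s) for seed in seeds)
    return max(slots.values())


stb_count = 2000  # hypothetical hotel size

# Assumed failure mode: every STB seeds its generator with the same value,
# for example a clock that resets to zero at power-on.
same_seed_peak = peak_requests_per_slot([0] * stb_count)

# Assumed alternative: each STB uses a distinct seed, for example one
# derived from a per-device identifier.
distinct_seed_peak = peak_requests_per_slot(range(stb_count))
```

With identical seeds, every simulated STB draws the same delay, so the peak equals the total number of STBs. With distinct seeds the requests are spread across the window, but 2000 requests over 60 slots still leave at least 34 in the busiest slot, which is why the server may remain overloaded even when the delays are well distributed.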