Media servers deliver media content (video or audio) over digital data links to end-user devices that play the media, such as smartphones, tablets, laptops, PCs, and TV sets. A recurring issue arises when many end users request content from the same media server, and that server becomes a bottleneck for the data being sent to the user devices. For example, a caching media server can be connected to the Internet at a bandwidth of 10 Gbps. Assume the server stores video assets encoded at a bitrate of 5 Mbps and has a transmission overhead of 10%. In this example, at most 1800 user devices can be served by this media server at a time (5 Mbps × 1800 = 9 Gbps, which is 90% of 10 Gbps). If more devices request the media assets, the delivery bitrate has to drop below 5 Mbps because of bandwidth saturation at the sending side, so all users may experience degraded playback quality, even though their own Internet access may be much faster than the encoding rate.
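The arithmetic in the example above can be reproduced with a short sketch. The function names `max_concurrent_streams` and `fair_share_bps` are hypothetical and chosen only for illustration. The sketch assumes that the 10% overhead is reserved from the link capacity, as in the example, and that an oversubscribed link is shared equally among devices.

```python
def usable_bps(link_bps: int, overhead_percent: int) -> int:
    """Link capacity left for media payload after reserving the overhead share."""
    return link_bps * (100 - overhead_percent) // 100


def max_concurrent_streams(link_bps: int, stream_bps: int, overhead_percent: int) -> int:
    """Largest number of streams that fit at the full encoding bitrate."""
    return usable_bps(link_bps, overhead_percent) // stream_bps


def fair_share_bps(link_bps: int, overhead_percent: int, devices: int) -> int:
    """Per-device bitrate if the usable capacity is split equally (an assumption)."""
    return usable_bps(link_bps, overhead_percent) // devices


LINK_BPS = 10_000_000_000   # 10 Gbps uplink
STREAM_BPS = 5_000_000      # 5 Mbps encoding bitrate
OVERHEAD_PERCENT = 10       # transmission overhead

print(max_concurrent_streams(LINK_BPS, STREAM_BPS, OVERHEAD_PERCENT))  # 1800
print(fair_share_bps(LINK_BPS, OVERHEAD_PERCENT, 2000))                # 4500000
```

With 2000 devices instead of 1800, each device would receive about 4.5 Mbps under the equal-share assumption, which is below the 5 Mbps encoding bitrate.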
The problem is further complicated by the fact that data for user devices is not transmitted continuously. Many adaptive bitrate streaming technologies (including Apple HLS®, Adobe HDS®, and Microsoft Smooth®) package content in multiple chunks that are requested irregularly. Also, the end user may not request the data continuously. Instead, the data transmissions are subject to the users' behavior, which may include pausing, fast-forwarding, and rewinding the media at various times. Still further, accounting for user devices is complicated because, with stateful protocols such as TCP, a single user can use many connections.
One known mechanism to address these problems is to compare the actual bandwidth occupied on a network interface to the maximum bandwidth the interface is capable of. If the difference is less than some threshold, the interface is deemed to be saturated. However, a handful of user devices with fast network connectivity can easily saturate any network interface this way, resulting in severe underuse of an operator's infrastructure. Another known technique is to monitor the number of established TCP connections at the server. If the measured number is greater than a threshold, the server is deemed to be saturated. As explained above, however, connections alone cannot reliably indicate the number of associated user devices, nor do those connections carry data continuously. This technique can, therefore, lead to drops in quality of service (QoS) as perceived by the user or to underuse of the operator's infrastructure. Another known technique is to identify a user by a unique token within a URI when the content is requested over the HTTP(S) protocol. If the current number of such users exceeds a certain threshold, the server is deemed to be saturated. Again, this technique takes no user activity into account. In addition, the ongoing cross-protocol analysis adds processing overhead. This technique complicates the system architecture and does not guarantee sufficient bandwidth at the media server.
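The three known checks described above can be sketched as simple threshold tests. This is only an illustrative sketch: the function names, the thresholds, the example URIs, and the `token` query parameter are assumptions and are not taken from any particular server implementation.

```python
from urllib.parse import parse_qs, urlparse


def saturated_by_bandwidth(used_bps: int, max_bps: int, headroom_bps: int) -> bool:
    """Deem the interface saturated when the remaining headroom is below a threshold."""
    return max_bps - used_bps < headroom_bps


def saturated_by_connections(established: int, limit: int) -> bool:
    """Deem the server saturated when established TCP connections exceed a limit."""
    return established > limit


def unique_tokens(uris, param="token"):
    """Collect distinct user tokens from request URIs (hypothetical query parameter)."""
    tokens = set()
    for uri in uris:
        values = parse_qs(urlparse(uri).query).get(param)
        if values:
            tokens.add(values[0])
    return tokens


def saturated_by_tokens(uris, limit: int) -> bool:
    """Deem the server saturated when the number of distinct tokens exceeds a limit."""
    return len(unique_tokens(uris)) > limit


# Two fast clients pulling 4.8 Gbps each trip the bandwidth check on a 10 Gbps link,
# although the server is serving only two devices.
print(saturated_by_bandwidth(9_600_000_000, 10_000_000_000, 500_000_000))  # True

# One user holding several connections is counted several times.
print(saturated_by_connections(established=12, limit=10))  # True

# Tokens count users, but say nothing about whether they are paused or active.
requests = [
    "https://media.example.com/a/seg1.ts?token=u1",
    "https://media.example.com/a/seg2.ts?token=u1",
    "https://media.example.com/b/seg1.ts?token=u2",
]
print(len(unique_tokens(requests)))  # 2
```

None of these checks uses the bandwidth each active device actually needs, which is the gap the paragraph above points out.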