With the modern internet, hypertext transfer protocol (HTTP) based media transferring is practically ubiquitous. Content providers typically deploy their media delivery service on top of the internet infrastructure. For example, content providers like Netflix do not deploy their own streaming architecture and can use the internet infrastructure as is. However, HTTP, together with the internet's infrastructure of caches, content distribution networks (CDN), and proxies, was designed for best effort file delivery rather than real time media delivery. Conventional streaming protocols such as real time transport protocol (RTP) do not typically exploit this infrastructure and, unlike HTTP, may also be constrained by Network Address Translation (NAT) complexities and firewall traversal requirements.
Being stateful (the server keeps track of which segments have already been downloaded), RTP performs a push function where a server drives file transfers. In RTP, a server must track the status of a client device in order to send data at the correct times. In contrast, being stateless (the server does not track which segments have already been downloaded), HTTP performs a pull function where a client device drives file transfers by requesting segments as needed. In HTTP, a client device sends a request to a server, upon receipt of which the server sends data, obviating the need for the server to track client device status. This allows an HTTP transferring server to remain unaware of sessions, which reduces the load on the server and provides ease of content distribution. Dynamic handling of fluctuating bandwidth can be difficult in RTP streaming without SVC or another scalable codec (see reference [5], incorporated by reference herein in its entirety). However, HTTP multimedia transferring adds a significant overhead to a transferring session compared to RTP (see reference [6], incorporated by reference herein in its entirety).
Moreover, conventional HTTP approaches do not actually represent real streaming. Instead, conventional HTTP “streaming” comprises progressive downloading, i.e., downloading combined with immediate playback. While simple and deployable, progressive HTTP downloading does not manage fluctuating bandwidth well. Dynamic Adaptive Streaming over HTTP (DASH) was developed to address bandwidth fluctuation in progressive downloading.
Because DASH uses HTTP, it can traverse firewalls and NAT, and its dynamic function handles varying bitrates. Essentially, DASH cuts media content into independently decodable segments. This allows encoding the media content at different qualities or resolutions, while dividing the media content into segments of equal length. Client devices use HTTP to access the media content and select the segments that most effectively fulfill the client devices' current bandwidth or resolution demands. DASH typically uses a manifest file (MF) that provides a description of the media content and can be, for instance, extensible markup language (XML) based. On request from a client device to a server, the manifest file can be provided from the server to the client device to initiate a session. The client device can parse the manifest file and request individual media content segments according to information found in the manifest file.
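The client-driven selection described above can be illustrated with a minimal sketch. The manifest layout, the element and attribute names, the URL templates, and the function names below are assumptions made for illustration only; they are not the actual MPD schema and not the adaptation logic of any particular DASH client. The sketch simply picks the highest-bitrate representation that fits a measured bandwidth and builds the URL of the next segment to request.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified manifest (NOT the real MPD schema).
MANIFEST_XML = """
<Manifest segmentDuration="2">
  <Representation id="low" bandwidth="500000" template="low/seg-{n}.m4s"/>
  <Representation id="mid" bandwidth="1500000" template="mid/seg-{n}.m4s"/>
  <Representation id="high" bandwidth="4000000" template="high/seg-{n}.m4s"/>
</Manifest>
"""


def parse_manifest(xml_text):
    """Return the representations in the manifest, sorted by ascending bitrate."""
    root = ET.fromstring(xml_text.strip())
    reps = [
        {
            "id": el.get("id"),
            "bandwidth": int(el.get("bandwidth")),
            "template": el.get("template"),
        }
        for el in root.findall("Representation")
    ]
    return sorted(reps, key=lambda r: r["bandwidth"])


def select_representation(reps, measured_bps):
    """Pick the highest bitrate not exceeding the measured bandwidth.

    Falls back to the lowest bitrate when nothing fits.
    """
    fitting = [r for r in reps if r["bandwidth"] <= measured_bps]
    return fitting[-1] if fitting else reps[0]


def segment_url(rep, n):
    """Build the (hypothetical) relative URL of segment number n."""
    return rep["template"].format(n=n)


if __name__ == "__main__":
    reps = parse_manifest(MANIFEST_XML)
    chosen = select_representation(reps, measured_bps=2_000_000)
    print(chosen["id"], segment_url(chosen, 7))
```

Because every segment is independently decodable, a client of this kind could repeat the selection before each request and switch representations at any segment boundary.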
The DASH system adaptation method (also referred to as adaptation logic) is generally located at the client side, which leverages client devices' awareness of their capabilities and bandwidth requirements. It can be assumed that DASH may become widely deployed over the internet and mobile networks in the next few years. Mobile networks are proliferating rapidly, and video transferring is expected to comprise most traffic thereon over the next few years. However, neither HTTP nor the proxies that the protocol exploits to cache previously selected content (for bandwidth and cost conservation) are designed optimally for real time streaming.
It may be assumed that virtually every HTTP connection uses a proxy that is somewhere in the network, where a proxy is a network element that can store content that has been previously selected by other users connected to the internet through this proxy. For a DASH session, content is thus distributed not simply by content providers on the CDN network, but is also distributed in the network through the proxies. Distribution through the proxies is uncontrollable by the content providers because it depends on the client devices. For instance, distribution through the proxies may be significantly influenced by a client device's network location and capabilities. Most proxies thus cache only parts of the content (e.g., segments of media content of a certain bitrate, resolution, language classifications, or other characteristics by which they are cached).
Conventional DASH adaptation methods do not take this fact into account. As mobile networks and video transferring traffic thereon proliferate, this can impede optimum performance.
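The partial caching described above can be sketched as follows. The class name, the cache key of (representation, segment index), and the origin callback are hypothetical and chosen only for illustration; real proxies apply their own keying, eviction, and validation policies. The sketch shows how the segments a proxy holds depend entirely on what earlier client devices happened to request.

```python
class ProxyCache:
    """Toy proxy that stores only segments previously requested through it."""

    def __init__(self):
        self._store = {}

    def fetch(self, rep_id, index, origin_fetch):
        """Return (data, was_cached) for one segment of one representation."""
        key = (rep_id, index)
        if key in self._store:
            return self._store[key], True
        # Cache miss: retrieve from the origin server and keep a copy.
        data = origin_fetch(rep_id, index)
        self._store[key] = data
        return data, False


def origin(rep_id, index):
    """Hypothetical origin server returning placeholder segment bytes."""
    return f"{rep_id}-{index}".encode()


if __name__ == "__main__":
    proxy = ProxyCache()
    # An earlier client device streamed the low-bitrate representation.
    for i in range(3):
        proxy.fetch("low", i, origin)
    # A later client device behind the same proxy:
    print(proxy.fetch("low", 1, origin)[1])   # served from the proxy
    print(proxy.fetch("high", 1, origin)[1])  # must go to the origin
```

A client-side adaptation method that looks only at measured bandwidth has no visibility into which of these two cases applies to its next request.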
FIG. 1 depicts a basic representation of a DASH protocol where MF refers to a manifest file, DF refers to delivery format, ISOBMFF refers to a file format, and M2TS refers to a transport stream.
Proprietary solutions from various companies currently deployed in this area of technology include Microsoft's Smooth Streaming (see reference [8], incorporated by reference herein in its entirety), Adobe's Dynamic HTTP Streaming (see reference [9], incorporated by reference herein in its entirety), and Apple's HTTP Live Streaming (see reference [10], incorporated by reference herein in its entirety). Consortia such as ISO/IEC MPEG (see reference [2], incorporated by reference herein in its entirety) and 3GPP (see reference [7], incorporated by reference herein in its entirety) are also currently trying to standardize this technology.
Each of these systems can follow nearly the same architecture as depicted in FIG. 1 and can utilize some kind of manifest file (MF). The manifest file can provide a description of media content adapted to be transferred and is generally XML (extensible markup language) based. On request, the manifest file can be provided to the client device in order to initiate a session. The client device can parse the manifest file and request individual segments compliant with a delivery format (DF) using HTTP (or other protocols) and according to the information found in the manifest file. In the present disclosure, the manifest file can be referred to as an MPD (Media Presentation Description), and the data model of this MPD is depicted in FIG. 2. The MPD can follow a data model comprising a sequence of one or more representations. A single representation can refer to specific media content having certain characteristics such as bitrate, resolution, or language. Furthermore, each representation may comprise one or more segments that describe the media content and/or metadata to decode and present the included media content.
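The data model described above (an MPD comprising representations, each comprising segments) can be sketched with a few data classes. This is only an illustration of the hierarchy as stated in the text; the class names, field names, and example values are assumptions and do not reproduce FIG. 2 or the standardized MPD schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    url: str           # location of the segment (hypothetical)
    duration_s: float  # playback duration in seconds


@dataclass
class Representation:
    rep_id: str
    bitrate_bps: int
    resolution: str
    language: str
    segments: List[Segment] = field(default_factory=list)


@dataclass
class MPD:
    representations: List[Representation]

    def matching(self, language, max_bitrate_bps):
        """Representations in a given language at or below a bitrate cap."""
        return [
            r for r in self.representations
            if r.language == language and r.bitrate_bps <= max_bitrate_bps
        ]


if __name__ == "__main__":
    mpd = MPD(representations=[
        Representation("en-low", 500_000, "640x360", "en",
                       [Segment("en-low/0.m4s", 2.0), Segment("en-low/1.m4s", 2.0)]),
        Representation("en-high", 4_000_000, "1920x1080", "en",
                       [Segment("en-high/0.m4s", 2.0), Segment("en-high/1.m4s", 2.0)]),
        Representation("de-low", 500_000, "640x360", "de",
                       [Segment("de-low/0.m4s", 2.0)]),
    ])
    print([r.rep_id for r in mpd.matching("en", 1_000_000)])
```

Grouping segments under representations in this way lets a client device filter by characteristics first and only then decide which individual segments to request.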
The adaptation method (also referred to as adaptation logic) in such a system is generally located at the client side, which can be beneficial because the client device knows its capabilities and bandwidth requirements best. However, this technique can also introduce some drawbacks. Current research (see references [1], [3], and [12]-[19], each of which is incorporated by reference herein in its entirety) is focused on one client device and how to properly adapt to this client device's needs to yield the best quality. New drawbacks may arise with increased deployment of adaptation methods such as DASH. For instance, mobile networks may be affected in the future because mobile data traffic may grow by a factor of 40 between 2009 and 2014, as recent studies seem to indicate (see reference [11], incorporated by reference herein in its entirety). Mobile video traffic may then account for approximately 66% of all mobile traffic.