The transport of information content such as multimedia content over the Internet has gained momentum and entered our daily lives, whether for major live events or for on-demand access. For example, people access live sport events or watch their favorite TV series on a plethora of devices ranging from high-resolution, well-connected TV sets to smart phones with limited display and network capabilities. All of these use cases have something in common, namely that the content is delivered over the Internet and on top of the existing infrastructure. In general, the amount of video traffic is growing tremendously, specifically in mobile environments (e.g., WiFi, 3G/4G, smart phones, tablets) [4, 25], and the existing Hypertext Transfer Protocol (HTTP) infrastructure has become an important driver [20] for this type of application even though it is mainly deployed over the Transmission Control Protocol (TCP) [31].
The advantage of using HTTP is that it is client-driven and scales very well, thanks to its stateless design. Furthermore, the delivery of multimedia content over HTTP exploits existing infrastructure initially deployed for Web traffic, such as servers, proxies, caches, and content distribution networks (CDNs). Additionally, it typically does not cause any firewall or network address translation (NAT) issues, which was the main reason for the Real-time Transport Protocol (RTP) not being widely adopted. Finally, the usage of HTTP allows for a receiver/client-driven approach, in contrast to a sender/server-driven approach, without the need for an explicit adaptation loop (feedback channel). Such a client-driven approach further increases scalability and enables flexibility, as the client or receiver usually knows its context best.
The basic concept of today's HTTP-based multimedia streaming solutions is to provide multiple versions of the same content (e.g., different bitrates), chop these versions into small segments (e.g., two seconds), and let the client decide which segment (of which version) to download next, based on its context (e.g., bandwidth). Typically, the relationship between the different versions is described by a manifest, which is provided to the client prior to the streaming session.
Although many such services have been deployed, multimedia streaming over HTTP is mainly based on proprietary industry solutions such as Adobe's HTTP Dynamic Streaming (HDS) [1], Apple's HTTP Live Streaming (HLS) [18], and Microsoft's Smooth Streaming [32]. Interoperability is therefore limited, but with ISO/IEC MPEG Dynamic Adaptive Streaming over HTTP (DASH), a standardized solution based on 3GPP's Adaptive HTTP Streaming (AHS) [29] is available. DASH specifies representation formats for both the manifest and the segments [27, 28]. The supported segment formats are the MPEG-2 transport stream (M2TS) and the ISO base media file format (ISOBMFF). For the manifest, DASH defines the XML-based Media Presentation Description (MPD), which represents the data model. This data model is aligned with existing proprietary solutions: provide multiple versions of the same content (referred to as representations), chop the content into time-aligned segments to enable seamless switching between different representations, and enable the client to request these segments individually based on its current conditions. The standard specifies only the MPD and the segment formats. It deliberately excludes end-to-end system aspects and client implementation details, which are left open for industry competition. Hence, DASH as a standard may be considered an enabler for building such systems [30].
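The data model described above (several representations of the same content, cut into time-aligned segments that the client requests individually) can be sketched in a few lines of Python. This is a simplified in-memory illustration, not the MPD XML schema: the class names Representation and Presentation, the field names, and the segment URL template are assumptions made for this sketch only.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Representation:
    """One version of the content (hypothetical, simplified)."""
    rep_id: str
    bandwidth_bps: int  # bitrate needed to stream this version
    width: int
    height: int


@dataclass(frozen=True)
class Presentation:
    """Simplified stand-in for a manifest: versions plus segment timing."""
    representations: List[Representation]
    segment_duration_s: float
    duration_s: float

    def segment_count(self) -> int:
        # Segments are time-aligned, so the count is the same for
        # every representation; the last segment may be shorter.
        return math.ceil(self.duration_s / self.segment_duration_s)

    def segment_url(self, rep: Representation, index: int) -> str:
        # Hypothetical URL template; real deployments define their own.
        if not 0 <= index < self.segment_count():
            raise IndexError("segment index out of range")
        return f"{rep.rep_id}/segment-{index}.m4s"
```

Because the segment index is shared by all representations, a client can fetch segment i from one representation and segment i+1 from another, which is what makes switching at segment boundaries seamless.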
The most challenging part of a DASH client implementation is the component that determines which segment to download next. This component is often referred to as the adaptation logic. After receipt of the MPD, it analyzes the available representations (e.g., bitrates, resolutions) given the current context (e.g., bandwidth, display size) and starts downloading the segments accordingly. If the context changes (e.g., due to a drop in the available bandwidth), the client may switch to another representation that is suitable for the new context. The actual switching is typically done at segment boundaries and, in general, the behavior of the adaptation logic has a direct influence on the system performance. The system performance depends on a number of metrics, which can be of both objective and subjective nature. The former mainly reflect the Quality of Service (QoS), whereas the latter are often associated with the Quality of Experience (QoE) [10]. Additionally, in large-scale deployments, multiple clients may compete with each other and may introduce undesired issues, specifically when proxies/caches are deployed, which is often the case in combination with CDNs.
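A minimal sketch of such an adaptation logic is given here for illustration. It assumes a purely throughput-based rule: pick the highest-bitrate representation that fits both the estimated bandwidth (scaled by a safety factor) and the display height, and fall back to the lowest bitrate otherwise. The names Rep, choose_representation, and simulate, as well as the safety factor of 0.8, are assumptions of this sketch and are not taken from the DASH standard or from any of the cited works.

```python
from typing import List, NamedTuple, Sequence


class Rep(NamedTuple):
    bitrate_bps: int
    height: int


def choose_representation(reps: Sequence[Rep], bandwidth_bps: float,
                          display_height: int, safety: float = 0.8) -> Rep:
    """Pick the highest-bitrate representation that fits the context."""
    if not reps:
        raise ValueError("no representations available")
    budget = bandwidth_bps * safety  # keep headroom below the estimate
    fitting = [r for r in reps
               if r.bitrate_bps <= budget and r.height <= display_height]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_bps)
    # Nothing fits: fall back to the cheapest representation.
    return min(reps, key=lambda r: r.bitrate_bps)


def simulate(reps: Sequence[Rep], bandwidth_trace_bps: Sequence[float],
             display_height: int) -> List[Rep]:
    """Make one decision per segment boundary along a bandwidth trace."""
    return [choose_representation(reps, bw, display_height)
            for bw in bandwidth_trace_bps]
```

Running simulate over a trace in which the bandwidth drops for one segment and then recovers shows the client switching down and back up at consecutive segment boundaries, which is the kind of behavior that affects the QoS and QoE metrics mentioned above.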
Early deployments of HTTP streaming used progressive download, where the client opens a TCP connection to a server and progressively downloads the multimedia content. As soon as enough data is available on the client, the client can start decoding and rendering. However, clients cannot react to bandwidth fluctuations, which typically leads to service interruptions, also referred to as stalls. Probably one of the first adaptive HTTP streaming solutions employed an explicit adaptation loop, inspired by RTP-based streaming, in which clients perform bandwidth measurements and push the information to the server. The server analyzes these reports and modifies the progressive download session on demand [5]. Additionally, various other industry solutions exist, as already mentioned above, each following the same principles that led to the standardization of DASH but with some (minor) differences in terms of manifest and segment formats.
As a consequence of the work on DASH, many papers evaluate DASH and/or existing approaches in simulated environments [3] or based on bandwidth traces [15, 22], while others focus on the adaptation logic itself [11, 14, 23]. For example, Liu et al. [11] describe a rate adaptation algorithm for adaptive HTTP streaming. Riiser et al. [23] suggest a location-based bandwidth-lookup service to perform bandwidth estimation for adaptive video streaming. Mueller et al. [14] show an improved adaptation logic experimenting with an exponential buffer model. While these papers provide interesting insights into the behavior of existing approaches, their results are preliminary and experimental.
Another important aspect of DASH-like systems is the QoE. Oyman et al. describe a system approach enabling QoE for adaptive HTTP streaming services [17] which, in fact, focuses on QoS metrics such as HTTP request/response transactions, representation switch events, and average throughput, standardized as part of 3GPP AHS [29]. QDASH [12] describes a QoE-aware DASH system based on bandwidth measurements and subjective quality assessments, while Sieber et al. propose a user-centric DASH algorithm specifically designed for scalable video coding [26]. In practice, however, the most prominent QoE metrics are the initial (or start-up) delay and service interruptions (or stalls), where Hoßfeld et al. reveal that stalls should be avoided altogether, even at the cost of longer start-up delays [7]. Another study investigates the QoE impact of the frequency and amplitude of quality switching events (flickering) for different content types, showing that amplitude is dominant over frequency [16].
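To make the notions of switching frequency and amplitude concrete, the following sketch computes both from the sequence of quality levels a client selected, one entry per segment. The definitions used here (frequency as the number of switches, amplitude as the mean absolute jump in quality levels) and the name switching_stats are assumptions for illustration; they are not necessarily the definitions used in [16].

```python
from typing import Sequence, Tuple


def switching_stats(levels: Sequence[int]) -> Tuple[int, float]:
    """Return (number of switches, mean absolute switch amplitude).

    `levels` holds the quality-level index chosen for each segment.
    """
    # A switch happens wherever two consecutive segments differ in level.
    jumps = [abs(b - a) for a, b in zip(levels, levels[1:]) if a != b]
    if not jumps:
        return 0, 0.0
    return len(jumps), sum(jumps) / len(jumps)
```

For example, the level sequence 0, 0, 2, 2, 1, 1 contains two switches, one of two levels and one of one level, so the function returns a count of 2 and a mean amplitude of 1.5.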
Proxy-based approaches may perform traffic shaping towards multiple clients [8], while others prefer server-based traffic shaping to stabilize oscillating clients [2]. Additionally, on-demand rate adaptation [21] and request re-writing [6] are other proxy-based solutions, but they do not take into account the issue of clients competing for bandwidth. The caching efficiency of DASH in combination with scalable video coding is shown in [24], and a model for the achievable throughput including TCP friendliness is described in [9]. Finally, Mueller et al. [13] investigate negative effects of a proxy and suggest an adaptation logic with certain countermeasures against proxies and, for example, their distortion of the bandwidth estimation.
Dynamic adaptive streaming over HTTP allows for a flexible and scalable deployment of media ecosystems [3, 15, 22, 27, 28, 30], as the client encapsulates the entire streaming logic and no centralized controller is needed, also thanks to the stateless design of HTTP. Additionally, this kind of streaming approach enables the reuse of the already deployed Internet infrastructure comprising proxies, caches, and CDNs. However, the inherent nature of this ecosystem, which enables switching between individual quality levels without a centralized controller, may also introduce drawbacks. Problems may occur when multiple clients compete for bandwidth in a DASH-like streaming ecosystem.
The general assumption that TCP will accommodate the case when multiple clients compete for bandwidth is refuted in [2, 13], where clients begin to oscillate, specifically by continuously switching between different quality levels. In particular, Akhshabi et al. [2] propose a server-based solution to mitigate the oscillation effect, which, however, eliminates some major benefits such as the stateless design and the usage of ordinary HTTP servers. Furthermore, in a large-scale deployment where clients may request segments from multiple sources (servers, proxies, CDN nodes), the adoption of a server-based solution increases deployment costs and decreases scalability, as segment sources need to exchange additional information about the oscillation state. In comparison, Mueller et al. [13] identify client oscillations in conjunction with proxy caches and counter them by double-checking bandwidth estimates prior to bitrate increase decisions. They propose a client-centric approach that does not require any modifications to the existing infrastructure.
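The idea of double-checking a bandwidth estimate before increasing the bitrate can be illustrated with a small client-side rule. The sketch below is not the algorithm of [13]; it is a hypothetical variant that assumes the bitrate ladder is sorted in ascending order, switches down immediately when the latest estimate cannot sustain the current level, and switches up by one level only when a configurable number of consecutive estimates (here called confirmations, two by default) all support the higher bitrate.

```python
from typing import Sequence


def next_level(current: int, bitrates_bps: Sequence[int],
               recent_estimates_bps: Sequence[float],
               confirmations: int = 2) -> int:
    """Return the index of the bitrate level for the next segment.

    Assumes `bitrates_bps` is sorted in ascending order.
    """
    if not recent_estimates_bps:
        return current
    latest = recent_estimates_bps[-1]
    # Down-switch at once if the latest estimate is below the current bitrate.
    if latest < bitrates_bps[current]:
        sustainable = [i for i, b in enumerate(bitrates_bps) if b <= latest]
        return max(sustainable) if sustainable else 0
    # Up-switch by one level only if the last `confirmations`
    # estimates all support the next higher bitrate.
    if current + 1 < len(bitrates_bps):
        window = recent_estimates_bps[-confirmations:]
        if (len(window) == confirmations
                and all(e >= bitrates_bps[current + 1] for e in window)):
            return current + 1
    return current
```

The asymmetry is deliberate in this sketch: a single optimistic estimate, for instance one inflated by a segment served from a nearby cache, is not enough to trigger an increase, whereas a drop is acted on immediately to avoid stalls.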
However, up to now there is no satisfactory solution for adaptively setting the bitrate so as to avoid quality degradations due to bitrate oscillations, that is, a solution that does not require any central control and is able to address any kind of oscillation.
It would therefore be favorable to have an adaptation logic that leads to fewer impairments due to bitrate oscillations.