1. Technical Field
The present invention relates generally to techniques to automatically determine the encoding bitrate of the video source of an adaptive video streaming system. More precisely, it discloses a method to compute the encoding bitrate to deliver the best video quality to the user while avoiding playout interruptions and a method for efficiently scheduling the download of video segments.
2. Description of the Related Art
A video streaming system is composed of a sender (video server) that sends the video to a receiver (client), which temporarily stores the video content in a queue, called the playout buffer, that is drained by the player.
When the playout buffer empties, for instance due to a sudden decrease of the end-to-end available bandwidth, the player is paused until a sufficient duration of video content has been stored in the playout buffer. At this point the player can resume the playback of the video. This mechanism is called re-buffering. The paper entitled “Understanding the impact of video quality on user engagement” by F. Dobrian et al., presented at the ACM SIGCOMM conference in 2011, shows that the duration and the frequency of re-buffering events are the main parameters negatively affecting the user-perceived quality.
The goal of an adaptive video streaming system is to change in real-time the encoding bitrate of a video source, whether pre-recorded or live, to adapt it to the available network bandwidth so that playback interruptions due to re-buffering events can be avoided.
Today, such systems employ HTTP over TCP to deliver the video instead of protocols specifically tailored for streaming applications, such as the Real-time Transport Protocol (RTP) or the Real Time Streaming Protocol (RTSP), which typically rely on UDP. This is the leading approach employed today by all the major video distribution platforms such as YouTube, Netflix, Hulu, Livestream, and Ustream.
In an adaptive video streaming system the video produced by the encoder is divided into segments, or chunks, whose duration is a multiple of the Group of Pictures (GoP) duration. Videos can be divided using two mechanisms: 1) physical segmentation, 2) logical segmentation.
The physical segmentation method requires the video to be physically divided into a number of files, one for each video segment. In this case the video segment is indexed by using its full path.
On the other hand, the logical segmentation requires the video to be logically divided. In this case a video segment is typically indexed using an index file specifying for each video segment its byte offset in the stored video file and the segment size in bytes.
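As an illustration of logical segmentation, the following minimal sketch looks up a segment in an index of byte offsets and sizes and builds the byte range that selects it. The use of an HTTP Range header, the `SegmentEntry` type, and the sample index values are assumptions made for illustration only; they are not taken from any specific system described here.

```python
from typing import List, NamedTuple


class SegmentEntry(NamedTuple):
    offset: int  # byte offset of the segment in the stored video file
    size: int    # segment size in bytes


def range_header(index: List[SegmentEntry], segment_number: int) -> str:
    """Return an HTTP Range header value selecting one logical segment."""
    entry = index[segment_number]
    first_byte = entry.offset
    last_byte = entry.offset + entry.size - 1  # the end of a byte range is inclusive
    return f"bytes={first_byte}-{last_byte}"


# Hypothetical index file content: three consecutive segments of one video file.
index = [SegmentEntry(0, 1000), SegmentEntry(1000, 1500), SegmentEntry(2500, 800)]
print(range_header(index, 1))  # bytes=1000-2499
```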
Adaptive video streaming systems can be characterized based on two features: 1) the approach employed to implement adaptivity, 2) the employed control architecture.
Regarding the approaches that can be used to implement adaptivity we can divide the current proposals into the following three main categories.
1) Transcoding-based systems: they adapt the video content to the desired bitrate by changing in real-time the encoding bitrate of the raw video; this technique allows fine-grained adaptation of the encoding bitrate to the available bandwidth, but it has the main drawback of requiring an encoding process for each video session; for this reason transcoding-based systems do not scale with the number of concurrent users.
2) Systems based on scalable codecs: such systems employ scalable codecs such as H.264 SVC, VP8, VP9. The raw video content is coded once and the encoding bitrate can be changed by exploiting the spatial and temporal scalability features of such codecs. With respect to the transcoding-based systems, this approach is more scalable since the encoding process is performed only once.
3) Stream-switching or multi-bitrate systems: such systems encode the video in N versions, called video levels or representations; a control mechanism decides which video level should be sent to the receiver; such systems require N encoding processes for each video. This means that they have higher CPU and storage costs with respect to solutions based on scalable codecs, but they have the advantage of being codec agnostic, i.e. any encoder can be used.
Regarding the control architecture, three different approaches can be employed, as described in the following.
1) Client-side architecture: a controller placed at the client computes the encoding bitrate and sends this control signal to the video server; typically the mechanisms proposed in the literature employ bandwidth estimates as the controller input (see for instance the article entitled “Improving fairness, efficiency, and stability in http-based adaptive video streaming with festive.” by Jiang et al., presented at the 8th International Conference on Emerging Networking EXperiments and Technologies (CoNEXT), 2012).
2) Server-side architecture: the controller is placed at the server and computes the video bitrate to be sent to the client by employing measurements made at the server (for instance, bandwidth estimates or transmission buffer length, as described in the article by L. De Cicco et al. entitled “Feedback control for adaptive live video streaming.” presented at the ACM conference on Multimedia Systems in 2011).
3) Hybrid architecture: in this architecture the control system is distributed between the server and the client; one such architecture is employed in the system proposed in the article by Akhshabi, Saamer, et al. entitled “Server-based traffic shaping for stabilizing oscillating adaptive streaming players” presented at the ACM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), 2013 and in the video streaming system employed by Akamai and used by the LiveStreaming platform.
Today, leading adaptive video streaming platforms, such as YouTube and Netflix, employ the client-side control architecture with stream-switching control systems, and they use the HTTP infrastructure, i.e. servers and proxies to deliver the video, and web browsers to consume the received video content at the client.
Two main standards have been proposed in this technological context: 1) MPEG-DASH (see the article “The mpeg-dash standard for multimedia streaming over the internet” by I. Sodagar in IEEE MultiMedia, 18(4):62-67, 2011) and 2) HTTP Live Streaming (HLS) by Apple. Both standards employ a manifest file, stored at the video server, that associates each segment-video level pair with its corresponding URL.
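The role of the manifest can be pictured with the following minimal sketch, which models it as an in-memory table from (segment, video level) pairs to URLs. This is a simplification for illustration: it does not reproduce the actual MPEG-DASH or HLS file formats, and the host name, path layout, and video levels are hypothetical.

```python
# Hypothetical video levels (encoding bitrates in kbit/s) and segment count.
VIDEO_LEVELS_KBPS = (500, 1500, 3000)
SEGMENT_COUNT = 3

# Simplified manifest: one URL for each (segment number, video level) pair.
manifest = {
    (segment, level): f"https://video.example.com/{level}kbps/segment{segment}.mp4"
    for segment in range(SEGMENT_COUNT)
    for level in VIDEO_LEVELS_KBPS
}


def segment_url(segment: int, level_kbps: int) -> str:
    """Return the URL of the given segment at the given video level."""
    return manifest[(segment, level_kbps)]


print(segment_url(2, 1500))  # https://video.example.com/1500kbps/segment2.mp4
```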
The article entitled “An experimental evaluation of rate-adaptation algorithms in adaptive streaming over HTTP” by S. Akhshabi et al. presented at ACM Multimedia Systems Conference in 2011 has shown that such systems typically employ two control mechanisms: 1) a mechanism, the stream-switch controller, that computes the encoding bitrate; 2) a mechanism to control the playout buffer level. Regarding stream-switching controllers, the mainstream approach is to compute the video encoding bitrate as a function of available bandwidth measurements.
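A stream-switch controller that computes the encoding bitrate as a function of the measured available bandwidth can be sketched as follows. This is only one simple possibility, assumed here for illustration: it selects the highest video level that does not exceed the bandwidth estimate and falls back to the lowest level otherwise. The function name and the level values are hypothetical and do not describe any particular player.

```python
from typing import Sequence


def select_video_level(levels_kbps: Sequence[int], bandwidth_estimate_kbps: float) -> int:
    """Pick the highest video level not exceeding the bandwidth estimate.

    Falls back to the lowest available level when even that level
    exceeds the estimate (an assumption of this sketch).
    """
    feasible = [rate for rate in levels_kbps if rate <= bandwidth_estimate_kbps]
    return max(feasible) if feasible else min(levels_kbps)


levels = [300, 700, 1500, 3000]  # hypothetical video levels in kbit/s
print(select_video_level(levels, 1800))  # 1500
print(select_video_level(levels, 100))   # 300
```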
The article by S. Akhshabi et al. cited above also shows that such systems can be in one of the following two states: 1) buffering: this state is entered at the beginning of a video streaming session, or after a re-buffering event, and it is left when the playout buffer level rises above a target threshold; while the system is in this state a video segment is requested as soon as the download of the previous segment has been completed; this state is needed to fill the playout buffer as quickly as possible; 2) steady-state: while the client is in this state, video segment requests are issued every T seconds, where T is the duration of a segment measured in seconds; this means that, denoting with Td the time required to download a segment, if Td<T the client schedules the download of the next segment after an idle period of T−Td.
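The two states and the resulting request schedule can be summarized by the following minimal sketch. The function and constant names are hypothetical, and the behaviour for Td ≥ T in steady-state (requesting the next segment immediately) is an assumption, since only the case Td<T is described above.

```python
BUFFERING = "buffering"
STEADY = "steady-state"


def update_state(state: str, buffer_level_s: float, target_level_s: float) -> str:
    """Return the next client state given the playout buffer level in seconds."""
    if buffer_level_s <= 0.0:
        return BUFFERING  # empty buffer: a re-buffering event occurred
    if state == BUFFERING and buffer_level_s > target_level_s:
        return STEADY     # buffer rose above the target threshold
    return state


def idle_before_next_request(state: str, download_time_s: float,
                             segment_duration_s: float) -> float:
    """Return the idle period (seconds) before the next segment request."""
    if state == BUFFERING:
        return 0.0  # request as soon as the previous download completes
    # Steady-state: one request every T seconds, hence an idle period of T - Td.
    # Assumption: if Td >= T the next request is issued immediately.
    return max(0.0, segment_duration_s - download_time_s)


print(idle_before_next_request(STEADY, 1.5, 4.0))  # 2.5
```

With T = 4 s and Td = 1.5 s, the client downloads for 1.5 s and then stays idle for 2.5 s, which is the ON-OFF traffic pattern produced at steady-state.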
It has been shown that the segment request mechanism employed when the client is at steady-state produces an ON-OFF traffic pattern with the main drawbacks described in the following.
1) In the article entitled “Performance of On-Off Traffic Stemming From Live Adaptive Segmented HTTP Video Streaming” by T. Kupka et al. presented at the IEEE Conference on Local Computer Networks in 2012 it has been shown that this ON-OFF traffic pattern leads to underutilization of the video server uplink bandwidth.
2) The same article also shows that concurrent video flows do not fairly share a common bottleneck.
3) In the article entitled “Confused, timid, and unstable: picking a video streaming rate is hard.” by T. Huang et al. presented at the ACM Internet Measurement Conference in 2012 it has been shown that when a video flow shares the bottleneck with a greedy TCP flow, such as in the case of a concurrent file download, the video flow is not able to obtain its fair share; the same article has shown that Netflix, Vudu and Hulu, three popular VoD video streaming systems, are affected by such issues.