Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.
After video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, for example the AVC file format. Such packetized video data may be transported in a variety of ways, such as transmission over a computer network using network streaming.
In the mid-2000s, growing video and audio traffic carried over the Real-time Transport Protocol (RTP) began to flood the Internet, and RTP itself provided no congestion control. This led corporate information technology (IT) administrators to program their firewalls to block the RTP packets carrying video and audio streams that were choking corporate gateways.
The firewalls threatened the existence of video and audio streaming services. Service providers therefore began to deliver content over TCP virtual circuits (more specifically, over the TCP port used for HTTP), camouflaging their video and audio traffic as ordinary HTTP traffic. IT firewall administrators could not easily block video and audio over HTTP/TCP, and so, for a while, video and audio over HTTP over TCP flourished.
Initially, a “progressive download” method was used to download most videos. In this mechanism, a single HTTP connection and transfer are used to download the entire video file. The user watches the download progress, and when enough data has been buffered to support the entire stream-viewing experience, either the player or the user hits “PLAY” and video playback commences. This method suffered from problems, however, when the user wanted to watch a video right away, especially on low-capacity links. Another problem was that, in a changing wireless environment, a progressive download could suddenly slow to a snail's pace, causing stalls in the middle of a video.
Work has been underway to implement adaptive streaming over HTTP, which attempts to address these problems. Examples of adaptive streaming protocols include Microsoft Smooth Streaming (MSS), Apple HTTP Live Streaming (HLS), Adobe HTTP Dynamic Streaming (AHDS), and the 3GPP standard, Dynamic Adaptive Streaming over HTTP (DASH). In 2011, the Netflix video streaming service (based upon MSS) consumed 30% of the North American Internet backhaul at peak evening times, delivering video packets to customer homes.
Adaptive streaming methods generally organize video data very much like an HTML web page. For example, in DASH, a “video web page” is defined to reference all of the “fragments” (sub-files) corresponding to the video data. A fragment is typically 2 seconds of real-time video or audio and, in the case of video, typically begins with an MPEG I-frame (an intra-coded picture that can be decoded on its own, much like a JPEG image). In DASH, a “video web page” is referred to as a “Media Presentation Description” (MPD). An MPD for a 2-hour video might reference 3600 video uniform resource locators (URLs) and 3600 audio URLs, each of which may correspond to 2 seconds of media when played back. Note that 3600 video URLs may be provided for each bit-rate at which the video is encoded.
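The fragment count above follows from simple arithmetic: 2 hours is 7200 seconds, which at 2 seconds per fragment gives 3600 fragments per stream. The following Python sketch illustrates that calculation and the resulting list of URLs. The URL layout, the helper names `fragment_count` and `fragment_urls`, and the host `example.com` are hypothetical and chosen only for illustration; they are not defined by DASH or by any particular MPD.

```python
import math


def fragment_count(duration_s: float, fragment_s: float = 2.0) -> int:
    """Number of fragments needed to cover duration_s seconds of media."""
    return math.ceil(duration_s / fragment_s)


def fragment_urls(base_url: str, media: str, bitrate_kbps: int,
                  duration_s: float, fragment_s: float = 2.0) -> list[str]:
    """Build one URL per fragment using a hypothetical naming scheme."""
    n = fragment_count(duration_s, fragment_s)
    return [
        f"{base_url}/{media}/{bitrate_kbps}k/frag_{i:05d}.m4s"
        for i in range(1, n + 1)
    ]


# A 2-hour presentation split into 2-second fragments.
two_hours_s = 2 * 60 * 60
video_urls = fragment_urls("http://example.com/movie", "video", 1000, two_hours_s)
audio_urls = fragment_urls("http://example.com/movie", "audio", 128, two_hours_s)
print(len(video_urls), len(audio_urls))
```

Repeating the `fragment_urls` call once per encoded bit-rate would yield a separate set of 3600 video URLs for each bit-rate.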
One advantage of DASH is that the same video may be described at several different bit-rates, and the player can switch bit-rates (for example, every 2 seconds). An MPD generally describes 3-8 different renderings of the same video, referred to as representations. When the Internet is congested, or during initial startup, or when the terminal is on a low-capacity link, a low bit-rate fragment may be fetched. When the Internet is uncongested and the terminal has a high-capacity link, a high bit-rate fragment may be fetched.
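The switching behavior described above can be sketched as a simple selection rule applied before each fragment request. This is a minimal illustration, not the selection logic of any particular player: the function name `choose_bitrate`, the safety margin of 0.8, and the example bit-rate ladder are all assumptions made for the example.

```python
def choose_bitrate(representations_kbps: list[int],
                   throughput_kbps: float,
                   safety: float = 0.8) -> int:
    """Pick the highest representation that fits within the measured throughput.

    A safety margin (assumed here) leaves headroom for throughput variation.
    If no representation fits, fall back to the lowest one, as a player
    would at startup or on a congested or low-capacity link.
    """
    budget = throughput_kbps * safety
    candidates = [r for r in sorted(representations_kbps) if r <= budget]
    return candidates[-1] if candidates else min(representations_kbps)


# Hypothetical ladder of five representations of the same video, in kbit/s.
ladder = [250, 500, 1000, 2000, 4000]

congested = choose_bitrate(ladder, throughput_kbps=200)      # low-capacity link
moderate = choose_bitrate(ladder, throughput_kbps=3000)
uncongested = choose_bitrate(ladder, throughput_kbps=10000)  # high-capacity link
print(congested, moderate, uncongested)
```

Because the choice is re-evaluated for every fragment, the player can move up or down the ladder at each fragment boundary as measured throughput changes.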