Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently.
After video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as the MP4 file format and the advanced video coding (AVC) file format. Such packetized video data may be transported in a variety of ways, such as transmission over a computer network using network streaming.
In the mid-2000s, the growth of video and audio traffic carried over the Real-time Transport Protocol (RTP) began to flood the Internet with network traffic, and these protocols had no congestion control. This led corporate information technology (IT) administrators to program their firewalls to block the RTP packets carrying the video and audio streams that were choking their corporate gateways.
The firewalls threatened the existence of video and audio streaming services. Service providers therefore began to deliver content over TCP virtual circuits (more specifically, over TCP port 80, the port used by HTTP), camouflaging their video and audio traffic as ordinary HTTP traffic. IT firewall administrators could not easily block video and audio carried over HTTP/TCP, and so, for a while, video and audio over HTTP over TCP flourished.
Initially, a “progressive download” method was used to download most videos. In this mechanism, a single HTTP connection and transfer is used to download the entire video file. The user watches the download progress, and when they judge that enough data has been buffered to support the entire viewing experience, they press “PLAY” and the video begins to display. Alternatively, the player may start playout automatically once sufficient data has been downloaded, providing a pseudo-streaming experience. This method suffered from problems, however, when the user wanted to watch a video right away, especially over low-capacity links. Another problem was that, in a changing wireless environment, a progressive download could suddenly slow to a snail's pace, causing stalls in the middle of a video.
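The decision of when to press “PLAY” can be made concrete. The following is a minimal sketch that assumes a constant download rate and a constant-bit-rate video, conditions that real links and encoders rarely provide; the function name and parameters are hypothetical. If the download rate is lower than the video bit rate, playout must be delayed long enough that it does not finish before the download does, which is the tightest point of the constraint.

```python
def min_startup_delay(duration_s: float, video_bps: float, download_bps: float) -> float:
    """Smallest wait (seconds) before starting playout so that it never stalls.

    Hypothetical helper assuming a constant download rate and a constant
    (CBR) video bit rate over the whole file.
    """
    if download_bps >= video_bps:
        # The download outpaces playout, so playback can start immediately.
        return 0.0
    file_bits = duration_s * video_bps
    download_time_s = file_bits / download_bps
    # Playout (lasting duration_s) must not end before the download ends.
    return download_time_s - duration_s
```

For example, a 10-minute video encoded at 2 Mbit/s and downloaded over a 1.5 Mbit/s link takes 800 seconds to download, so the user would have to wait 200 seconds before pressing “PLAY”. If the link rate later drops, as in the wireless case above, even that wait becomes insufficient and playout stalls.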
Since 2005, work has been underway to implement Adaptive Streaming over HTTP, which attempts to address these problems. Examples of adaptive streaming protocols include Microsoft Smooth Streaming (MSS), Apple HTTP Live Streaming (HLS), Adobe HTTP Dynamic Streaming (AHDS), and the 3GPP standard Dynamic Adaptive Streaming over HTTP (DASH). In 2011, the Netflix video streaming service (based upon MSS) consumed 30% of the North American Internet backhaul during evening peak hours, delivering video packets to customer homes.
Adaptive streaming methods organize a video very much like an HTML web page. For example, in DASH, a “video web page” is defined to reference all of the “fragments” (sub-files, also referred to as sub-segments) that make up the video. A fragment is typically 2 seconds of real-time video or audio, and in the case of video it typically begins with an MPEG I-frame (an intra-coded picture that, like a JPEG image, can be decoded without reference to any other frame). In H.264/AVC, such frames are referred to as Instantaneous Decoder Refresh (IDR) frames. In DASH, the “video web page” is referred to as a “Media Presentation Description” (MPD). An MPD for a 2-hour video might reference 3600 video uniform resource locators (URLs) and 3600 audio URLs, each of which may correspond to 2 seconds of media when played back (7200 seconds divided by 2 seconds per fragment). Note that 3600 video URLs may be provided for each bit rate at which the video is encoded.
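The fragment arithmetic above can be sketched as follows. The URL naming scheme used here (`<base>/<representation>/seg-<n>.m4s`) and the helper names are hypothetical illustrations; actual MPDs may list each URL explicitly or describe them compactly, for example with a DASH `SegmentTemplate` whose `$Number$` placeholder is substituted by the client.

```python
import math


def fragment_count(duration_s: float, fragment_s: float = 2.0) -> int:
    # Round up so that a final, shorter fragment is still counted.
    return math.ceil(duration_s / fragment_s)


def fragment_urls(base: str, representation: str, count: int) -> list:
    # Hypothetical naming scheme: <base>/<representation>/seg-<n>.m4s,
    # numbered from 1 in playback order.
    return [f"{base}/{representation}/seg-{n}.m4s" for n in range(1, count + 1)]


# A 2-hour video cut into 2-second fragments.
n = fragment_count(2 * 60 * 60)
video = fragment_urls("https://example.com/movie", "video-1000k", n)
audio = fragment_urls("https://example.com/movie", "audio-128k", n)
```

With these assumptions, `n` is 3600, and each of `video` and `audio` holds 3600 URLs; a second video bit rate would add another 3600 video URLs under its own representation name.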
One improvement of DASH is that the same video may be described at several different bit rates, and the player can switch bit rates (for example, every 2 seconds). An MPD generally describes three to eight different renderings of the same video, referred to as representations. When the Internet is congested, or when the terminal is on a low-capacity link, a low bit rate fragment may be fetched. When the Internet is uncongested and the terminal has a high-capacity link, a high bit rate fragment may be fetched. Typically, a single audio stream is fetched, and no bit rate switching occurs for audio. When network or link conditions change, the player may adapt by fetching video fragments at higher or lower bit rates. The player typically adapts at a fragment boundary. Thus, the player may dynamically adapt to changing congestion conditions on the Internet while transporting both audio and video data over HTTP. Note that if 8 different representations are offered, a total of 3600 × 8 = 28,800 video fragments may be managed on the origin server.
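The bit-rate switching described above can be illustrated with a simple throughput-based rule. This is a minimal sketch, not the algorithm of any particular DASH player: it assumes the player measures throughput while downloading each fragment and, at the next fragment boundary, picks the highest representation whose bit rate fits within a hypothetical safety margin (80% here) of that measurement, falling back to the lowest bit rate when nothing fits.

```python
from typing import List, Sequence


def choose_representation(bitrates_bps: Sequence[int], throughput_bps: float,
                          safety: float = 0.8) -> int:
    """Pick the highest bit rate that fits within a margin of measured throughput."""
    budget = throughput_bps * safety
    fitting = [b for b in sorted(bitrates_bps) if b <= budget]
    # When even the lowest representation exceeds the budget, fetch it anyway.
    return fitting[-1] if fitting else min(bitrates_bps)


def plan_fetches(bitrates_bps: Sequence[int],
                 throughput_per_fragment: Sequence[float]) -> List[int]:
    # One decision per fragment, i.e., adaptation only at fragment boundaries.
    return [choose_representation(bitrates_bps, t) for t in throughput_per_fragment]


ladder = [500_000, 1_000_000, 2_500_000, 5_000_000]  # hypothetical representations
plan = plan_fetches(ladder, [6e6, 2e6, 400_000, 8e6])
```

Here `plan` is `[2500000, 1000000, 500000, 5000000]`: the player steps down as the measured throughput falls and back up when it recovers. Real players typically also consider buffer occupancy and smooth throughput estimates to avoid oscillating between representations.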
After HTTP 0.9 was introduced in 1991, it became so successful that the Internet was soon choked with HTTP requests. HTTP 1.0 was documented in RFC 1945 in 1996, and in 1997 HTTP 1.1 was standardized in RFC 2068, which included extensive support for caching. Browsers began to cache objects, and researchers also began to build transparent HTTP proxy cache devices to take advantage of these caching features. A proxy cache device intercepts HTTP GET requests and generally forwards them unchanged. When the proxy cache observes an HTTP response for content such as a JPEG picture or a stock quote good for 20 minutes, carrying one of roughly five HTTP “caching” headers (which indicate that the content has a sufficiently long lifetime and can be cached), the proxy cache device may store the cacheable response and replay it when the same or a different user later requests the content. A network administrator may reprogram switches or routers to route all HTTP traffic through the proxy cache.
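The store-and-replay behavior of a proxy cache can be sketched as follows. This is a simplified illustration under stated assumptions: only the `Cache-Control: max-age=N` directive is honored (other caching headers such as `Expires` and `ETag` are ignored), header names are matched case-sensitively, and the `fetch` callable standing in for the origin server, as well as the class and function names, are hypothetical.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Response:
    status: int
    headers: Dict[str, str]
    body: bytes


def max_age(headers: Dict[str, str]) -> Optional[int]:
    # Simplification: only Cache-Control max-age is considered.
    cc = headers.get("Cache-Control", "")
    if "no-store" in cc or "private" in cc:
        return None
    m = re.search(r"max-age=(\d+)", cc)
    return int(m.group(1)) if m else None


class ProxyCache:
    def __init__(self, fetch: Callable[[str], Response],
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # forwards a GET to the origin server
        self._clock = clock
        self._store: Dict[str, Tuple[float, Response]] = {}

    def get(self, url: str) -> Response:
        entry = self._store.get(url)
        if entry is not None and self._clock() < entry[0]:
            return entry[1]          # replay the stored copy to any user
        resp = self._fetch(url)      # forward the request unchanged
        ttl = max_age(resp.headers)
        if resp.status == 200 and ttl:
            self._store[url] = (self._clock() + ttl, resp)
        return resp
```

For a stock quote served with `Cache-Control: max-age=1200` (20 minutes), every request within that window is answered from the cache; the first request after it expires is forwarded to the origin again.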
In addition, HTTP 1.1 (as specified in RFC 2616) provides for partial GET requests. A partial GET request specifies a target URL together with a “Range:” header whose value indicates the desired byte range (for example, “Range: bytes=0-499” requests the first 500 bytes). Although HTTP 1.1 provides for them, not all web browsers use partial GET requests. Moreover, even when web browsers (or other applications executed by a client device) do issue partial GET requests, intermediate network devices, such as proxy servers, proxy cache devices, or other proxy devices, are often configured to retrieve the full file, not just the portion requested by the client device.
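On the server side, honoring a partial GET amounts to parsing the “Range:” value and returning only the requested bytes with status 206 (Partial Content). The sketch below handles a single byte range in the three forms `bytes=first-last`, `bytes=first-`, and `bytes=-suffix`; the function names are hypothetical, multi-range requests are not supported, and, as a simplification, an unsupported or unsatisfiable range simply yields the full body with status 200, whereas a real server may answer an unsatisfiable range with 416.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (first, last) offsets, or None if unsupported/unsatisfiable."""
    m = _RANGE_RE.match(header.strip())
    if size == 0 or not m or m.group(1) == m.group(2) == "":
        return None
    first_s, last_s = m.groups()
    if first_s == "":
        # Suffix form "bytes=-N": the final N bytes of the resource.
        n = int(last_s)
        return (max(0, size - n), size - 1) if n > 0 else None
    first = int(first_s)
    last = int(last_s) if last_s else size - 1
    if first >= size or last < first:
        return None
    return first, min(last, size - 1)   # clamp an over-long range to the file end


def serve_partial(body: bytes, range_header: str) -> Tuple[int, bytes]:
    rng = parse_byte_range(range_header, len(body))
    if rng is None:
        return 200, body                 # simplification: whole file, no 416
    first, last = rng
    return 206, body[first:last + 1]
```

A client can issue such a request with the standard library, for example `urllib.request.Request(url, headers={"Range": "bytes=0-499"})`; whether it receives 206 with 500 bytes or 200 with the whole file depends on the server and on any proxy in between.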
Proxy devices are commonly configured to perform additional actions on network traffic, such as deep packet inspection to detect viruses or other malicious network traffic, caching (to respond to other requests for the same data), or other functions requiring retrieval of the entire file. Therefore, such proxy devices tend to strip away the range request, retrieve the entire file at the specified URL, and thus provide the entire retrieved file to the requesting client device. For example, certain virus scanning algorithms require scanning an entire file, in which case the entire file must be downloaded. However, for a relatively large multimedia file (such as a two-hour movie), retrieving the full file instead of the requested byte range may significantly delay delivery of that relatively small byte range to the requesting client device.
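The size of this delay can be estimated with simple arithmetic. The figures below are illustrative assumptions, not measurements: a two-hour movie encoded at 4 Mbit/s, 2-second fragments, and a 50 Mbit/s link between the proxy and the origin server. The function name is hypothetical.

```python
def proxy_delays(duration_s: float, video_bps: float, fragment_s: float,
                 link_bps: float) -> tuple:
    """Seconds to fetch (whole file, one fragment) over the proxy-origin link."""
    file_bits = duration_s * video_bps
    fragment_bits = fragment_s * video_bps
    return file_bits / link_bps, fragment_bits / link_bps


# Assumed figures: 2-hour movie, 4 Mbit/s, 2 s fragments, 50 Mbit/s link.
full_s, range_s = proxy_delays(2 * 60 * 60, 4_000_000, 2, 50_000_000)
```

Under these assumptions the movie is about 3.6 GB, and a proxy that must retrieve (and, for example, scan) the whole file spends 576 seconds on the transfer before it can release a fragment that would itself take only about 0.16 seconds to fetch. The ratio equals the number of fragments in the file, 3600, so the penalty grows with the length of the movie.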