Digital media files such as video and audio are typically delivered over a network using one of two methods: streaming or download. Streaming media involves sending portions of a media file from a media server to a media client and playing the received portions as they are received. With streaming media, the user does not have to wait to download a large file before seeing the video or hearing the audio. Instead, the media is sent in a continuous stream and is played as it arrives.
In streaming, the media server typically opens a conversation with the media client. The conversation usually has two parts: one part is for control messages between the media client and the media server, and the other part is for transferring the media (e.g. video) from the media server to the media client. Because the media server and the media client continue to exchange control messages, the media server can adjust to changing network conditions as the media is played. The control messages also typically include user actions such as play, pause, stop, and seeking to a particular part of the file.
Most modern media transmission systems use RTP (Real-time Transport Protocol)/RTSP (Real Time Streaming Protocol) for streaming. RTSP is a protocol for use in streaming media that allows a client to remotely control a streaming media server by issuing VCR-like commands such as “play” and “pause”, and that allows time-based access to files on a media server. The sending of streaming data itself is not part of the RTSP protocol. Most RTSP systems use RTP as the transport for the actual audio/video data.
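As an illustration of the control half of such a conversation, the following minimal sketch builds the text of an RTSP request. RTSP requests are text-based and resemble HTTP requests: a request line followed by headers such as CSeq and Session. The function name `build_rtsp_request`, the URL, and the session identifier are hypothetical and are used only for illustration.

```python
from typing import Dict, Optional


def build_rtsp_request(method: str, url: str, cseq: int,
                       session: Optional[str] = None,
                       extra_headers: Optional[Dict[str, str]] = None) -> str:
    """Return the text of an RTSP/1.0 request (illustrative sketch only)."""
    # Request line, e.g. "PLAY rtsp://host/path RTSP/1.0"
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    if session is not None:
        lines.append(f"Session: {session}")
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; an empty line terminates the request.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical URL and session identifier: seek to 30 seconds and play.
play = build_rtsp_request("PLAY", "rtsp://media.example.com/clip", 2,
                          session="12345",
                          extra_headers={"Range": "npt=30-"})
```

Only the control message is shown here; the media itself would travel separately, typically over RTP.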
Unlike streaming, the download method involves transferring the entire media file from a web server (HTTP server) to the media client. Historically, the download method required the media client to wait until the entire file was downloaded before it could start playing the media file. More recent solutions, however, allow a media client to start playing the media file once a portion of it has been downloaded; this is referred to as progressive download. Progressive download provides a better end user experience than traditional download, as the media client can start playing the media file as soon as it has received enough of the file to begin decoding and displaying it.
Recently, a new video coding standard, referred to as Scalable Video Coding (SVC), was developed. SVC is an extension of the H.264/MPEG-4 AVC video compression standard. When a video file is SVC encoded, it is encoded into one or more layers of differing quality. The layer with the lowest quality, referred to as the base layer, contains the most important part of the video stream. One or more enhancement layers may then be encoded to further refine the quality of the base layer. The enhancement layers are used for improving the spatial resolution (picture size), the temporal resolution (frame rate), and the SNR (signal-to-noise ratio) quality of the base layer.
An SVC encoded video stream is organized into NAL (Network Abstraction Layer) units. The NAL unit headers identify which SVC layer the unit belongs to. The NAL unit header information can be used to strategically drop layers of the stream. The process of strategically dropping packets is referred to as thinning. Thinning allows media streams to be tailored to the media client by delivering media streams with different resolutions, frame rates and quality to different media clients. Thinning has the effect of changing the bandwidth requirements of a media stream.
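The thinning described above can be sketched as follows. The sketch assumes the SVC NAL unit header layout of H.264 Annex G, in which NAL unit types 14 and 20 carry a three-byte extension holding the dependency_id (spatial layer), quality_id (SNR layer) and temporal_id fields. The names `parse_svc_layer` and `thin` are hypothetical, and details such as start codes and emulation prevention bytes are ignored.

```python
from typing import Iterable, List, NamedTuple, Optional


class SvcLayerId(NamedTuple):
    dependency_id: int  # spatial layer
    quality_id: int     # SNR (quality) layer
    temporal_id: int    # temporal layer


def parse_svc_layer(nal: bytes) -> Optional[SvcLayerId]:
    """Return the layer identifiers of an SVC NAL unit, or None otherwise."""
    nal_unit_type = nal[0] & 0x1F          # low 5 bits of the first byte
    if nal_unit_type not in (14, 20) or len(nal) < 4:
        return None                        # plain AVC NAL unit: no SVC extension
    dependency_id = (nal[2] >> 4) & 0x07   # 3 bits after no_inter_layer_pred_flag
    quality_id = nal[2] & 0x0F             # low 4 bits of the same byte
    temporal_id = (nal[3] >> 5) & 0x07     # top 3 bits of the next byte
    return SvcLayerId(dependency_id, quality_id, temporal_id)


def thin(nal_units: Iterable[bytes], max_dependency_id: int,
         max_quality_id: int, max_temporal_id: int) -> List[bytes]:
    """Drop NAL units that belong to layers above the given limits."""
    kept = []
    for nal in nal_units:
        layer = parse_svc_layer(nal)
        if layer is None or (layer.dependency_id <= max_dependency_id
                             and layer.quality_id <= max_quality_id
                             and layer.temporal_id <= max_temporal_id):
            kept.append(nal)
    return kept
```

For example, calling `thin(units, 0, 0, 7)` would keep only the lowest spatial and quality layer at any frame rate, which reduces the bandwidth the stream requires.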
A network element, such as a media-aware network element (MANE), can be inserted in the network between a media server and a media client to dynamically thin an SVC-encoded video stream according to the media client capabilities and the network characteristics, thereby achieving efficient use of the available bandwidth.
However, a standard network element, such as a MANE, cannot be used to dynamically thin a progressive download. This is due to the way that media files are stored on and transmitted from a web server. Specifically, media files (scalable or otherwise) are typically stored on a web server in a media container format. A media container format is a computer file format that can contain several types of data (such as audio and video) compressed by means of standardized audio/video codecs. The container file is used to identify and interleave the different data types. These formats are structured around defined lengths of data. The data length is generated when the file is encoded/created and is then embedded within the file itself in a multi-byte length field. When an application reads the file, it uses the length field to parse the file into its multiple fields.
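The role of the length field can be illustrated with a minimal sketch of a length-prefixed layout. The layout used here (a 4-byte big-endian length, a 4-byte type tag, then the payload) is a hypothetical simplification chosen for illustration and is not the definition of any particular container format; the names `pack_field` and `parse_fields` are likewise hypothetical.

```python
import struct
from typing import List, Tuple


def pack_field(tag: bytes, payload: bytes) -> bytes:
    """Write one field: 4-byte length, 4-byte tag, then the payload."""
    # The length is fixed at encode time and embedded ahead of the payload.
    return struct.pack(">I4s", len(payload), tag) + payload


def parse_fields(data: bytes) -> List[Tuple[bytes, bytes]]:
    """Split a byte string into (tag, payload) pairs using the length fields."""
    fields = []
    offset = 0
    while offset < len(data):
        length, tag = struct.unpack_from(">I4s", data, offset)
        offset += 8                              # skip the length and tag
        fields.append((tag, data[offset:offset + length]))
        offset += length                         # jump to the next field
    return fields


# Hypothetical tags for an interleaved video and audio payload.
container = pack_field(b"vide", b"\x01\x02\x03") + pack_field(b"soun", b"\x04\x05")
```

The reader never scans for delimiters; it relies entirely on each embedded length to find where the next field begins.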
When a web server receives a progressive download request (e.g. an “HTTP GET” request), the web server reads the file from the start, breaks it up into appropriately sized packets, encapsulates them using a standard protocol (i.e. HTTP), and transmits them to the media client.
If a network element (e.g. a MANE) were to perform thinning on the HTTP packets as they passed through the network element, it would be manipulating the content of the packets, thus changing the length of the file. Accordingly, the network element would have to modify the associated length field for the requesting media client to be able to correctly interpret the file according to current container format definitions. However, the length field is transmitted before the actual media content, so the network element would not be able to accurately change the length field before it is forwarded to the media client. Specifically, at the time the network element receives the length field, it does not know the ultimate length of the file.
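The ordering problem can be made concrete with a small simulation. It assumes a hypothetical file made of a single 4-byte big-endian length field followed by media units; the names `serve`, `thin_in_flight` and `client_read` are illustrative only and do not correspond to any real server or client API.

```python
import struct
from typing import Callable, Iterable, Iterator, List


def serve(units: List[bytes]) -> Iterator[bytes]:
    """Web server: send the length field first, then each media unit."""
    yield struct.pack(">I", sum(len(u) for u in units))
    yield from units


def thin_in_flight(chunks: Iterable[bytes],
                   should_drop: Callable[[bytes], bool]) -> Iterator[bytes]:
    """Network element: forward chunks as they arrive, dropping some units."""
    it = iter(chunks)
    # The length field must be forwarded before any media has been seen,
    # so it cannot yet reflect how many bytes will later be dropped.
    yield next(it)
    for unit in it:
        if not should_drop(unit):
            yield unit


def client_read(data: bytes) -> bytes:
    """Media client: trust the length field when extracting the payload."""
    (declared,) = struct.unpack_from(">I", data, 0)
    payload = data[4:4 + declared]
    if len(payload) != declared:
        raise ValueError(
            f"length field says {declared} bytes, received {len(payload)}")
    return payload


units = [b"BASE", b"ENH1", b"BASE", b"ENH2"]
thinned = b"".join(thin_in_flight(serve(units), lambda u: u.startswith(b"ENH")))
# client_read(thinned) raises ValueError: the declared length no longer matches.
```

Buffering the whole file at the network element would let it correct the length field, but that would forfeit the early playback that progressive download is meant to provide.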
Accordingly, there is a need for a method and system for dynamically thinning a progressive download between a web server and a media client.