Transmission of media content (e.g., video, audio, and/or data, etc., also referred to herein, collectively or individually, as content) between different nodes on a network may be performed in a variety of ways. The type of content being transferred and the underlying network conditions usually determine the methods used for communication. For instance, for a simple file transfer over a lossy network, one emphasis is on reliable delivery: packets may be protected against losses with added redundancy, or lost packets may be recovered by retransmission. In the case of audio/video content delivery with real-time viewing requirements, one emphasis is on low latency and efficient transmission to enable the best possible viewing experience, where occasional losses may be tolerated.
The structure of the packets and the algorithms used for real-time content transmission on a given network may collectively define a chosen content streaming protocol. Although the various content streaming protocols available today differ in implementation details, they can generally be classified into two main categories: push-based protocols and pull-based protocols. In push-based streaming protocols, once a connection is established between a server (e.g., a server device or server software) and a client (e.g., a client device or client software), the server remains active on the session and streams packets to the client until the session is torn down or interrupted, for example, by the client pausing or skipping within the content. In pull-based streaming protocols, the client is the active entity that requests the content from the server. Thus, each server response depends on a client request; otherwise, the server is idle or blocked with respect to that client. Further, the bitrate at which the client wishes to receive the content is determined entirely by the client. The actual rate of reception depends on the client's capabilities, the load on the server, and the available network bandwidth. As the primary download protocol of the Internet, HTTP is a common communication protocol upon which pull-based content delivery is based.
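To make the push/pull distinction concrete, the following is a minimal, in-memory sketch under stated assumptions. The names `PullServer`, `pull_client`, and `push_session` are hypothetical, no real network transport is involved, and "teardown" is simulated by a chunk index. In the pull model the server does nothing until asked for a specific chunk; in the push model the server keeps emitting chunks in order until the session ends.

```python
from typing import Dict, Iterator, List


class PullServer:
    """Hypothetical pull-based server: idle until a client requests a chunk."""

    def __init__(self, chunks: Dict[int, bytes]):
        self.chunks = chunks
        self.requests_served = 0

    def handle_request(self, index: int) -> bytes:
        # The response depends entirely on what the client asked for.
        self.requests_served += 1
        return self.chunks[index]


def pull_client(server: PullServer, wanted: List[int]) -> List[bytes]:
    # The client is the active entity: it chooses which chunks to fetch and in what order.
    return [server.handle_request(i) for i in wanted]


def push_session(chunks: Dict[int, bytes], teardown_at: int) -> Iterator[bytes]:
    # Push model: once the session starts, the server streams chunks in order
    # until the session is torn down (simulated here as reaching index teardown_at).
    for i in sorted(chunks):
        if i >= teardown_at:
            return
        yield chunks[i]
```

For example, with `chunks = {0: b"a", 1: b"b", 2: b"c"}`, `pull_client(PullServer(chunks), [2, 0])` returns `[b"c", b"a"]` because the client dictates the order, while `list(push_session(chunks, 2))` yields `[b"a", b"b"]` in server-determined order until teardown.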
In pull-based adaptive streaming, the client decides which specific representation of any given content it will request next from a server. This decision may be based on various parameters and/or observations, including the current (observed/available) bandwidth and the amount of data currently residing in a client buffer. Each representation may be received at the client as a plurality of requested chunks. A chunk is a piece of content, such as one or more Groups of Pictures (GoPs) as known in MPEG-compatible systems, a “fragment” in MPEG-4 systems, or another suitable subdivision of an entire instance of content; a chunk may also be called a fragment or a segment. Throughout a given viewing experience, the client may upshift or downshift (e.g., switch to a representation using a higher or lower bitrate) or stay at the same bitrate based on the available bandwidth and buffer conditions, among other factors. However, bitrate transitions may sometimes cause further delays under certain conditions, such as reduced bandwidth.
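The client-side decision described above can be sketched as a simple rate-selection function. This is one illustrative heuristic, not the method of any particular player. The function name `choose_bitrate`, the 80% bandwidth safety factor, and the 5-second low-buffer threshold are all hypothetical assumptions. The sketch picks the highest representation that fits within a fraction of the observed bandwidth, and when the buffer is nearly empty, it forces a downshift of at least one step below the current bitrate.

```python
from typing import List


def choose_bitrate(
    bitrates_kbps: List[int],
    measured_bw_kbps: float,
    buffer_s: float,
    current_kbps: int,
    safety: float = 0.8,        # hypothetical: budget only 80% of observed bandwidth
    low_buffer_s: float = 5.0,  # hypothetical low-water mark for the client buffer
) -> int:
    """Return the bitrate of the next representation to request (illustrative heuristic)."""
    ladder = sorted(bitrates_kbps)
    # Highest representation whose bitrate fits within the bandwidth budget;
    # fall back to the lowest representation if none fits.
    budget = measured_bw_kbps * safety
    fitting = [b for b in ladder if b <= budget]
    target = fitting[-1] if fitting else ladder[0]
    # With a nearly empty buffer, never upshift: drop at least one step below
    # the current bitrate (assumes current_kbps is one of the listed bitrates).
    if buffer_s < low_buffer_s:
        idx = ladder.index(current_kbps)
        target = min(target, ladder[max(idx - 1, 0)])
    return target
```

With a ladder of 500/1500/3000/6000 kbps, an observed 4000 kbps, and a healthy 20-second buffer, a client currently at 1500 kbps upshifts to 3000 kbps; with the same bandwidth but only 2 seconds buffered, a client at 3000 kbps downshifts to 1500 kbps. If the target equals the current bitrate, the client simply stays where it is.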
Adaptive streaming (e.g., adaptive video streaming) generally structures a content stream as a multi-dimensional array of content chunks. Each chunk represents a temporal slice of the content (e.g., 2-10 seconds in duration) that has been encoded or otherwise processed to produce differing levels of quality, different resolutions, etc.; in particular, the resulting versions have different sizes and require different amounts of bandwidth to deliver to one or more client devices. Virtually all adaptive streaming systems today use a two-dimensional matrix, with one dimension consisting of time and the other consisting of (target) encoding rate. In addition, current adaptive streaming systems use a variety of storage structures for the content, such as directories with individual files for each chunk, fragmented MPEG-4 files (a standardized file format), or custom packaging schemes. The structure of the content matrix, along with associated metadata describing each chunk, is contained in a separate structure, generally referred to as a manifest. Manifests are typically divided into representations, each of which describes one row of the content matrix (e.g., all the chunks encoded at a bitrate X). Various schemes and emerging standards exist for manifests.
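The two-dimensional content matrix and its manifest can be modeled as a small data structure. The following is a minimal sketch, not any standardized manifest format (it is neither DASH MPD nor HLS playlist syntax). The classes `Chunk`, `Representation`, and `Manifest` and the helper `build_manifest` are hypothetical. Each representation is one row (a single target encoding rate), and the chunk index is the time column. Chunk sizes here are idealized constant-bitrate estimates, whereas real encoded chunks vary in size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    start_s: float     # position of the chunk on the time axis
    duration_s: float  # temporal length of the slice
    size_bytes: int    # payload size, which drives the bandwidth needed to deliver it


@dataclass
class Representation:
    # One row of the content matrix: every chunk encoded at one target bitrate.
    bitrate_kbps: int
    chunks: List[Chunk] = field(default_factory=list)


@dataclass
class Manifest:
    representations: List[Representation]

    def chunk_at(self, bitrate_kbps: int, index: int) -> Chunk:
        # Row chosen by encoding rate, column chosen by time index.
        for rep in self.representations:
            if rep.bitrate_kbps == bitrate_kbps:
                return rep.chunks[index]
        raise KeyError(bitrate_kbps)


def build_manifest(bitrates_kbps: List[int], num_chunks: int, chunk_s: float = 4.0) -> Manifest:
    """Build a hypothetical manifest with idealized constant-bitrate chunk sizes."""
    reps = []
    for rate in bitrates_kbps:
        # kbit/s -> bytes per chunk: rate * 1000 / 8 * duration
        size = int(rate * 1000 / 8 * chunk_s)
        chunks = [Chunk(i * chunk_s, chunk_s, size) for i in range(num_chunks)]
        reps.append(Representation(rate, chunks))
    return Manifest(reps)
```

For instance, `build_manifest([500, 3000], 3)` describes a 2-row by 3-column matrix of 4-second chunks, and `chunk_at(3000, 2)` returns the chunk starting at 8.0 seconds with a nominal size of 1,500,000 bytes, six times the size of the corresponding 500 kbps chunk.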