Streaming media delivery may become increasingly important as it becomes more common for high-quality audio and video to be delivered over packet-based networks, such as the Internet, cellular and wireless networks, powerline networks, and other types of networks. The quality with which the delivered streaming media can be presented may depend on a number of factors, including the resolution (or other attributes) of the original content, the encoding quality of the original content, the capabilities of the receiving devices to decode and present the media, the timeliness and quality of the signal received at the receivers, etc. For a streaming media experience to be perceived as good, transport and timeliness of the signal received at receivers may be especially important. Good transport may provide fidelity of the stream received at the receiver relative to what a sender sends, while timeliness may represent how quickly a receiver can start playing out the content after an initial request for that content.
A media delivery system can be characterized as a system having media sources, media destinations, and channels (in time and/or space) separating sources and destinations. Typically, a source includes a transmitter with access to media in electronically manageable form, and a destination includes a receiver with an ability to electronically control receipt of the media (or an approximation thereof) and provide it to a media consumer (e.g., a user having a display device coupled in some way to the receiver, a storage device or element, another channel, etc.).
“Consumption” is a process at a destination that uses the media being consumed in some way, such as for presentation. For example, a mobile video player “consumes” video data, often at the playout rate of the video. Where the media has a playout rate, such as video that has a normal speed playout rate, a “presentation time” can be defined. For example, the point in a media stream that a viewer would reach after 2:00.00 minutes of normal, uninterrupted playing from the beginning of a media presentation is referred to as having a presentation time (“PT”) of 2:00.00.
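The notion of presentation time described above can be illustrated with a small sketch. The function name and the M:SS.ss formatting below are illustrative choices, not defined by any standard:

```python
def presentation_time(seconds: float) -> str:
    """Format an elapsed playout position, in seconds of normal
    uninterrupted playout from the beginning of the presentation,
    as a presentation-time string of the form M:SS.ss."""
    minutes = int(seconds // 60)
    rest = seconds - 60 * minutes
    return f"{minutes}:{rest:05.2f}"

# 120 seconds of normal playout from the start corresponds to a
# presentation time ("PT") of 2:00.00.
print(presentation_time(120.0))  # 2:00.00
```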
While many variations are possible, in a common example, a media delivery system has one or more servers that have access to media content in electronic form, and one or more client systems or devices make requests for media to the servers, and the servers convey the media using a transmitter as part of the server, transmitting to a receiver at the client so that the received media can be consumed by the client in some way. In a simple example, there is one server and one client, for a given request and response, but that need not be the case.
Traditionally, media delivery systems may be characterized as following either a “download” model or a “streaming” model. The “download” model might be characterized by timing independence between the delivery of the media data and the playout of the media to the user or recipient device.
As an example, in a download model or configuration, a receiver is coupled to a media player or other media consumer, and the media is downloaded far enough in advance of when it is needed by the player/consumer. When the media is used (consumed), as much as is needed is preferably already available at the recipient. Delivery in the download context is often performed using a file transport protocol, such as HTTP, FTP or File Delivery over Unidirectional Transport (“FLUTE”), and the delivery rate might be determined by an underlying flow and/or congestion control protocol, such as TCP/IP. The operation of the flow or congestion control protocol may be independent of the playout of the media to the user or destination device, which may take place concurrently with the download or at some other time.
The “streaming” model might be characterized by a tight coupling between the timing of the delivery of the media data and the playout of the media to the user or recipient device. Delivery in this context is often performed using a streaming protocol, such as the Real Time Streaming Protocol (“RTSP”) for control and the Real Time Transport Protocol (“RTP”) for the media data. The delivery rate might be determined by a streaming server, often matching the playout rate of the data.
Some disadvantages of the “download” model may follow from the timing independence of the delivery and playout. On one hand, media data may not be available when it is needed for playout (for example, due to the available bandwidth being less than the media data rate), causing playout to stop momentarily (“stalling”), which results in a poor user experience. On the other hand, media data may be required to be downloaded very far in advance of playout (for example, due to the available bandwidth being greater than the media data rate), consuming storage resources on the receiving device, which may be scarce, and consuming valuable network resources for the delivery, which may be wasted if the content is not, eventually, played out or otherwise used. In addition, the most pleasing user experience in many cases is to be able to view the video almost immediately after the user decides what to view, as opposed to a model where the user has to order a video after deciding what to view and then wait minutes, hours, or potentially days before viewing is possible.
An advantage of the “download” model may be that the technology needed to perform such downloads, for example HTTP, is very mature, widely deployed, and applicable across a wide range of applications. Download servers and solutions for massive scalability of such file downloads (for example, HTTP Web Servers and Content Delivery Networks) may be readily available, making deployment of services based on this technology simple and low in cost.
Some disadvantages of the “streaming” model may be that generally the rate of delivery of media data is not adapted to the available bandwidth on the connection from server to client and that specialized streaming servers or more complex network architecture providing bandwidth and delay guarantees are required. Although streaming systems exist which support variation of the delivery data rate according to available bandwidth (for example Adobe Flash Adaptive Streaming), these are generally not as efficient as download transport flow control protocols such as TCP at utilizing all the available bandwidth, and require specialized server infrastructure to support the server/client streaming sessions.
Recently, new media delivery systems based on a combination of the “streaming” and “download” models have been developed and deployed. An example of such a model is referred to herein as “adaptive HTTP streaming” (herein referred to as “AHS”). Typically, the same multimedia content may be encoded in a variety of different ways, e.g., to accommodate client devices having different decoding and rendering capabilities and to allow for bandwidth adaptation. For example, the same multimedia content (such as the same movie) may be encoded according to different codecs, different frame rates, different bit rates, or different resolutions, be encapsulated using different file formats, and the like. Each unique combination of characteristics forms a “representation” of the multimedia content.
A variety of representations may have similar decoding and rendering requirements (e.g., similar codecs, related resolutions, similar file formats, etc.) but provide different bitrates providing different quality representations of the same multimedia content. An “adaptation set” is a group of representations from which a client might select a representation. For a given period of a presentation, there might be multiple adaptation sets, such as a video adaptation set and an audio adaptation set, but in examples herein, a single adaptation set might be considered, without loss of generality.
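The relationship between representations and adaptation sets described above can be sketched with a simple data structure. The class and field names below are illustrative; a real manifest (for example, an MPEG DASH media presentation description) carries many more attributes:

```python
from dataclasses import dataclass

@dataclass
class Representation:
    # Hypothetical fields chosen for illustration.
    rep_id: str
    codec: str
    bandwidth_bps: int   # declared bitrate of this encoding
    width: int
    height: int

@dataclass
class AdaptationSet:
    # A group of interchangeable representations of the same
    # content, from which a client selects one to play.
    media_type: str                       # e.g. "video" or "audio"
    representations: list[Representation]

# Same movie encoded at three bitrates/resolutions: one adaptation
# set, three representations.
video_set = AdaptationSet("video", [
    Representation("v-low",  "avc1",   500_000,  640,  360),
    Representation("v-mid",  "avc1", 1_500_000, 1280,  720),
    Representation("v-high", "avc1", 4_000_000, 1920, 1080),
])
print(len(video_set.representations))  # 3
```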
Representations may be divided into a plurality of segments, each segment for example corresponding to a period of time for the multimedia content, e.g., ten seconds, one second, 30 seconds, etc. A client device may determine an amount of bandwidth that is currently available, and then request one or more subsequent segments from one of the representations based on the bitrates of the representations and the currently available bandwidth. In this manner, when the amount of bandwidth increases, the client device may request segment(s) from a relatively higher quality representation, but when the amount of bandwidth decreases, the client device may request segment(s) from a relatively lower quality representation that can be accommodated by the amount of bandwidth.
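The bandwidth-based selection described above can be sketched as follows. The safety margin, and the fallback to the lowest representation, are assumptions for illustration rather than behavior mandated by any AHS specification:

```python
def select_representation(bitrates_bps, available_bps, safety=0.8):
    """Pick the highest declared representation bitrate that fits
    within a safety fraction of the measured available bandwidth;
    fall back to the lowest representation if none fits."""
    budget = available_bps * safety
    candidates = [b for b in bitrates_bps if b <= budget]
    return max(candidates) if candidates else min(bitrates_bps)

rates = [500_000, 1_500_000, 4_000_000]
# Bandwidth increases: request segments from a higher-quality
# representation; bandwidth decreases: drop to a lower one.
print(select_representation(rates, 2_000_000))  # 1500000
print(select_representation(rates, 400_000))    # 500000
```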
Each segment of a representation can be stored in a file, wherein a URL is associated with each such file, and such that the URL can be used by an AHS client to request an HTTP download of the segment file from standard HTTP web servers. Using this approach, in response to a user request to play the media content from a particular starting point, an AHS client can issue HTTP requests to download segments and receive the requested segments for playout. Ideally, the playout of the media content from the user-requested starting point can begin on the device soon after the user request, and playout of the media content can continue in real-time from that starting point thereafter, i.e., without stall events wherein the media playout is temporarily stalled while waiting for additional media data to arrive. In some AHS systems, HTTP byte range requests may also be used to request portions of segments.
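A segment download of the kind described above reduces to an ordinary HTTP GET, optionally carrying a Range header when only a portion of a segment file is wanted. The sketch below builds the request text rather than performing a network transfer; the host name and path are hypothetical:

```python
def segment_request(url_path, host, byte_range=None):
    """Build the text of an HTTP/1.1 GET for a segment file.
    An optional (first, last) byte range becomes a Range header,
    so the client fetches only part of the segment."""
    lines = [f"GET {url_path} HTTP/1.1", f"Host: {host}"]
    if byte_range is not None:
        first, last = byte_range
        lines.append(f"Range: bytes={first}-{last}")
    lines.append("")  # blank line ends the header block
    return "\r\n".join(lines) + "\r\n"

# Whole-segment request, then a byte-range request for its first 64 KiB.
print(segment_request("/movie/rep1/seg0007.m4s", "cdn.example.com"))
print(segment_request("/movie/rep1/seg0007.m4s", "cdn.example.com",
                      byte_range=(0, 65535)))
```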
One advantage of AHS is that it can be deployed using standard HTTP web servers, i.e., the same infrastructure used to deliver other Internet content. Another potential advantage of AHS is that AHS clients deployed across a multitude of different devices on a multitude of different types of networks can dynamically decide which representation is suitable to request for playout depending on the characteristics of the playout device, on network conditions such as available bandwidth for downloading, and other factors. This last advantage is especially important for mobile devices, wherein the available network bandwidth might dynamically vary on short time scales, and thus to avoid stalls while at the same time providing the highest quality playout possible for the given network conditions, the AHS client might need to quickly change from requesting segments of one representation to requesting segments of another representation. Furthermore, other characteristics might change quickly on a mobile device, e.g., the amount of remaining charge in the battery, or the display size of the playout, etc.
Frames that might be suitable to be considered to be used for switching between representations, such as I frames, may not provide as much video compression as other types of frames, such as P or B frames. However, when points at which a switch can occur are less frequent, a client device attempting to switch between representations may need to wait for a lengthy period of time before a switch is possible, which may cause user experience to suffer (e.g., due to delayed starting of playback for the representation initially or a rebuffering/stuttering playout event after a switch between representations). Therefore, strategic decisions are typically made regarding how frequently to include frames that might be used for switching between representations of multimedia content.
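The tradeoff described above can be made concrete with illustrative numbers. The frame-size ratio below (an I frame assumed to cost eight times a P frame) is a fabricated example value, not a property of any real codec:

```python
def gop_tradeoff(gop_frames, fps=30.0, i_cost=8.0, p_cost=1.0):
    """Illustrative tradeoff for a group of pictures that starts
    with one switchable I frame followed by P frames: relative
    size overhead versus an all-P baseline, and the worst-case
    wait before the next switch point."""
    size = i_cost + (gop_frames - 1) * p_cost
    baseline = gop_frames * p_cost
    overhead = size / baseline - 1.0
    max_delay_s = gop_frames / fps
    return overhead, max_delay_s

# Frequent switch points cost bitrate; rare ones cost switch latency.
for gop in (30, 300):
    ovh, delay = gop_tradeoff(gop)
    print(f"GOP {gop:3d}: +{ovh:.1%} size, {delay:.1f}s max switch wait")
```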
One popular AHS solution is the HTTP Live Streaming protocol initially developed by Apple Inc. (hereafter referred to as “HLS”), and sometimes also called Apple Live Streaming (abbreviated herein to “ALS”), as described in the IETF draft titled “HTTP Live Streaming” and which can be found at the URL with the domain “tools.ietf.org” and the file there named “/html/draft-pantos-http-live-streaming-06”. Another popular AHS solution is described in the MPEG DASH standards, for example as specified in ISO/IEC 23009-1:2012 (abbreviated herein “MPEG DASH” or the “MPEG DASH standard”).
Thus, what is needed are improved signaling methods, and usage of such signaling methods, that provide a seamless user video experience during representation switches, and that also enable efficient usage of network resources during representation switches.