Many television and movie viewers now desire on-demand access to video and other media content. As a first example, a television viewer may desire to watch a television show that he or she missed during the show's regular air time on television. The viewer may download the show on demand over the Internet via a web browser or other application on a notebook computer, tablet computer, desktop computer, mobile telephone or other device, then view that show in the browser or other application. In other examples, a viewer may download a movie on demand or may participate in a videoconference with other viewers.
Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH) is a standard developed to provide such media content and is partially described in International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 23009-1, First Edition, 2012 (“23009-1”), which is incorporated herein by reference in its entirety. In addition, ISO/IEC 23009-1, Technical Corrigendum 1, 2013 is incorporated herein by reference in its entirety. In DASH, there are two main devices: the Hypertext Transfer Protocol (HTTP) server(s) that provide the content and the DASH client that downloads the content and is associated with the viewer (or user). Currently, DASH leaves control with the client, which can request content using the HTTP protocol.
DASH functions to partition content (e.g., a video of potentially many minutes or hours of duration) into a sequence of smaller media segments, each segment covering a short interval of playback time. Each segment is made available to a DASH client in multiple alternatives, each at a different bit rate. As the content is played, the DASH client automatically selects a next segment (to be requested/played) from its alternatives. This selection is based on various factors, including current network conditions. The resulting benefit is that the DASH client can adapt to changing network conditions and play back content at the highest level of quality that those conditions allow without stalls or rebuffering events.
A DASH client can be any device that has DASH and media content playing functionality and wireless and/or wireline connectivity. For example, a DASH client may be a desktop or laptop computer, a smartphone, a tablet, a set-top box, an Internet-connected television, and the like.
Now referring to FIG. 1, there is illustrated a DASH standards-based adaptive media streaming model where portions of media streams and media segments are requested by DASH client devices 10a-10n using HTTP and are delivered by one or more DASH (HTTP) servers 12 via a telecommunications network 11 (including the Internet). As will be appreciated, the telecommunications network 11 may be any suitable network (or combination of networks) enabling transmission of media content using HTTP. As an example only, the telecommunications network 11 is shown as including various telecommunications resources and infrastructures, such as network address translators and/or firewalls 18, caches 14 and Content Distribution Networks (CDNs) 16. These resources support on-demand, live streaming and time-shift applications and services to network-connected devices, such as the DASH clients 10a-10n.
Each DASH client 10 can dynamically adapt the bitrate of the requested media content/stream to changes in network conditions, by switching between different versions of the same media segment encoded at different bitrates.
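As an illustration only, the bitrate switching described above may be sketched as follows. The function name select_bitrate, the safety factor of 0.8, and the example bitrate ladder are hypothetical and are not taken from the DASH specification, which does not mandate any particular selection rule.

```python
from typing import Sequence


def select_bitrate(available_bps: Sequence[int],
                   measured_bps: float,
                   safety: float = 0.8) -> int:
    """Pick the highest advertised bitrate that fits the measured throughput.

    `safety` is an assumed headroom factor: only a fraction of the measured
    throughput is budgeted, to reduce the risk of stalls.
    """
    if not available_bps:
        raise ValueError("no alternatives advertised for this segment")
    budget = measured_bps * safety
    candidates = [b for b in available_bps if b <= budget]
    # If nothing fits, fall back to the lowest bitrate rather than stalling.
    return max(candidates) if candidates else min(available_bps)


# Hypothetical alternatives for one media segment, in bits per second.
ladder = [500_000, 1_200_000, 2_500_000, 5_000_000]
print(select_bitrate(ladder, 4_000_000))  # budget is 3.2 Mbps -> 2500000
print(select_bitrate(ladder, 400_000))    # nothing fits -> 500000
```

A real client would typically also consider buffer occupancy and device capabilities; this sketch considers throughput alone.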
As illustrated in FIG. 2, DASH is based on a hierarchical data model described by a Media Presentation Description (MPD), which defines formats to announce resource identifiers for a collection of encoded and deliverable versions of media content. The MPD is an XML document that advertises the available media and provides information needed by the DASH client in order to select segments from a Representation, make adaptation decisions, and retrieve segments from their servers via the network. Media content is composed of single or multiple contiguous segments.
The MPD provides sufficient information for the DASH client to provide a streaming service to the user by requesting segments from an HTTP (DASH) server and de-multiplexing (when needed), decoding and rendering the received media segments. The MPD is completely independent of media segments and only identifies the properties needed to determine whether a Representation can be successfully played and its functional properties (e.g., whether segments start at random access points).
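As an illustration only, a client might read the Representations advertised in an MPD as sketched below. The MPD document shown is a deliberately minimal, hypothetical example (it omits segment addressing and most attributes), and the function name list_representations is likewise an assumption; only the Python standard library XML parser is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified MPD; a real MPD carries far more detail.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="v-low" bandwidth="500000" width="640" height="360"/>
      <Representation id="v-high" bandwidth="2500000" width="1280" height="720"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" lang="en">
      <Representation id="a-en" bandwidth="96000"/>
    </AdaptationSet>
  </Period>
</MPD>"""

NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def list_representations(mpd_text):
    """Return (mimeType, id, bandwidth) for every Representation in the MPD."""
    root = ET.fromstring(mpd_text)
    found = []
    # Walk the hierarchy: MPD -> Period -> AdaptationSet -> Representation.
    for aset in root.findall("mpd:Period/mpd:AdaptationSet", NS):
        for rep in aset.findall("mpd:Representation", NS):
            found.append((aset.get("mimeType"),
                          rep.get("id"),
                          int(rep.get("bandwidth"))))
    return found


for entry in list_representations(MPD_XML):
    print(entry)
```

The sketch reads only properties of the Representations and never touches media data, consistent with the MPD being independent of the media segments themselves.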
As further illustrated in FIG. 2, a media segment is the minimal individually addressable unit of content data. It is the entity that can be downloaded using a URL advertised via the MPD. One example of a media segment is a 4-second part of a live broadcast, which starts at playout time 0:42:38, ends at 0:42:42, and is available within a 3-minute time window. Another example could be a complete on-demand movie, which is available for the whole period the movie is licensed.
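As an illustration only, the timing in the live broadcast example above may be worked through as follows. The helper names hms and is_available are hypothetical, and the sketch assumes that the 3-minute window opens at the moment the segment ends; the text above does not state where the window begins.

```python
from datetime import timedelta


def hms(text):
    """Convert an 'H:MM:SS' playout time into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


start, end = hms("0:42:38"), hms("0:42:42")
window = timedelta(minutes=3)


def is_available(now):
    # Assumption: the segment becomes available once it is complete and
    # remains available for the length of the window.
    return end <= now < end + window


print((end - start).total_seconds())  # segment duration in seconds
print(is_available(hms("0:43:00")))   # inside the assumed window
print(is_available(hms("0:45:42")))   # the assumed window has closed
```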
A Representation defines a single encoded version of the complete asset, or of a subset of its components. For example, a Representation may be an ISO-BMFF file containing unmultiplexed 2.5 Mbps 720p AVC video, and separate ISO-BMFF Representations may be for 96 Kbps MPEG-4 AAC audio in different languages. Conversely, a single transport stream containing video, audio and subtitles can be a single multiplexed Representation. For example, as a multiplexed Representation with multiple media components, an ISO-BMFF file contains a track for 2.5 Mbps 720p AVC video and several tracks for 96 Kbps MPEG-4 AAC audio in different languages in the same file. A combined structure is possible: video and English audio may be a single multiplexed Representation, while Spanish and Chinese audio tracks are separate unmultiplexed Representations.
Spatial adaptation in adaptive streaming concerns adaptation of streaming content in its spatial domain in terms of spatial objects, typically in response to changes in location, depth, shape and size of some regions of interest in its video component. Tiled adaptive streaming is a spatial adaptation technique that can minimize bandwidth usage by subdividing a video stream component in its spatial domain into different levels of spatial objects, called “tiles”, in addition to dividing it in its temporal domain into segments and by quality level into different representations. A tile can be specified as a Representation of temporal segments of a certain level of quality for a sub-region of the video component. Given a finite amount of available bandwidth, a user can choose anywhere in a range from downloading a large region in lower quality to downloading a very specific and small region in the highest quality possible. Tiled adaptive streaming is more fully described in “m28883 Spatially Segmented Content Description, MPEG#104, Incheon, April 2013” (incorporated herein by reference).
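As an illustration only, the trade-off between region size and quality under a fixed bandwidth budget may be sketched as follows. The function name best_quality_for_region, the per-tile bitrates, and the 4x4 tile grid are hypothetical, and the sketch assumes that every tile at a given quality level has the same bitrate.

```python
def best_quality_for_region(num_tiles, per_tile_bps, budget_bps):
    """Return the index of the highest affordable quality level, or None.

    `per_tile_bps` lists the assumed bitrate of one tile at each quality
    level, ordered from lowest to highest quality.
    """
    best = None
    for level, bps in enumerate(per_tile_bps):
        if num_tiles * bps <= budget_bps:
            best = level
    return best


levels = [200_000, 600_000, 1_500_000]  # hypothetical per-tile bitrates
budget = 4_000_000                      # available bandwidth in bits/s

print(best_quality_for_region(16, levels, budget))  # whole 4x4 frame -> 0
print(best_quality_for_region(4, levels, budget))   # 2x2 region -> 1
print(best_quality_for_region(1, levels, budget))   # single tile -> 2
```

With the same budget, the whole frame is affordable only at the lowest level, while a single tile can be fetched at the highest level.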
It is generally believed that dynamic adaptive streaming enabled by DASH (“m29232 Interactive ROI Streaming with DASH, MPEG#104, Incheon, April 2013”, incorporated herein by reference) is largely driven and managed by the DASH client, whereas the server merely plays a Segment hosting role. For example, this may be seen from Annex A (“Example DASH client behavior”) of the DASH Part 1 specification, incorporated herein by reference.
In such a client-managed adaptive streaming (CMAS) system, it is the client that not only selects a set of Adaptation Sets and one specific Representation within each Adaptation Set and makes requests for Segments therein, but also makes decisions about Representation switching, updated MPD fetching, and encoder clock drift control. All these selections and decisions are intended to suit the client environment based on information provided in the MPD (e.g., @bandwidth of each selected Representation), static characteristics of the environment (e.g., client decoding and rendering capabilities), and dynamic characteristics that the client monitors about its changing environment (e.g., available bandwidth of the network connection).
Turning to FIG. 3, there is illustrated the architecture (functional block diagram) of a CMAS system having a conventional DASH client 200 interconnected with an HTTP (DASH) server, and further illustrating various function modules or components involved in the streaming process.
The Monitoring Function module (or component) 204 is responsible for collecting client environment information and generating/outputting some adaptation parameters, while the Adaptation Logic module (or component) 206 utilizes these parameters to make Representation selections and decisions.
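As an illustration only, the division of labor between the two modules may be sketched as follows. The class and method names, the exponentially weighted moving average, and the smoothing factor are assumptions made for this sketch; FIG. 3 does not prescribe how either module is implemented.

```python
class MonitoringFunction:
    """Collects download measurements and outputs one adaptation parameter:
    a smoothed throughput estimate in bits per second."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha          # assumed smoothing factor
        self.estimate_bps = None

    def record_download(self, num_bytes, seconds):
        sample = 8 * num_bytes / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            # Exponentially weighted moving average of throughput samples.
            self.estimate_bps = (self.alpha * sample
                                 + (1 - self.alpha) * self.estimate_bps)
        return self.estimate_bps


class AdaptationLogic:
    """Uses the monitored parameter to select a Representation bandwidth."""

    def __init__(self, bandwidths):
        self.bandwidths = sorted(bandwidths)

    def choose(self, estimate_bps):
        fitting = [b for b in self.bandwidths if b <= estimate_bps]
        return fitting[-1] if fitting else self.bandwidths[0]


monitor = MonitoringFunction(alpha=0.5)
logic = AdaptationLogic([500_000, 1_200_000, 2_500_000, 5_000_000])

monitor.record_download(500_000, 1.0)             # sample: 4 Mbps
estimate = monitor.record_download(250_000, 1.0)  # sample: 2 Mbps
print(estimate, logic.choose(estimate))           # 3000000.0 2500000
```

Keeping the two classes separate mirrors the figure: the monitor knows nothing about Representations, and the adaptation logic knows nothing about how the estimate was measured.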
While rather simple and straightforward, there are some issues with this pure CMAS system. For example, since DASH may not mandate client behavior, there may be no guarantee of presenting a coherent user experience of the same piece of streaming content across devices with different DASH client implementations. This may be undesirable, especially from the perspective of the content owners.
There may be difficulty in regulating the Adaptation Logic module 206 within the client 200 in a dynamic manner, for instance, according to how a service provider wants the content streamed to different classes of subscribers.
Further, it may be difficult to manage a streaming experience that depends on the content itself. For example, for a portion of content containing details that the content provider particularly wants the user to see, high-quality segments have to be streamed. It may be hard for the client 200 to maintain this kind of experience without knowing what the content segments contain before requesting them.
Finally, as dynamic adaptation is to be managed by the client 200, the content information at all the levels, including potential Periods, Adaptation Sets, Representations and Segments, has to be prescribed in an MPD and communicated to the client prior to the time the client starts streaming. This becomes a significant problem, and may even be unresolvable, when it comes to streaming dynamic events (e.g., emergency alerts); dynamic content (e.g., live advertisement insertion); irregularly updated content (e.g., a basketball game with irregular time-outs); or a large, or even unlimited, number of potential Representations among which adaptation can happen dynamically (e.g., view angles and regions of interest (ROI) of a live event stream).
There is a need for adaptation in the spatial dimension, allowing the user to navigate content according to his or her own interests, for instance, by selecting video content presented in different positions, view angles and regions of interest within an original video content. Moreover, as this kind of user navigation has a lot of freedom and is difficult to prescribe in MPDs, there is also a need for a client-driven but server-managed adaptive streaming system.