Video coding is a way of transforming a series of video images into a compact digitized bitstream so that the video images can be transmitted or stored. An encoding device is used to code the video images, with an associated decoding device being available to reconstruct the video images from the bitstream for display and viewing. A general aim is to form the bitstream so as to be of smaller size than the original video information. This advantageously reduces the capacity required of a transfer network, or storage device, to transmit or store the bitstream code. To be transmitted, a video bitstream is generally encapsulated according to a transmission protocol that typically adds headers and check bits.
Streaming media data over a communication network typically means that the data representing a media presentation are provided by a host computer, referred to as a server, to a playback device, referred to as a client device, over the communication network. The client device is generally a media playback computer implemented as any of a variety of conventional computing devices, such as a desktop Personal Computer (PC), a tablet PC, a notebook or portable computer, a cellular telephone, a wireless handheld device, a personal digital assistant (PDA), a gaming console, etc. The client device typically renders streamed content as it is received from the host (rather than waiting for an entire file to be delivered).
FIG. 1 illustrates an example of an information technology architecture 100 for streaming media data from a server 105 to a client device 110 over a communication network 115.
A media presentation generally comprises several media components such as audio, video, text, and/or subtitles. Those media components are typically encoded individually into separate media streams, then encapsulated into multiple media segments, either together or individually, and sent from a server to a client device to be jointly played by the latter.
A common practice is to give access to several versions of the same media component so that the client device can select one version as a function of its characteristics (e.g. resolution, computing power, and bandwidth). According to the existing proprietary solutions, each of the alternative versions is described and the media data are segmented into small temporal segments.
In the context of dynamic and adaptive streaming over HTTP, a new standard called DASH (Dynamic Adaptive Streaming over HTTP) has recently emerged from the MPEG standardization committee (“ISO/IEC 23009-1, Dynamic adaptive streaming over HTTP (DASH), Part 1: Media presentation description and segment formats”). This standard enables association of a compact description of the media content of a media presentation with HTTP Uniform Resource Locators (URLs).
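The selection of one version as a function of the client characteristics can be sketched as follows. This is only an illustrative sketch: the `Version` structure, the `select_version` function, and the sample values are assumptions introduced here and are not taken from any particular proprietary solution.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Version:
    """One alternative version of a media component (hypothetical structure)."""
    name: str
    width: int
    height: int
    bandwidth_bps: int


def select_version(versions: Sequence[Version], max_width: int,
                   max_height: int, available_bps: int) -> Optional[Version]:
    """Pick the highest-bitrate version that fits the display and the bandwidth.

    Falls back to the lowest-bitrate version when nothing fits, and returns
    None when no version is offered at all.
    """
    fitting = [v for v in versions
               if v.width <= max_width
               and v.height <= max_height
               and v.bandwidth_bps <= available_bps]
    if fitting:
        return max(fitting, key=lambda v: v.bandwidth_bps)
    if versions:
        return min(versions, key=lambda v: v.bandwidth_bps)
    return None


# Hypothetical set of alternative versions of the same video component.
VERSIONS = [
    Version("low", 640, 360, 800_000),
    Version("mid", 1280, 720, 2_500_000),
    Version("high", 1920, 1080, 6_000_000),
]
```

With these assumed values, a client with a 1920x1080 display and roughly 3 Mbit/s of available bandwidth would select the "mid" version.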
Such an association is typically described in a file called a manifest file or a description file. In the context of DASH, this manifest file is an XML file also called the MPD file (Media Presentation Description).
By receiving an MPD file, a client device gets the description of each media content component. Accordingly, it is aware of the kind of media content components proposed in the media presentation and knows the HTTP URLs to be used for downloading the associated media segments. Therefore, the client device can decide which media content components to download (via HTTP requests) and to play (i.e. to decode and to play after reception of the media segments).
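For the sake of illustration, the way a client device may read the description of each media content component from an MPD can be sketched as follows. The sample MPD, its URLs, and the `list_components` helper are hypothetical; only the MPD namespace and the `Period`, `AdaptationSet`, `Representation`, and `BaseURL` element names come from the DASH schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the DASH MPD schema.
NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, heavily simplified MPD used only for this sketch.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="p0">
    <AdaptationSet contentType="video">
      <Representation id="v360" bandwidth="800000">
        <BaseURL>http://example.com/video_360.mp4</BaseURL>
      </Representation>
      <Representation id="v720" bandwidth="2500000">
        <BaseURL>http://example.com/video_720.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet contentType="audio">
      <Representation id="a1" bandwidth="128000">
        <BaseURL>http://example.com/audio.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""


def list_components(mpd_text: str) -> list:
    """Return one dict per representation, in document order."""
    root = ET.fromstring(mpd_text)
    components = []
    for period in root.findall("mpd:Period", NS):
        for aset in period.findall("mpd:AdaptationSet", NS):
            for rep in aset.findall("mpd:Representation", NS):
                components.append({
                    "period": period.get("id"),
                    "content_type": aset.get("contentType"),
                    "id": rep.get("id"),
                    "bandwidth": int(rep.get("bandwidth")),
                    "url": rep.findtext("mpd:BaseURL", namespaces=NS),
                })
    return components
```

Each returned entry gives the client the kind of component, its bitrate, and the HTTP URL to request, which is the information needed to decide what to download.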
In addition to such an association, the DASH standard proposes to split each media content as a function of periods of time. The time decomposition is described in the MPD file. Accordingly, the latter defines the association between HTTP URLs and the compact description of each media content component over each period of time. Each media content component can be encapsulated into multiple independent media segments corresponding to these periods of time.
This standard allows a client to download desired media content components of a media presentation over desired periods of time.
The encapsulation file format used for streaming media content components within media segments in MPEG DASH may conform to the ISO Base Media File Format defined in the context of the MPEG standardization activity. In particular, the encapsulation file format may relate to the standardization of the encapsulation of the High Efficiency Video Coding (HEVC) and its scalable extension in the ISO Base Media File Format (ISO/IEC 14496 Part 15), especially when using HEVC tiles for Regions-of-Interest (ROIs) and more generally for spatial access in compressed videos.
It is to be noted that extraction/streaming and displaying of regions of interest relying on tile composition is particularly useful for enabling interactive high quality zoom-in functions during streaming, for example for tracking a particular object that is represented in the images.
FIG. 2 illustrates modules of a server, for example of the server 105 represented in FIG. 1, that is configured for providing a video stream.
As illustrated, server 105 comprises video source 200, for example a camera, producing sequences of images that can be encoded by encoding module 205. An object recognition module 210 is used to identify and locate one or several objects that can be tracked through a sequence of images to define regions of interest. These objects can be determined by image analysis after encoding or prior to encoding (as suggested with dotted line).
The detected objects move in the images over time and thus may overlap at some point in time, appear, or disappear.
It is to be noted that the object recognition or detection step can be carried out offline, before the transmission of the video through a communication network, or online during encoding of the images issued by the video source and transmission of the encoded images.
As illustrated, the encoded images are segmented into segments in segmentation module 215 before being possibly transmitted, depending on client requests, through a communication network via communication module 220.
According to the illustrated example, each detected object is associated with a region of interest that consists of one or more tiles (also referred to as partitions) in an HEVC encoded video. Therefore, object/tile coverage module 225 is used for providing a set of tiles covered by the region of interest associated with one, some or all of the detected objects.
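The computation of the set of tiles covered by a region of interest can be sketched as follows. This sketch makes assumptions that are not stated in the description: the tile grid is uniform, tiles are indexed from 0 in raster-scan order, and the region of interest is an axis-aligned rectangle given in pixels. The function name `tiles_covered` is hypothetical.

```python
def tiles_covered(roi, frame_w, frame_h, cols, rows):
    """Return the set of tile indexes overlapped by a rectangular region.

    roi is (x, y, width, height) in pixels. The tile grid is assumed to be
    uniform (cols x rows), with tiles numbered in raster-scan order from 0.
    """
    x, y, w, h = roi
    # Clamp the region to the frame boundaries.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    if x0 >= x1 or y0 >= y1:
        return set()  # region entirely outside the frame, or empty

    tile_w = frame_w / cols
    tile_h = frame_h / rows
    # Use the last pixel inside the region (x1 - 1, y1 - 1) so that a region
    # ending exactly on a tile boundary does not spill into the next tile.
    first_col, last_col = int(x0 // tile_w), int((x1 - 1) // tile_w)
    first_row, last_row = int(y0 // tile_h), int((y1 - 1) // tile_h)

    return {r * cols + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)}
```

For example, with a 1920x1080 frame split into 4 columns and 3 rows (tiles of 480x360 pixels), a region at (400, 300) of size 200x100 straddles one vertical and one horizontal tile boundary and therefore covers tiles 0, 1, 4, and 5.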
Manifest generation module 230 is used to generate a manifest that is transmitted to a client device for accessing video segments.
Since the generated manifest comprises a description of all the required adaptation sets (i.e. one for the main video stream and one for each of the regions of interest), the manifest comprises redundant data.
Generally speaking, a main feature of the manifest based streaming methods is directed to decomposing media contents into small temporal entities referred to as segments. The manifest then provides the list of HTTP URLs for all the segments or at least a construction rule for these URLs (e.g. a segment template in the DASH standard).
DASH segment templates can be used to set a generic URL usable to address or request media segments from alternative representations. This is convenient for generating an MPD of compact size as well as for live streaming when the description of the whole presentation cannot be written in advance (i.e. at MPD transmission time).
However, template rules are limited to a set of pre-defined parameters that are resolved from the MPD itself. For the sake of illustration and according to the DASH standard, templates can use the identifier and/or the bandwidth attributes of the representation element. Accordingly, the possible values to build a segment URL are those taken in the different representations declared in the MPD.
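The resolution of a segment template from representation attributes can be sketched as follows. The `$RepresentationID$`, `$Bandwidth$`, `$Number$`, and `$Time$` identifiers, the `$$` escape, and the `%0[width]d` format tag are those of the DASH standard; the `resolve_template` function and the sample template are hypothetical, and the sketch does not cover every rule of the standard.

```python
import re

# A DASH template identifier with an optional %0<width>d format tag,
# or the "$$" escape sequence that stands for a literal dollar sign.
_TOKEN = re.compile(
    r"\$(RepresentationID|Number|Bandwidth|Time)(?:%0(\d+)d)?\$|\$\$")


def resolve_template(template, rep_id, bandwidth, number=None, time=None):
    """Build a segment URL from a template and representation attributes."""
    values = {
        "RepresentationID": rep_id,
        "Bandwidth": bandwidth,
        "Number": number,
        "Time": time,
    }

    def substitute(match):
        if match.group(0) == "$$":
            return "$"
        name, width = match.group(1), match.group(2)
        value = values[name]
        if value is None:
            raise ValueError(f"no value available for ${name}$")
        # Zero-padding only applies to the numeric identifiers.
        if width and name != "RepresentationID":
            return f"{int(value):0{int(width)}d}"
        return str(value)

    return _TOKEN.sub(substitute, template)
```

As the sketch makes visible, every substituted value has to be supplied from attributes already declared in the MPD (identifier, bandwidth) or from the segment index, which is the limitation discussed above.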
U.S. Pat. No. 2014/0156865 discloses “generic substitution parameters” in DASH. The main idea is to declare one or more parameters (as elements) in the streaming manifest for which the values can be determined either from the manifest itself or by external means. The possibility to reference a remote element enables resolution of parameter values after having generated a manifest.
This can be useful, for example in case of manifest-based live streaming where characteristics of the content are not known when the manifest is to be generated. According to the disclosed solution, HTTP URLs are defined in the ‘href’ attribute of an XLink element (XLink is defined by the W3C specification: XML Linking Language at http://www.w3.org/TR/xlink11/). Such an HTTP URL references a location outside the manifest. Therefore, the use of XLink makes it possible to resolve the parameter either when the MPD is loaded by a client device or when the element requiring the parameter is selected.
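The two resolution moments mentioned above correspond to the `xlink:actuate` attribute, which in DASH takes the value `onLoad` or `onRequest` (the default). A minimal sketch of how a client might list the remote references of a manifest is given below. The sample manifest, its URLs, and the `remote_references` helper are hypothetical, and the parameter elements of the cited publication are not reproduced; standard MPD elements are used instead.

```python
import xml.etree.ElementTree as ET

# XLink namespace as defined by the W3C XLink specification.
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical manifest with two remote elements.
SAMPLE_MPD = """
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <Period xlink:href="http://example.com/period1.xml"
          xlink:actuate="onLoad"/>
  <Period id="p2">
    <AdaptationSet xlink:href="http://example.com/roi_sets.xml"/>
  </Period>
</MPD>
"""


def remote_references(mpd_text):
    """List (element name, URL, actuate mode) for each remote element."""
    root = ET.fromstring(mpd_text)
    refs = []
    for elem in root.iter():  # document order, root included
        href = elem.get(f"{{{XLINK}}}href")
        if href is None:
            continue
        # When xlink:actuate is absent, DASH defaults to "onRequest".
        actuate = elem.get(f"{{{XLINK}}}actuate", "onRequest")
        local_name = elem.tag.split("}")[-1]  # strip the XML namespace
        refs.append((local_name, href, actuate))
    return refs
```

A client following this sketch would fetch the `onLoad` references as soon as the manifest is parsed and defer the `onRequest` ones until the corresponding element is actually selected.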
However, although the solution disclosed in U.S. Pat. No. 2014/0156865 may enable determination of the value of a parameter from the manifest itself or by external means, it does not allow dynamic re-evaluation of some parameters external to an MPD.
It is to be noted that an MPD update can be sent if the number of representations changes over time. In such a case, an MPD update comprising possible representations with identifier and bandwidth values usable in a URL template is sent to the client device. However, the MPD update mechanism generally introduces latency and requires client processing (to monitor the moment at which an MPD update should be requested, to parse the MPD update, to compute media selection decisions, etc.).
In view of the preceding, there is a need to improve coding of media presentation description data to make the template mechanism more dynamic so as to avoid the need for MPD updates.
In particular, since the number of detected objects or the number of detected objects that may be considered as an object of interest is generally not known when generating the media presentation description data and/or since this number varies over time, there is a need for enabling dynamic description of objects or of regions of interest in a streamed media presentation.
There is also a need for enabling a client device in a streaming system to track and to focus on objects in a video to address the corresponding media segments for a selected object.