A media presentation is usually composed of several media contents, such as audio, video or text. These media contents can be sent from a server to a client device, which downloads them and plays them jointly.
In this context, a new standard called DASH (for “Dynamic Adaptive Streaming over HTTP”) has recently emerged (see “ISO/IEC 23009-1, Dynamic adaptive streaming over HTTP (DASH), Part 1: Media presentation description and segment formats”). This standard makes it possible to associate a compact description of the content(s) of a media presentation with HTTP addresses. Usually, this association is described in a file called a manifest file or description file. In the context of DASH, this manifest file is also called the MPD file (for Media Presentation Description).
When the MPD file is sent to the client device, the client can easily learn the description of each media content. By reading the manifest file, the client is aware of the kinds of media content proposed in the media presentation and of the HTTP addresses for downloading the associated media contents. It can therefore decide which media content to download (via HTTP requests) and to play (decoding and playing after reception of the media data segments).
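As an illustration of how a client might read a manifest and choose what to download, the following minimal Python sketch parses a heavily simplified MPD. The manifest content, the example.com URLs, the identifiers and the helper names `list_representations` and `pick_video` are assumptions made for this example only; a real MPD carries many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified MPD; URLs and ids are invented for illustration.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="video-hd" bandwidth="2000000" width="1280" height="720">
        <BaseURL>http://example.com/video_hd.mp4</BaseURL>
      </Representation>
      <Representation id="video-sd" bandwidth="500000" width="640" height="360">
        <BaseURL>http://example.com/video_sd.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4">
      <Representation id="audio-en" bandwidth="64000">
        <BaseURL>http://example.com/audio_en.mp4</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>"""

# Namespace used by DASH MPD documents.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}


def list_representations(mpd_xml):
    """Return one dict per Representation: id, MIME type, bandwidth, URL."""
    root = ET.fromstring(mpd_xml)
    found = []
    for aset in root.findall("dash:Period/dash:AdaptationSet", NS):
        mime = aset.get("mimeType")
        for rep in aset.findall("dash:Representation", NS):
            found.append({
                "id": rep.get("id"),
                "mime": mime,
                "bandwidth": int(rep.get("bandwidth")),
                "url": rep.findtext("dash:BaseURL", namespaces=NS),
            })
    return found


def pick_video(representations, max_bandwidth):
    """Pick the highest-bandwidth video that fits the budget, or None."""
    candidates = [
        r for r in representations
        if r["mime"].startswith("video/") and r["bandwidth"] <= max_bandwidth
    ]
    return max(candidates, key=lambda r: r["bandwidth"]) if candidates else None
```

With a budget of 1,000,000 bits per second, `pick_video(list_representations(MPD_XML), 1_000_000)` selects the "video-sd" entry, whose URL the client could then request over HTTP.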
In addition to this association, the DASH standard proposes to split each media content into small periods of time. This time decomposition is added to the MPD file. The MPD file thus describes the association between HTTP addresses (or URLs) and the compact description of each media content over a small period of time.
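The time decomposition can be pictured as a mapping from a playback time to the URL of the media data segment covering that time. The sketch below assumes fixed-duration segments and a URL template with a `$Number$` placeholder, in the spirit of the DASH segment template mechanism; the function name `segment_url`, the URL and the two-second duration are hypothetical.

```python
def segment_url(template, segment_duration_s, time_s, start_number=1):
    """Return the URL of the segment that contains playback time `time_s`.

    Assumes every segment lasts `segment_duration_s` seconds and that
    segments are numbered consecutively from `start_number`.
    """
    if segment_duration_s <= 0:
        raise ValueError("segment duration must be positive")
    if time_s < 0:
        raise ValueError("playback time must not be negative")
    # Index of the segment covering time_s, counted from start_number.
    number = start_number + int(time_s // segment_duration_s)
    return template.replace("$Number$", str(number))
```

For instance, with two-second segments, `segment_url("http://example.com/video_$Number$.m4s", 2.0, 5.0)` yields the URL of the third segment, `http://example.com/video_3.m4s`.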
The invention focuses on a video description in a manifest file (taking the DASH MPD as reference). Even if the other elements of the media presentation (e.g. audio, text) are not directly taken into account, they can easily be incorporated in a more global media description, as will be explained below.
The spatial resolution of video is becoming increasingly important. In this context, 4K2K videos are beginning to emerge on the market. However, mobile applications cannot display such a resolution with high quality.
One solution is to split the video into tiles. If the user of a mobile application wants to display or focus on a sub-part of the video, only the tiles corresponding to that sub-part are transmitted. This process preserves good quality for the displayed video portion.
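To make the tile selection concrete, the following sketch computes which tiles of a regular grid overlap a rectangular region of interest. It assumes a uniform grid whose tile size divides the frame size exactly; the function name `tiles_for_region`, the 4x4 grid and the frame dimensions are illustrative assumptions, not part of any standard.

```python
def tiles_for_region(frame_w, frame_h, cols, rows, roi):
    """Return the (row, col) tiles of a uniform grid overlapped by `roi`.

    `roi` is (x, y, width, height) in pixels, with the origin at the
    top-left corner of the frame. Assumes the frame divides evenly
    into `cols` x `rows` tiles.
    """
    x, y, w, h = roi
    if w <= 0 or h <= 0:
        raise ValueError("region must have a positive size")
    if x < 0 or y < 0 or x + w > frame_w or y + h > frame_h:
        raise ValueError("region must lie inside the frame")
    tile_w = frame_w // cols
    tile_h = frame_h // rows
    # First and last tile column/row touched by the region (inclusive).
    first_col, last_col = x // tile_w, (x + w - 1) // tile_w
    first_row, last_row = y // tile_h, (y + h - 1) // tile_h
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
```

For a 3840x2160 frame split into a 4x4 grid (tiles of 960x540 pixels), a region at (900, 500) of size 200x100 straddles a tile corner, so `tiles_for_region(3840, 2160, 4, 4, (900, 500, 200, 100))` returns the four tiles `[(0, 0), (0, 1), (1, 0), (1, 1)]`; only those four of the sixteen tiles would need to be transmitted.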
In the context of DASH, the known standard “ISO BMFF” (“Base Media File Format”) is used to encapsulate media contents into media data segments in order to form the media presentation.
Classically, when using DASH, each track would be described in the manifest as an independent media content, even if the track corresponds to a sub-part of the video. There is no way in the manifest to signal that each track is a sub-part of the same video. Indeed, the current MPD definition does not allow tiled video to be described. In practice, the user would have to download a first initialization segment (in addition to the manifest) to learn that each video described in the MPD is a sub-part of a tiled video. They would then have to download, at a minimum, the beginning of the first media data segment of each video content to retrieve the association between tile locations and video contents. Downloading this initialization information leads to delays and to additional, unnecessary HTTP requests.
In another prior art, which is not compatible with the use of DASH, the article “An Interactive Region Of Interest Video Streaming System for Online Lecture Viewing”, A. Mavlankar, P. Agrawal, D. Pang, D. Halawa, N. Cheung, B. Girod, in Packet Video 2010, proposes a specific manifest (“proprietary manifest”) describing tiles for a scalable video. The specific manifest provides an identifier and a piece of location information for each tile. From a URL associated with a base layer and the tile information provided by the proprietary manifest, an HTTP query is built to access a particular tile, this query being associated with a tile index. This type of HTTP query requires processing on the server side to retrieve, from the HTTP query, the byte range and consequently the tile to be sent to the client device to fulfill its request. This can be done only by a very specific server.
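The kind of server-side processing this implies can be sketched as follows. This is not the implementation described in the cited article: the query parameter name `tile`, the lookup table `TILE_INDEX`, its byte offsets and the function name `resolve_tile_request` are all hypothetical, chosen only to show why a plain HTTP file server could not answer such a query without extra logic.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical server-side table: tile index -> (first byte, last byte)
# of that tile's data within the stored file.
TILE_INDEX = {
    0: (0, 49_999),
    1: (50_000, 99_999),
    2: (100_000, 149_999),
}


def resolve_tile_request(url, tile_index=TILE_INDEX):
    """Translate a tile query into the byte range the server must send.

    Raises KeyError if the query has no `tile` parameter or if the
    tile index is unknown to the server.
    """
    query = parse_qs(urlparse(url).query)
    tile = int(query["tile"][0])
    return tile_index[tile]
```

Here `resolve_tile_request("http://example.com/base_layer.mp4?tile=1")` returns `(50000, 99999)`. The translation from tile index to byte range happens on the server, which is why such a scheme depends on a dedicated server rather than a standard HTTP server.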