The technological context of the invention is the W3C HTML5 framework together with adaptive HTTP streaming (for example: 3GPP/MPEG/DASH, Apple HLS, Microsoft Smooth Streaming) used to interact with videos on the Web. Currently, these streaming approaches focus on temporal access, scalability-layer access or multi-view access.
The streaming solutions mentioned above are client-based approaches, also called pull-based approaches. They rely on the transmission from a server of a manifest file that describes a media resource in terms of segments to download. After processing the manifest, the client is able to request, by means of polling requests, the segments of interest (for example a given scalability layer or a specific temporal interval). The manifest content depends on how the media resource to be streamed is organized. The organization of the media resource is also called encapsulation and consists in indexing the media resource, for example in terms of scalability layers and/or audio vs. video tracks, in terms of temporal fragments, etc.
One of the best-known encapsulation formats is the mp4 format, which is based on the MPEG-4 Part 12 standard, also called the ISO Base Media File Format. In 2010-2011, following the standardization of 3GPP/MPEG/DASH, this file format was updated to support dynamic adaptive streaming over HTTP.
In particular, the notion of segments has been defined, with specific headers for indexing segments (and even sub-segments).
A media resource such as a video can be represented by a list of URLs. This list informs the client of the addressable parts of the video. Each addressable part, typically a temporal segment, is described by one URL (for “Uniform Resource Locator”). This URL points to an mp4 segment file or to a byte range inside an mp4 file. This can be used for adaptation purposes and/or to progressively download the video by downloading the consecutive temporal segments one after another. To this end, a pull mechanism has been devised in which the client issues a series of HTTP requests for consecutive segments in order to watch the full video.
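By way of illustration only, the pull mechanism described above can be sketched as follows. The `Segment` type, the `build_requests` function and the example URLs are hypothetical names introduced for this sketch and do not come from any streaming standard; the only standard element assumed here is the HTTP `Range` header syntax (`bytes=first-last`). No network access is performed: the sketch only prepares, in order, the requests a client would issue.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One addressable part of the video (hypothetical type for this sketch)."""
    url: str
    # Inclusive (first_byte, last_byte) inside the file, or None for a whole file.
    byte_range: Optional[Tuple[int, int]] = None


def build_requests(segments: List[Segment]) -> List[Tuple[str, Dict[str, str]]]:
    """Return one (url, headers) pair per segment, in playback order."""
    requests = []
    for seg in segments:
        headers: Dict[str, str] = {}
        if seg.byte_range is not None:
            first, last = seg.byte_range
            # Standard HTTP byte-range request syntax.
            headers["Range"] = f"bytes={first}-{last}"
        requests.append((seg.url, headers))
    return requests


# Example list of URLs: one segment file, then two byte ranges of a single mp4 file.
PLAYLIST = [
    Segment("http://example.com/video/seg1.mp4"),
    Segment("http://example.com/video/full.mp4", (0, 499_999)),
    Segment("http://example.com/video/full.mp4", (500_000, 999_999)),
]
REQUESTS = build_requests(PLAYLIST)
```

In this sketch each temporal segment costs exactly one request, which is why the number of addressable parts directly drives the number of polling requests.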
The invention focuses on specific applications that allow interaction with clickable videos on the Web. For instance, they allow users to click on an area of a video frame to zoom in on a particular region of interest. The invention concerns in particular the zooming application for compressed video, and more specifically how to retrieve the video frame portions corresponding to the zoomed region of interest in this technological context of HTTP streaming. This requires spatial access into compressed videos and a solution for packaging the videos to be streamed in such a way that these spatial areas can be addressed by streaming clients.
The solution has to take into account that the number of polling requests from the client should be kept small, so that the streaming remains efficient in terms of latency and overhead (the more requests/responses there are, the more header overhead information is exchanged and processed). In addition, the solution has to limit the data overhead transmitted on the network (i.e. the ratio between the data requested by the client, corresponding to the zoomed region, and the data actually streamed by the server).
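The two costs mentioned above can be made concrete with a minimal sketch. The function names and the numerical values are assumptions chosen purely for illustration, and the data overhead is expressed here as streamed bytes divided by useful bytes, which is one possible reading of the ratio described above.

```python
def header_overhead_bytes(num_requests: int,
                          request_header_bytes: int,
                          response_header_bytes: int) -> int:
    """Header bytes exchanged in total: grows linearly with the request count."""
    return num_requests * (request_header_bytes + response_header_bytes)


def data_overhead_ratio(useful_bytes: int, streamed_bytes: int) -> float:
    """Streamed bytes per useful byte; 1.0 means no superfluous data was sent."""
    if useful_bytes <= 0:
        raise ValueError("useful_bytes must be positive")
    return streamed_bytes / useful_bytes


# Hypothetical figures: 10 requests with 400-byte request headers and
# 300-byte response headers, and 1.5 MB streamed for a 1 MB zoomed region.
HEADER_COST = header_overhead_bytes(10, 400, 300)
DATA_RATIO = data_overhead_ratio(1_000_000, 1_500_000)
```

With these assumed figures, the header cost is 7000 bytes and the server streams 1.5 bytes for every byte that actually belongs to the zoomed region.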
U.S. Pat. No. 7,870,575 (Boeing) discloses a media distribution system in which media programs are segmented into a plurality of media segments depending on certain attribute values. Those attribute values concern, for example, the movie rating or classification (adults only, all audiences, etc.). A description file provides a mapping between attribute values and media segments for conditional access.
However, the Boeing solution cannot be used to solve the problem mentioned above, for several reasons.
Firstly, the Boeing solution is not intended to limit the number of client HTTP requests, since the context is content broadcasting rather than client-based streaming.
Secondly, the attributes are additional metadata that do not designate portions of a video frame. Indeed, the Boeing patent does not provide a spatial access method. Consequently, the Boeing patent does not propose any solution for transmitting these frame portions, since its solution only consists in sending temporal segments of the full video, those segments being associated with the same selected attribute.