Over the past few years, advances in both camera and image processing technologies have not only enabled recording in ever higher resolutions, but have also enabled stitching the output of multiple cameras together, allowing a set of cameras to record together in full 360 degrees at resolutions even higher than 8K×4K. These developments make it possible to change the way users experience video. Conventionally, a broadcast of e.g. a football match comprises a sequence of camera shots carefully aligned and controlled by a director. In such a broadcast stream, each camera movement in the final stream corresponds to a physical alteration to the position, angle or zoom level of a camera itself. High-resolution panorama videos, however, give a user (and/or director) a certain degree of interaction with the video being watched (or directed) without having to manipulate the camera in a physical sense. Using pan-tilt-zoom interaction, it is possible to extract from the high-resolution panorama video a sub-region of the video that a user or director is interested in. This sub-region may be referred to as the region of interest (ROI).
Since in this particular use case a specific user is, at any given instant in time, only watching a subset of the full video panorama, bandwidth requirements can be reduced by sending only the part of the video the user is interested in. There are a number of techniques with which such functionality can be achieved. One of these techniques is the so-called tiled streaming technique, with which the full video panorama is divided into multiple independently encoded videos, whereby the client device, also referred to as client, has multiple decoders allowing it to reconstruct any part of the full video panorama, if necessary by stitching together a number of such independent videos.
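By way of illustration only, the selection of the independently encoded videos needed to reconstruct a given ROI may be sketched as follows. The sketch assumes a uniform grid of equally sized tiles, a panorama whose dimensions are exact multiples of the tile dimensions, and no horizontal wrap-around at the 360-degree seam; the function name `tiles_for_roi` and all numeric values are hypothetical and are not taken from any of the cited systems.

```python
from typing import List, Tuple


def tiles_for_roi(pano_w: int, pano_h: int, cols: int, rows: int,
                  roi_x: int, roi_y: int, roi_w: int, roi_h: int
                  ) -> List[Tuple[int, int]]:
    """Return the (col, row) indices of all grid tiles that overlap the ROI.

    Assumption: the panorama is split into a uniform cols x rows grid and
    its width/height are exact multiples of the tile width/height.
    """
    if pano_w % cols or pano_h % rows:
        raise ValueError("panorama size must be a multiple of the grid size")
    tile_w = pano_w // cols
    tile_h = pano_h // rows

    # Clamp the ROI to the panorama area.
    x0, y0 = max(0, roi_x), max(0, roi_y)
    x1 = min(pano_w, roi_x + roi_w)
    y1 = min(pano_h, roi_y + roi_h)
    if x0 >= x1 or y0 >= y1:
        return []  # ROI is empty or lies entirely outside the panorama

    # The last covered pixel is x1 - 1 (resp. y1 - 1), hence the "- 1".
    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h

    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# Example: an 8192x4096 panorama in an 8x4 grid (1024x1024 tiles) and a
# 1920x1080 ROI at offset (1000, 1000) requires 9 of the 32 tiles.
needed = tiles_for_roi(8192, 4096, 8, 4, 1000, 1000, 1920, 1080)
print(len(needed), "of", 8 * 4, "tiles needed")
```

In this example the ROI straddles tile boundaries in both directions, so nine tiles have to be fetched and stitched even though the ROI itself covers the area of only about two tiles.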
WO2012/168365 describes content delivery systems, e.g. CDNs, for streaming spatially segmented content to clients. After requesting multiple tile streams from the network, the client (i.e. the client device) needs to buffer the different streams and multiple instances of the decoder need to be started. The client should be able to synchronize the decoders and to stitch the decoded video tiles into the full video. Hence, when a tiled streaming mode comprises a large number of tile streams, the client processes may become complex and resource intensive.
Another form of tiled streaming is known from the HEVC standard, which provides a very efficient encoding and decoding scheme for video data. HEVC tiles were originally introduced in the HEVC standard for decoding of the video data using multi-core processors so that tiles in an HEVC-tiled video stream may be processed (encoded/decoded) in parallel.
Besides parallel processing, HEVC tiles may also be used for playout of only a subset of the HEVC tiles in the video frames of an HEVC-tiled stream. The subset may e.g. relate to a region-of-interest (ROI) in the image area of the (raw) panorama video.
In that case, the HEVC tiles should be independently encoded so that the decoder is able to decode only a subset of the HEVC tiles. In order to generate such sets of independently decodable HEVC tiles, the HEVC standard allows an HEVC encoder to be configured for restricting the spatial and temporal predictions in the video coding (e.g. motion vectors and in-loop filters) within the boundaries of one or more HEVC tiles.
The absence of spatial and temporal prediction between the tiles (that is, between the video data of the tiles), however, would reduce the compression efficiency, which could lead to a loss in video quality or an increase in the bitrate.
Hence, in order to achieve high compression rates, one would have to divide the frames into a few relatively large tiles. Reducing the number of tiles, however, would reduce the amount of parallelism that can be achieved, thereby limiting the encoding and decoding speed. Conversely, when dividing the frames of a video into a large number of small tiles, a high level of parallelism could be achieved, but the compression efficiency would be substantially reduced.
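The trade-off may be made somewhat more concrete with a rough geometric sketch. The code below counts, for a uniform tile grid, the number of tiles (an upper bound on the number of tiles that could be processed in parallel) and the total length in pixels of the internal tile boundaries, i.e. the locations at which prediction would be restricted. The boundary length is used here only as an assumed, illustrative proxy; the actual loss in compression efficiency depends on the content and on the encoder. The function name `tile_grid_stats` and the frame size are hypothetical.

```python
from typing import Tuple


def tile_grid_stats(frame_w: int, frame_h: int,
                    cols: int, rows: int) -> Tuple[int, int]:
    """Return (number of tiles, total internal boundary length in pixels).

    A cols x rows grid has (cols - 1) vertical internal boundaries of
    length frame_h and (rows - 1) horizontal ones of length frame_w.
    """
    if cols < 1 or rows < 1:
        raise ValueError("grid must have at least one column and one row")
    num_tiles = cols * rows
    boundary_px = (cols - 1) * frame_h + (rows - 1) * frame_w
    return num_tiles, boundary_px


# Compare a coarse and a fine grid on an 8192x4096 frame (example size).
for cols, rows in [(2, 2), (8, 4)]:
    tiles, boundary = tile_grid_stats(8192, 4096, cols, rows)
    print(f"{cols}x{rows}: {tiles} tiles, {boundary} boundary pixels")
# 2x2: 4 tiles, 12288 boundary pixels
# 8x4: 32 tiles, 53248 boundary pixels
```

Going from the 2×2 grid to the 8×4 grid multiplies the number of tiles by eight, while the length of the boundaries across which no prediction is possible grows more than fourfold.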
Furthermore, when managing multiple independent HEVC tiles at transport level, one could format the video data as a single HEVC-tiled stream. In that case, however, the video data of all HEVC tiles should be transmitted to the client, and tiles can only be manipulated at decoder level. Alternatively, one could format the multiple independent HEVC tiles as separate streams so that only a subset of HEVC tiles needs to be streamed to the client. Such a scheme, however, would introduce a large number of HTTP requests in order to request all temporal segments of the desired set of HEVC tiles.
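The growth in the number of requests can be illustrated with simple arithmetic. The sketch below assumes that every temporal segment of every requested tile stream is fetched with its own HTTP request and that all tile streams use the same segment duration; the function name `count_segment_requests` and the example values are hypothetical.

```python
import math


def count_segment_requests(num_tiles: int, duration_s: float,
                           segment_s: float) -> int:
    """Number of HTTP requests needed to fetch all temporal segments.

    Assumption: one request per tile per temporal segment, with the
    final (possibly shorter) segment counted as a full request.
    """
    if num_tiles < 0 or duration_s < 0 or segment_s <= 0:
        raise ValueError("invalid tile count, duration or segment length")
    segments_per_tile = math.ceil(duration_s / segment_s)
    return num_tiles * segments_per_tile


# Example: 60 seconds of video in 2-second segments.
single_stream = count_segment_requests(1, 60, 2)   # 30 requests
nine_tiles = count_segment_requests(9, 60, 2)      # 270 requests
print(single_stream, nine_tiles)
```

Under these assumptions, an ROI covered by nine separately streamed tiles requires nine times as many requests as a single stream of the same duration.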
Hence, there is a need in the art for improved methods and systems for streaming HEVC-tiled video data. In particular, there is a need in the art for methods and systems for streaming HEVC-tiled video data that reduce the amount of network traffic and do not increase the processor load of the client device.