Over the past few years, advances in both camera and image processing technologies have not only enabled recording at ever higher resolutions, but have also made it possible to stitch the output of multiple cameras together, so that a set of cameras can jointly record a full 360 degrees at resolutions even higher than 8K×4K.
These developments make it possible to change the way users experience video. Conventionally, a broadcast of e.g. a football match comprises a sequence of camera shots carefully aligned and controlled by a director. In such a broadcast stream, each camera movement in the final stream corresponds to a physical alteration to the position, angle or zoom level of a camera itself. High-resolution panorama videos, however, allow a user (and/or director) a certain degree of interaction with the video being watched (or directed) without having to manipulate the camera in a physical sense. Using pan-tilt-zoom interaction, it is possible to extract from the high-resolution panorama video a sub-region of the video a user or director is interested in. This sub-region may be referred to as the region of interest (ROI).
Since in this particular use case a specific user is, at any given instant in time, only watching a subset of the full video panorama, bandwidth requirements can be reduced by sending only the part of the video the user is interested in. There are a number of techniques with which such functionality can be achieved. One of these techniques is the so-called tiled streaming technique, with which the full video panorama is divided into multiple independently encoded videos, whereby the client has multiple decoders allowing it to reconstruct any part of the full video panorama, if necessary by stitching together a number of such independent videos.
In most user scenarios, however, it is not necessary, or even desired, for a user to interact continuously with the video panorama. In the case of a football match, for example, one can imagine that a user is only interested in interacting with the video at certain points in time, e.g. when an off-screen foul is made but the director decides to follow the ball instead. Most of the time, however, a user might just want to follow the director's lead. In such cases, using tiled streaming is not the most efficient method of streaming the video. One reason for this is that the temporal and spatial predictions used in the encoding scheme are not optimized, since they are limited by the tile size. A second reason is that, since any given set of tiles will almost never exactly comprise the ROI, some additional pixels/macroblocks are sent which are not strictly necessary to reconstruct the given ROI.
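The second inefficiency can be illustrated with a minimal sketch. It assumes a uniform grid of equally sized tiles and an ROI given as a pixel rectangle lying inside the panorama. The names Rect, tiles_covering and overhead_fraction, and the tile size of 960×540 pixels, are hypothetical and chosen for illustration only; they do not come from any particular tiled streaming system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned pixel rectangle: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def tiles_covering(roi: Rect, tile_w: int, tile_h: int) -> list[tuple[int, int]]:
    """Return (row, col) indices of every tile that overlaps the ROI.

    Assumption: tiles form a uniform grid starting at pixel (0, 0).
    """
    first_col = roi.x // tile_w
    last_col = (roi.x + roi.w - 1) // tile_w
    first_row = roi.y // tile_h
    last_row = (roi.y + roi.h - 1) // tile_h
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def overhead_fraction(roi: Rect, tile_w: int, tile_h: int) -> float:
    """Fraction of the transmitted pixels that lie outside the ROI."""
    sent = len(tiles_covering(roi, tile_w, tile_h)) * tile_w * tile_h
    return (sent - roi.w * roi.h) / sent


# Hypothetical numbers: 960x540 tiles and a 1920x1080 ROI.
unaligned = Rect(x=500, y=300, w=1920, h=1080)
aligned = Rect(x=960, y=540, w=1920, h=1080)

print(len(tiles_covering(unaligned, 960, 540)), overhead_fraction(unaligned, 960, 540))
print(len(tiles_covering(aligned, 960, 540)), overhead_fraction(aligned, 960, 540))
```

With these assumed numbers, the unaligned ROI overlaps nine tiles although its area equals that of four, so more than half of the transmitted pixels fall outside the ROI. Only an ROI that happens to coincide with tile boundaries incurs no such overhead.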
In the article by Mavlankar, A. et al., “An interactive region-of-interest video streaming system for online lecture viewing,” Packet Video Workshop (PV), 2010 18th International, pp. 64-71, 13-14 Dec. 2010, the concept of a ‘Tracking Tile’ is described. This tracking tile is an independently encoded video, which consists of a continuously updated crop of the video panorama. By selecting the tracking tile mode, a user can passively follow a certain moving point of interest in the video panorama (e.g. the position of the ball in a football match), without having to actively navigate around the video panorama himself. One can imagine the position of the tracking tile within the video panorama being controlled by a director, similar to how regular broadcasts take place, or even by an automatic image recognition system.
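The notion of a continuously updated crop can be sketched as follows. This is only an illustration under stated assumptions, not the method of the cited article: the crop is assumed to have a fixed size, to be centered on the point of interest for each frame, and to be shifted back inside the panorama where it would otherwise extend past an edge. The function name tracking_crop and all dimensions are hypothetical.

```python
def tracking_crop(cx: int, cy: int, crop_w: int, crop_h: int,
                  pano_w: int, pano_h: int) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) of a fixed-size crop centered on (cx, cy).

    Assumption: the crop is no larger than the panorama; near an edge it is
    shifted so that it stays fully inside the panorama.
    """
    x = min(max(cx - crop_w // 2, 0), pano_w - crop_w)
    y = min(max(cy - crop_h // 2, 0), pano_h - crop_h)
    return (x, y, crop_w, crop_h)


# Hypothetical per-frame positions of the point of interest (e.g. the ball)
# in a 7680x4320 panorama, with a 1920x1080 crop.
for cx, cy in [(100, 2000), (3840, 2160), (7600, 4300)]:
    print(tracking_crop(cx, cy, 1920, 1080, 7680, 4320))
```

In such a sketch the crop rectangle is computed and the cropped video is encoded on the sending side, so the client receives a single ordinary video stream.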
While the use of a tracking tile provides users with the freedom to navigate around the video panorama and allows them to passively follow a point of interest, the tracking tile concept described by Mavlankar et al. does not provide the ability to perform a seamless transition from following a tracking tile to a user navigating through the panorama himself. If one imagines a user watching the tracking tile following the ball in a football match as it moves across the video panorama, the user might suddenly want to move a little bit more to the left, for example to see a foul being made.
With the tracking tile scheme as described by Mavlankar et al., seamless transitions are not possible, since the system does not know the location of the tracking tile in the video panorama at a given point in time. The user will therefore have to find the particular location in the panorama himself, thereby disrupting the continuous experience. Switching between the tracking-tile mode and the user-control mode takes a significant amount of time, resulting in a waste of resources, as video tiles are rendered and displayed that the user is not interested in.
Hence, there is a need in the art for improved methods and systems that enable efficient streaming of a ROI of a wide field-of-view image area to a client. Further, there is a need in the art for methods and systems that enable smooth or even seamless switching between streaming a ROI on the basis of a single stream to a client (in a non-tiled mode) and streaming a ROI on the basis of one or more separate tile streams to the client (in a tiled mode).