Video data is typically composed of a series of still images which are shown in rapid succession as a video sequence to give the impression of a moving image. Video applications are continuously moving towards higher resolution. A large quantity of video material is distributed in digital form over broadcast channels, digital networks and packaged media, with a continuous evolution towards higher quality and resolution (e.g. higher number of pixels per frame, higher frame rate, higher bit-depth or extended colour gamut). This technological evolution puts increasing pressure on distribution networks that are already facing difficulties in bringing HDTV resolution and high data rates economically to the end user.
Video coding techniques typically exploit spatial and temporal redundancies of images in order to generate data bit streams of reduced size compared with the video sequences. Spatial prediction techniques (also referred to as Intra coding) exploit the mutual correlation between neighbouring image pixels, while temporal prediction techniques (also referred to as Inter coding) exploit the correlation between successive images of the sequence. Such compression techniques render the transmission and/or storage of the video sequences more effective since they reduce the capacity required of a transfer network, or storage device, to transmit or store the bit stream.
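By way of illustration only, the following minimal sketch shows why prediction reduces the amount of data to be coded. It assumes a deliberately simple predictor in each case (the left neighbouring pixel for Intra prediction, the co-located pixel of the previous image for Inter prediction); these predictors, the function names and the sample values are assumptions made for the example and are not taken from any standard.

```python
def intra_residual(frame):
    """Spatial prediction: predict each pixel from its left neighbour.

    The first pixel of each row has no left neighbour and is predicted as 0.
    """
    return [[row[x] - (row[x - 1] if x > 0 else 0) for x in range(len(row))]
            for row in frame]


def inter_residual(frame, previous):
    """Temporal prediction: predict each pixel from the co-located pixel
    of the previous image (no motion compensation in this sketch)."""
    return [[cur - ref for cur, ref in zip(row, ref_row)]
            for row, ref_row in zip(frame, previous)]


def energy(block):
    """Sum of squared values, used here as a rough proxy for coding cost."""
    return sum(v * v for row in block for v in row)


# Two nearly identical successive images (hypothetical sample values).
previous = [[100, 101, 102, 103],
            [100, 101, 102, 103],
            [100, 101, 102, 103]]
current = [[101, 102, 103, 104],
           [100, 101, 102, 103],
           [100, 102, 102, 103]]

raw_energy = energy(current)
intra_energy = energy(intra_residual(current))
inter_energy = energy(inter_residual(current, previous))
```

In this example both residuals carry less energy than the raw pixel values, and the Inter residual is the smallest because the two images differ only slightly. A practical encoder would further transform, quantise and entropy code the residual.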
An original video sequence to be encoded generally comprises a succession of digital images, each of which may be represented by one or more matrices, the coefficients of which represent pixels. An encoding device is used to code the video images into a bit stream, with an associated decoding device being available to reconstruct the images from the bit stream for display and viewing.
Common standardized approaches have been adopted for the format and method of the coding process. One of the more recent standards is HEVC, in which a video image is partitioned into smaller portions of pixels known as Coding Units (sometimes referred to as macroblocks or blocks of pixels). The coding units can be partitioned and adjusted according to the characteristics of the original image segment under consideration. This allows more detailed coding of areas of the video image which contain relatively more information and less coding effort for those areas with fewer features.
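The adaptive partitioning described above may be sketched as a recursive quadtree split. The sketch below assumes that a block is split whenever its pixel variance exceeds a threshold; this criterion, the threshold value and the function names are assumptions chosen for illustration, since an actual encoder would typically select the partitioning according to a rate-distortion criterion.

```python
def variance(image, x, y, size):
    """Variance of the pixel values in the size x size block at (x, y)."""
    values = [image[j][i] for j in range(y, y + size) for i in range(x, x + size)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def partition(image, x, y, size, min_size=2, threshold=10.0):
    """Return a list of (x, y, size) blocks covering the square region.

    A block is split into four quadrants while it is larger than min_size
    and its variance exceeds the (hypothetical) threshold.
    """
    if size <= min_size or variance(image, x, y, size) <= threshold:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dy in (0, half):
        for dx in (0, half):
            blocks += partition(image, x + dx, y + dy, half, min_size, threshold)
    return blocks


# 8x8 image: flat everywhere except a detailed (checkerboard) top-left quadrant.
image = [[50] * 8 for _ in range(8)]
for j in range(4):
    for i in range(4):
        image[j][i] = 0 if (i + j) % 2 else 200

blocks = partition(image, 0, 0, 8)
```

With these sample values the detailed quadrant is divided into four small blocks while each flat quadrant is kept as a single larger block, so more coding units are spent where the image contains more information.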
Images of a video sequence may include one or more regions of particular interest, each often referred to as a region of interest (ROI). Some video data provider systems provide video sequences with ROI functionalities, enabling a remote user to access one or more ROIs which may be predetermined or determined according to the request of the user. Such systems typically comprise a video server associated with a database of correlated video sequences. One such video sequence may comprise an overview video sequence with clickable subparts (ROIs). Other video sequences in the database may correspond to one or more regions of interest of the overview video sequence at a higher resolution, for example, and are also referred to as zoomed versions. Such video sequences are often referred to as ROI videos. One application of interactive regions of interest is, for instance, a preview of a video provided to a client, who can select a region of interest in order to obtain higher quality video data for the selected region.
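As a purely illustrative sketch of the selection step in such a system, the following example maps a click position in the overview video sequence to the ROI video associated with the clicked subpart. The `Roi` structure, the `select_roi` function, the coordinates and the video identifiers are all hypothetical and do not describe any particular server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Roi:
    """A clickable rectangular subpart of the overview video (hypothetical)."""
    name: str
    x: int
    y: int
    width: int
    height: int
    video_id: str  # identifier of the associated higher-resolution ROI video

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def select_roi(rois: List[Roi], px: int, py: int) -> Optional[Roi]:
    """Return the first ROI containing the clicked position, or None."""
    for roi in rois:
        if roi.contains(px, py):
            return roi
    return None


# Hypothetical ROIs defined in overview-video pixel coordinates.
rois = [
    Roi("stage", x=0, y=0, width=640, height=360, video_id="roi-stage-hd"),
    Roi("crowd", x=640, y=360, width=640, height=360, video_id="roi-crowd-hd"),
]

clicked = select_roi(rois, 700, 400)
missed = select_roi(rois, 700, 100)
```

Here a click at position (700, 400) falls within the second region, so the client would request the ROI video identified by `roi-crowd-hd`; a click outside every region selects nothing.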
One concern in the transmission of video data having one or more ROIs is how to transmit video data corresponding to a region of interest to a user with a low overhead.