In the following, the term "sequence of images" is to be understood in a broad sense: it covers a 3D video sequence, a scalable video sequence, a multi-view video sequence, etc. Consequently, the claimed method may be applied to any kind of sequence of images whose illumination variations are to be compensated with respect to a reference image.
In a multi-view video sequence, for example, an object or a scene is recorded using a setup of several synchronous cameras placed at different positions. Each camera records an image, which is usually called a view. A multi-view video sequence is thus composed of multiple scenes, and several views are recorded for each scene. A view is a set of pixels, like an image, but the term ‘view’ is used in the following rather than the term ‘image’ to keep in mind that multiple views of the same scene (or object) are embedded in the sequence of images.
Frequently discussed applications for sequences of images include three-dimensional television (3DTV) as well as free-viewpoint television (FTV), where the user is able to navigate freely through the scene.
The recording of a sequence of images creates a large amount of data. Therefore, efficient compression techniques are required to store or transmit video streams.
For example, MVC (Multiview Video Coding), specified in ISO/IEC 14496-10 | ITU-T Rec. H.264, supports the direct coding of the synchronous information from multiple views using a single stream and exploits inter-camera redundancy to reduce the bit rate. The basic coding scheme uses the hierarchical B prediction structure for each view. This scheme utilizes the correlation between images taken at the same time point but from different views for disparity estimation and compensation. Motion compensation techniques that are well developed for single-view video compression can be used for temporal prediction. Likewise, disparity compensation techniques can be utilized to reduce inter-view redundancies. The compensation process is carried out by a block-matching technique, which generally aims to find the best matching block in the reference image, i.e. the block that yields the minimum residual error after prediction.
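The block-matching step mentioned above can be illustrated by a minimal sketch. It assumes a full search over a small window and the sum of absolute differences (SAD) as the residual measure; neither choice is mandated by the text, and the function names `sad` and `best_match` are hypothetical. Images are represented as plain lists of rows.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (bx, by) and the size x size block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def best_match(cur, ref, bx, by, size, search):
    """Full search within +/- `search` pixels around (bx, by).
    Returns ((dx, dy), cost) for the candidate with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference image
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Toy data (assumption): the current view is the reference shifted by 2 pixels.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y][x + 2] for x in range(6)] for y in range(8)]
print(best_match(cur, ref, bx=2, by=3, size=2, search=2))  # ((2, 0), 0)
```

In this toy case the displacement (2, 0) gives a zero residual, so nothing but the vector would need to be coded for that block.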
Illumination changes exist between multiple views of the same scene (or object). Such changes can be classified into two categories: global illumination changes, which are caused by different calibrations between cameras, and local illumination changes, which are caused by the different angles and positions of the cameras.
This problem has to be dealt with because it may degrade the quality of image-based rendering algorithms or the accuracy of disparity estimation and compensation (inter-view prediction in the case of video coding). In the latter case, the amount of residual energy for the best match candidate is increased. Furthermore, the search for the best matching block and the resulting disparity vector are affected. Both effects decrease the coding efficiency.
This is also the case for the inter-view matching for depth estimation (3D video) or for the bit-depth scalable video coding or for predicting HDR (High Dynamic Range) videos from LDR (Low Dynamic Range) videos.
To solve this problem, MVC implements an illumination compensation process which uses a weighted prediction tool. The illumination compensation process, which predicts a view D from reconstructed samples $\mathrm{Rec}_R(\cdot,\cdot)$ of another view R, is given by equation (1):

$$\mathrm{Pred}_D(x, y) = W_R \times \mathrm{Rec}_R(x + dx,\; y + dy) + O_R \qquad (1)$$

where $W_R$ and $O_R$ are the scaling parameter and the offset respectively, which are, in a transmission context, transmitted in a slice header relative to the current view D of the sequence of images. The scaling parameter and the offset are constant for the whole slice.
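As an illustration only, equation (1) can be applied to one block as in the following sketch. The function name `predict_block` and its parameters are hypothetical, the displacement (dx, dy) is assumed to be already known, and floating-point arithmetic is used for readability; the integer rounding and clipping that a real codec applies are deliberately left out.

```python
def predict_block(rec_r, x0, y0, size, dx, dy, w_r, o_r):
    """Apply equation (1) to the size x size block whose top-left corner is
    (x0, y0): Pred_D(x, y) = W_R * Rec_R(x + dx, y + dy) + O_R.
    `rec_r` holds the reconstructed samples of view R as a list of rows."""
    return [
        [w_r * rec_r[y + dy][x + dx] + o_r for x in range(x0, x0 + size)]
        for y in range(y0, y0 + size)
    ]


# Toy reconstructed view R (assumption) and one set of slice-level parameters.
rec_r = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
pred = predict_block(rec_r, x0=0, y0=0, size=2, dx=1, dy=1, w_r=0.5, o_r=4)
print(pred)  # [[29.0, 34.0], [44.0, 49.0]]
```

Because `w_r` and `o_r` are the same for every block of the slice, every sample undergoes the same linear correction regardless of where it lies in the scene.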
Such an illumination compensation process, which is linear, is too simple in case of light reflections or flashes for multi-view video coding. Moreover, it does not take into account the local illumination variations in a scene because the same parameters are applied for the whole slice.
The linear illumination compensation process is not suited to views captured by heterogeneous sensors, e.g. bracketing, or to heterogeneous scalable prediction, e.g. HDR (High Dynamic Range) to LDR (Low Dynamic Range) video conversion.
Fecker et al. (“Histogram-based prefiltering for luminance and chrominance compensation of multiview video”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 9, September 2008) design a cumulative-histogram-based matching process for illumination compensation. Such a process applies the same correction to the entire view and is especially useful for correcting global discrepancies in the luminance and the chrominance. However, it does not take into account the local illumination variations in the scene because the histograms are calculated for the entire view.
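The general idea of cumulative-histogram matching can be sketched as follows. This is a generic version under stated assumptions, not necessarily the exact mapping rule of Fecker et al.: each level of the view to be corrected is mapped to the smallest level of the reference view whose cumulative count is at least as large. The function names and the `levels` parameter are hypothetical, and sample values are assumed to be integers in the range 0 to `levels - 1`.

```python
def cumulative_histogram(view, levels=256):
    """Cumulative count of samples with a value <= each level."""
    hist = [0] * levels
    for row in view:
        for v in row:
            hist[v] += 1
    cum, total = [], 0
    for count in hist:
        total += count
        cum.append(total)
    return cum


def matching_lut(view, reference, levels=256):
    """Build a look-up table mapping each level v of `view` to the smallest
    level u of `reference` whose cumulative fraction reaches that of v."""
    cum_v = cumulative_histogram(view, levels)
    cum_r = cumulative_histogram(reference, levels)
    n_v, n_r = cum_v[-1], cum_r[-1]  # sample counts, used to compare fractions
    lut, u = [], 0
    for v in range(levels):
        # cum_v is non-decreasing, so u never needs to move backwards.
        while u < levels - 1 and cum_r[u] * n_v < cum_v[v] * n_r:
            u += 1
        lut.append(u)
    return lut


def apply_lut(view, lut):
    """Apply one and the same table to every sample of the view."""
    return [[lut[v] for v in row] for row in view]


# Toy data (assumption): the view is the reference with a global offset of +5.
reference = [[10, 20], [30, 40]]
view = [[15, 25], [35, 45]]
lut = matching_lut(view, reference, levels=64)
print(apply_lut(view, lut))  # [[10, 20], [30, 40]]
```

A single table is built from the whole view and applied to every sample, which is why a correction of this kind removes a global offset but cannot follow illumination differences that vary from one region to another.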
In order to take into account the local illumination variations in the scene, the matching process for illumination compensation should be adapted locally. However, if the adaptation is made per block, for instance, matching information has to be transmitted for each block. Transmitting such a large amount of information is not efficient for video coding.