The evaluation of the quality of a compressed video sequence as perceived by the user is of particular interest when the source coding allows a plurality of different spatial and temporal resolutions, also referred to as scalability layers, to be carried within the same compressed flow. When such a video flow is broadcast over an imperfect environment, it is important to be able to evaluate the quality of each resolution independently of the others and thus to measure the impact, on each scalability layer, of the interference caused by the transmission. It is also of interest to compare the perceived quality of video sequences that have different resolutions. The estimate of the perceptual quality can then be used to adapt the compression parameters dynamically, so that the transmitted flow matches as closely as possible the constraints of the transmission channel and the needs of the users. This requirement is even more important in the case of multiple users within a heterogeneous network in which each transmission link has its own constraints.
Existing solutions for estimating video quality fall into two main groups. A first type of solution is based on a subjective quality evaluation using a panel of users. Subjective solutions have numerous disadvantages: their implementation is complex and costly and, above all, they do not address the specific problem of dynamically adapting a compressed video flow according to an estimate of the perceived quality, which must therefore be generated automatically.
A second type of solution relates to objective methods. These solutions are most often based on a measurement of the error between the pixels of the original sequence and the reconstructed pixels, such as the distortion measurement known as PSNR (“Peak Signal to Noise Ratio”). This type of measurement correlates insufficiently with subjective results. Moreover, solutions based on the measurement of erroneous pixels are not suitable for comparing two sequences at different resolutions since, from a mathematical point of view, these sequences have different content, whereas, from a perceptual point of view, the information that they contain is identical. Finally, other solutions are based on a model of human vision (“Human Visual System”), which gives better results, but they are not adapted to the constraints linked to temporal/spatial scalability. These techniques are most often designed for still images at a given resolution.
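By way of illustration only, the pixel-error measurement referred to above can be sketched as follows. The sketch assumes 8-bit luminance frames held as lists of rows, and the function names `mse` and `psnr` are hypothetical names chosen for this example. It uses the conventional definition of PSNR, namely ten times the base-10 logarithm of the squared peak value divided by the mean squared error, and it shows why such a measurement cannot be applied directly to two frames of different resolutions: the pixel-by-pixel difference is not defined.

```python
import math


def mse(reference, decoded):
    """Mean squared error between two frames given as lists of pixel rows."""
    same_shape = len(reference) == len(decoded) and all(
        len(r) == len(d) for r, d in zip(reference, decoded)
    )
    if not same_shape:
        # A pixel-wise error has no meaning when the resolutions differ.
        raise ValueError("frames must have the same resolution")
    count = sum(len(r) for r in reference)
    if count == 0:
        raise ValueError("frames must not be empty")
    total = sum(
        (a - b) ** 2
        for row_r, row_d in zip(reference, decoded)
        for a, b in zip(row_r, row_d)
    )
    return total / count


def psnr(reference, decoded, peak=255):
    """PSNR in dB; peak=255 assumes 8-bit samples (an assumption of this sketch)."""
    err = mse(reference, decoded)
    if err == 0:
        return math.inf  # identical frames: no measurable distortion
    return 10 * math.log10(peak ** 2 / err)


if __name__ == "__main__":
    original = [[100, 100], [100, 100]]
    degraded = [[101, 99], [101, 99]]  # every pixel is off by one level
    print(round(psnr(original, degraded), 2))
```

In this sketch, a decoded frame at a lower spatial resolution than the reference raises an error rather than yielding a value, which reflects the limitation described above for sequences at different resolutions.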
One of the objects of the present invention is to provide a unified solution for evaluating and comparing the perceived quality of flows that are transmitted at different spatial or temporal resolutions and that are furthermore subjected to errors or losses. This object is all the more important in the case of a compressed flow offering scalability, for which a particular resolution must be chosen for transmission and/or decoding from a plurality of available resolutions. Another object of the present invention therefore consists in proposing a solution allowing the best resolution in terms of visual perception to be chosen from a plurality of possible resolutions. Finally, the present invention also aims to enable, in the case of a compressed flow offering scalability, automatic determination of the best decoding choice from among the available resolution layers.
For this purpose, the invention proposes a method of estimating and comparing perceived quality applicable to a compressed video flow composed of a plurality of overlapping subsets, each representing a different resolution layer. Moreover, the invention uses an objective metric that is based not on the measurement of erroneous pixels but instead takes account of the visual structure of the content of the video sequence.