The purpose of an objective video quality evaluation is to automatically assess the quality of video sequences in agreement with human quality judgements or perception. Over the past few decades, video quality assessment has been extensively studied and many different objective criteria have been set.
Introducing the temporal dimension into quality assessment raises effects that need to be addressed differently. A major consequence of the temporal dimension is that distortions acquire temporal effects, such as flickering, jerkiness and mosquito noise. Generally, a temporal distortion can be defined as the temporal evolution or fluctuation of the spatial distortion in a particular area that corresponds to the image of a specific object in the scene. The perception of spatial distortions over time can be strongly modified (enhanced or attenuated) by their temporal changes. The temporal frequency and the speed of the spatial distortion variations, for instance, can considerably influence human perception.
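The definition of a temporal distortion given above can be illustrated with a minimal sketch. It is not taken from any cited method: the choice of mean squared error as the spatial distortion, the fixed block position (no motion tracking), and the function names block_mse, distortion_over_time and temporal_fluctuation are all assumptions made for illustration only.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def block_mse(ref: Frame, dist: Frame, top: int, left: int, size: int) -> float:
    """Spatial distortion of one block: mean squared error (an assumed measure)."""
    total = 0.0
    for y in range(top, top + size):
        for x in range(left, left + size):
            diff = ref[y][x] - dist[y][x]
            total += diff * diff
    return total / (size * size)


def distortion_over_time(ref_frames: List[Frame], dist_frames: List[Frame],
                         top: int, left: int, size: int) -> List[float]:
    """Spatial distortion of the same block in every frame (assumes no motion)."""
    return [block_mse(r, d, top, left, size)
            for r, d in zip(ref_frames, dist_frames)]


def temporal_fluctuation(series: List[float]) -> float:
    """Mean absolute frame-to-frame change of the spatial distortion."""
    if len(series) < 2:
        return 0.0
    changes = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(changes) / len(changes)


# Toy data: four frames of a 2x2 block with a constant reference value of 10.
ref = [[[10.0, 10.0], [10.0, 10.0]] for _ in range(4)]
# "flicker": the error alternates between 0 and 2 grey levels from frame to frame.
flicker = [[[10.0 + (2.0 if t % 2 else 0.0)] * 2 for _ in range(2)]
           for t in range(4)]
# "steady": a constant error of 1 grey level in every frame.
steady = [[[11.0, 11.0], [11.0, 11.0]] for _ in range(4)]

flicker_series = distortion_over_time(ref, flicker, 0, 0, 2)
steady_series = distortion_over_time(ref, steady, 0, 0, 2)

print(flicker_series, temporal_fluctuation(flicker_series))
print(steady_series, temporal_fluctuation(steady_series))
```

In this toy example the steady sequence has a constant spatial distortion and therefore no temporal fluctuation, whereas the flickering sequence shows a large fluctuation; a purely spatial, frame-averaged measure would not capture that difference in temporal behaviour.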
The inventors addressed the effects of introducing a temporal dimension by focusing on the temporal evolution of spatial distortions.
In the prior art [1], a perceptual full-reference video quality assessment metric was designed that takes into account the temporal evolution of the spatial distortion. As the perception of temporal distortions is closely linked to visual attention mechanisms, the prior art first evaluates the temporal distortion at eye fixation level. In this short-term temporal pooling, the video sequence is divided into spatio-temporal segments in which the spatio-temporal distortions are evaluated, resulting in spatio-temporal distortion maps. Afterwards, the global quality score of the whole video sequence is obtained by long-term temporal pooling, in which the spatio-temporal maps are spatially and temporally pooled. However, the prior work in the area of temporal quality evaluation has a number of disadvantages, for example:
1) The spatio-temporal segments are formed by tracking a block over more than 20 consecutive frames with the help of motion vectors. This is usually not practical, since motion vectors as currently estimated often differ considerably from true motion, particularly when errors accumulate over such a long sequence.
2) The scheme introduces a total of six constants, whose values are selected by the user so as to maximize the scheme's prediction accuracy on the respective dataset (consisting of 30 sequences). Such a dataset is not large enough to support both the selection of six constants and the subsequent evaluation of the scheme's performance.
[1] A. Ninassi, O. Le Meur, P. Le Callet and D. Barba, "Considering temporal variations of spatial visual distortions in video quality assessment", IEEE JSTSP, Special issue on visual media quality assessment
This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.