The present invention relates to video quality and, in particular, to an objective assessment of the quality of coded and transmitted video signals.
With the development of digital coding technology, savings in transmission and/or storage capacity of video signals have been achieved and a large number of new multimedia video services have become available.
Savings in transmission and/or storage capacity by digital compression technology generally depend upon the amount of information present in the original video signal, as well as how much quality the user is willing to sacrifice. Impairments may result from the coding technology used and from limited transmission channel capacity.
Video quality assessment can be split into subjective assessment by human observers providing their subjective opinion on the video quality, and objective assessment which is accomplished by use of electrical measurements.
It is the general opinion that video quality is best assessed by human observers; this is, however, a complex, costly and time-consuming approach. Accordingly, there is a need to develop objective visual quality measures, based on human perception, that can be used to predict the subjective quality of modern video services and applications.
Studies in the framework of the American National Standards Institute (ANSI) and the International Telecommunication Union (ITU) have led to a plurality of algorithms for objective video quality assessment.
As will be appreciated by those skilled in the art, calculation of quality indicators of video signals on a pixel basis, for example, requires a large amount of processing. As disclosed in a conference publication by S. D. Voran, "The development of objective video quality measures that emulate human perception", Globecom '91 conf. publ. vol. 3, pp. 1776-1781, 1991, an important class of disturbing distortions in a video signal are those that destroy, soften, blur, displace, or create edges or signal transitions in the video image.
In a further conference publication by S. D. Voran and S. Wolf, "An objective technique for assessing video impairments", IEEE Pacific RIM Conference on Communications, Computers and Signal Processing, Proceedings Volume 1 of 2, pp. 161-165, 1993, an objective technique is described which is based on digital image processing operations performed on digitized original and impaired video sequences. The technique involves a feature-extraction process in which so-called impairment measurements of perceptual video attributes in both the spatial and temporal domains are determined. The spatial impairment measurement uses a Sobel filtering operation or, alternatively, a "pseudo-Sobel" operation, in order to enhance the edge content in the video image, and consequently in the spatial impairment measurement. It is based on normalised energy differences of the Sobel-filtered video frames, using standard deviation calculations conducted over visible portions of the pixel arrays of the original and impaired video signals. The impairment measurements thus extracted from the original and impaired video sequences are then used to compute a quality indicator that quantifies the perceptual impact of the impairments present in the impaired video sequence.
The patent publication U.S. Pat. No. 5,446,492 discloses a similar technique in which the feature-extraction processes on the original and impaired video sequences are carried out at source and destination locations that are far apart. The features extracted from the original video sequence are such that they can be easily and quickly communicated between the source and destination locations via a separate low-bandwidth transmission path, i.e. the bandwidth of the source features is much less than the bandwidth of the original video sequence.
To this end the feature-extraction process additionally includes a statistical subprocess which subjects the output of the Sobel filtering operation to a statistical processing, i.e. the computation of the standard deviation of the pixels contained within a so called region of interest for which the video quality is to be measured.
A drawback of these known techniques is that the feature-extraction process is based on standard deviation calculations. As a consequence, image distortions having contrary effects in the Sobel frames, e.g. blurring versus additional noise or false edges, cannot always be detected. A further drawback is that the known techniques use a relative distance measure for the quality of perception, which consequently is sensitive to relative effects of very small size and, as such, of low visibility.
The present invention aims to provide objective quality measures that can be used to assess the subjective quality of video signals, dealing with the higher levels of cognitive processing which dominate the perception of video quality.
It is a further object of the present invention to provide such measures in a form suitable for standardisation.
It is a still further object of the present invention to provide a method, an arrangement and equipment for objective quality assessment of degraded video signals, for measuring the quality of video coding equipment and algorithms, video transmissions and other multimedia video services, which among other things do not have the above-mentioned drawbacks.
These and other objects and features are achieved by the present invention in a method of obtaining quality indicators for an objective assessment of a degraded or output video signal with respect to a reference or input video signal by quantifying the strength of edges or signal transitions in both the input and the output video signals using edge or signal transition detection, which method comprises a first main step of generating image features of the input and output video signals, and a second main step of determining quality indicators from the generated image features, and for the definition of which method the prior art of document U.S. Pat. No. 5,446,492 has been used. The process of quantifying the strength of the edges will hereinafter be referenced by the term edginess.
The method according to the invention includes in the first main step the steps of:
a) detecting edges in the input and the output video signals; and
b) calculating the edginess of the input and the output video signals, providing input and output edge signals; and in the second main step the steps of:
c) establishing introduced edges in the output edge signal by comparing the input and output edge signals of corresponding parts of the input and output video signals, introduced edges being edges which are present in the output edge signal and are absent at corresponding positions in the input edge signal;
d) establishing omitted edges in the output edge signal by comparing the input and output edge signals of corresponding parts of the input and output video signals, omitted edges being edges which are present in the input edge signal and are absent at corresponding positions in the output edge signal;
e) obtaining normalised values of the introduced edges relative to the output edge signal adjusted by a first normalisation factor;
f) obtaining normalised values of the omitted edges relative to the input edge signal adjusted by a second normalisation factor;
g) calculating a first quality indicator by averaging the values obtained in step e); and
h) calculating a second quality indicator by averaging the values obtained in step f).
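Steps c) to h) can be sketched as follows. This is a minimal illustration only, assuming that the input and output edge signals from steps a) and b) are already available as equally sized, aligned 2-D lists of non-negative edge strengths, that the comparison in steps c) and d) is made by taking the pixel-wise difference, and that "adjusted by a normalisation factor" means adding the factor to the edge signal in the denominator. The function name `quality_indicators` and the default values of `w1` and `w2` are placeholders chosen for this sketch.

```python
def quality_indicators(edge_in, edge_out, w1=20.0, w2=10.0):
    """Return (q_introduced, q_omitted) for two aligned edge frames.

    edge_in, edge_out: equally sized 2-D lists of non-negative edge strengths.
    w1, w2: first and second normalisation factors (placeholder defaults).
    """
    introduced_sum = 0.0
    omitted_sum = 0.0
    count = 0
    for row_in, row_out in zip(edge_in, edge_out):
        for e_in, e_out in zip(row_in, row_out):
            diff = e_out - e_in
            introduced = max(diff, 0.0)   # step c): edge energy only in the output
            omitted = max(-diff, 0.0)     # step d): edge energy only in the input
            # Step e): normalise relative to the output edge signal plus w1.
            introduced_sum += introduced / (e_out + w1)
            # Step f): normalise relative to the input edge signal plus w2.
            omitted_sum += omitted / (e_in + w2)
            count += 1
    if count == 0:
        raise ValueError("edge frames must not be empty")
    # Steps g) and h): average the normalised values over the frame.
    return introduced_sum / count, omitted_sum / count


# Toy 2x2 edge frames: one edge is weakened and one new edge appears.
example_in = [[0.0, 100.0], [50.0, 0.0]]
example_out = [[0.0, 60.0], [50.0, 30.0]]
q_introduced, q_omitted = quality_indicators(example_in, example_out)
```

For the toy frames above, the first indicator is about 0.15 and the second about 0.09; identical input and output edge frames give zero for both.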
The method according to the invention is based on human visual perception, characterised in that spatial distortions like the introduction and omission of edges or signal transitions have a great impact on the subjective quality of the video signal. Further, it has been found that the introduction of an edge is more disturbing than the omission of an edge.
This has been taken into account, in the method according to the invention, by obtaining normalised values of the introduced edges and the omitted edges. The introduced edges are normalised with respect to the output edge signal adjusted by a first weighting or normalisation factor, and the omitted edges are normalised with respect to the input edge signal adjusted by a second weighting or normalisation factor. Obtaining normalised values according to the present invention is more in line with human perception, which is always relative.
The quality indicators for both the introduced and the omitted edges are subsequently established by calculating mean values of the thus normalised introduced and omitted edges or signal transitions in the output video signal.
For a number of different types of video signals, classified by the amount of motion in the pictures, the quality indicators obtained with the invention are close to the quality indicators obtained from subjective measurements by human observers.
In a preferred embodiment of the method according to the invention, the proportions of introduced and omitted edges are established from the respective polarities of a bipolar distortion signal formed by taking the difference of aligned, corresponding unipolar input and output edge signals of corresponding parts of the input and output video signals.
The first and second normalisation factors may be fixed or, preferably, set in accordance with the characteristics of the video signals, such as the luminance and chrominance values thereof.
For high luminance values, edge deteriorations are less visible which, in a further embodiment of the invention, is taken into account in that the first normalisation factor comprises a variable part obtained from the maximum characteristic values of the video signals, such as the luminance signal.
Calculation of the edginess can be established in a variety of manners. However, the most straightforward mathematical formulation is to calculate the norm of the gradient of the video signals. An example hereof is Sobel filtering, which has proven to provide reliable results. Depending on how derivatives of the video signals are approximated, many variations in the calculation of the edginess are feasible. All these types will hereinafter be referred to as Sobel filtering.
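As an illustration of the gradient-norm formulation, the following sketch computes an edge signal from a single frame with the standard 3x3 Sobel kernels. It assumes the frame is a 2-D list of luminance (or chrominance) samples and, as a simplification chosen for this sketch, sets the one-pixel border to zero; the function name `sobel_edginess` is hypothetical.

```python
import math

# Standard 3x3 Sobel kernels approximating the horizontal and vertical derivatives.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edginess(frame):
    """Return the gradient-magnitude (edge) frame of a 2-D list of samples.

    Border pixels are left at 0.0 because the 3x3 kernel does not fit there.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = 0.0
            gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    sample = frame[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * sample
                    gy += SOBEL_Y[dy][dx] * sample
            # Norm of the gradient vector (gx, gy).
            edges[y][x] = math.hypot(gx, gy)
    return edges


# A 4x4 frame with a vertical step from 0 to 10 between columns 1 and 2.
step_frame = [[0, 0, 10, 10] for _ in range(4)]
step_edges = sobel_edginess(step_frame)
```

A frame of constant value yields an all-zero edge signal, while the step frame yields non-zero values only along the transition.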
In a preferred embodiment of the invention, in particular wherein the introduced and omitted edges are obtained from a distortion signal formed from aligned input and output edge signals, improved or smeared Sobel filtering provides excellent results. With smeared Sobel filtering, a smearing operator having a width of, for example, 3 pixels is used. By this smearing operation, the effect of misalignment in the formation of the distortion signal is compensated for.
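The text does not define the smearing operator beyond its width. The sketch below therefore makes an assumption: it smears an already computed edge signal by replacing each value with the maximum over a square neighbourhood of the given width (3 pixels by default). A local mean would be another plausible choice. The function name `smear` is hypothetical.

```python
def smear(edges, width=3):
    """Smear a 2-D edge signal with a local-maximum operator (an assumption).

    edges: 2-D list of edge strengths.
    width: odd neighbourhood width in pixels; the neighbourhood is clipped
           at the frame borders.
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    radius = width // 2
    height = len(edges)
    columns = len(edges[0]) if height else 0
    smeared = [[0.0] * columns for _ in range(height)]
    for y in range(height):
        for x in range(columns):
            smeared[y][x] = float(max(
                edges[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(columns, x + radius + 1))
            ))
    return smeared


# Two single-row edge signals whose only edge is displaced by one pixel.
edge_a = [[0, 0, 9, 0, 0]]
edge_b = [[0, 0, 0, 9, 0]]
smeared_a = smear(edge_a)
smeared_b = smear(edge_b)
```

In this toy case the two unsmeared edges do not overlap at all, whereas after smearing they coincide at two of the five positions, so a one-pixel misalignment contributes less to the distortion signal.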
Alignment of the input and output edge signals is required because video sequences processed by a codec or transmitted over a transmission channel, for example, show delays with respect to the original sequence, and these delays vary from picture to picture. If the video sequence contains relatively little motion, the delay has only a small influence on the objective video quality measure. However, with large movements, omitting delay compensation leads to a large mismatch in scene content between the original and distorted sequences, which inadvertently increases the computed distortions. To solve the time-varying delay problem, known alignment algorithms can be used, such as disclosed in ITU-T Contribution COM-12-29, "Draft new recommendation on multi-media communication delay, synchronisation, and frame rate measurement", December 1997.
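The cited ITU-T contribution is not reproduced here. Purely to illustrate what a picture-by-picture delay estimate involves, the sketch below uses a much simpler stand-in technique: for each degraded frame it searches a small window of earlier reference frames and picks the delay with the lowest mean squared error. The function names and the `max_delay` parameter are hypothetical.

```python
def mean_squared_error(frame_a, frame_b):
    """Mean squared sample difference between two equally sized 2-D frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def estimate_delays(reference, degraded, max_delay=5):
    """Estimate, per degraded frame, its delay in frames relative to the reference.

    For degraded frame i, delays d = 0..max_delay are tried and the one whose
    reference frame i - d matches best (lowest MSE) is returned. None is
    returned for a frame that has no valid candidate.
    """
    delays = []
    for i, frame in enumerate(degraded):
        candidates = [d for d in range(max_delay + 1)
                      if 0 <= i - d < len(reference)]
        if not candidates:
            delays.append(None)
            continue
        best = min(candidates,
                   key=lambda d: mean_squared_error(reference[i - d], frame))
        delays.append(best)
    return delays


# Toy sequences of 1x1 frames; the degraded sequence repeats and lags frames.
ref_frames = [[[0]], [[10]], [[20]], [[30]], [[40]]]
deg_frames = [ref_frames[0], ref_frames[0], ref_frames[1],
              ref_frames[2], ref_frames[2]]
frame_delays = estimate_delays(ref_frames, deg_frames)
```

For the toy sequences the estimated delays grow from 0 to 2 frames over the sequence, which is the kind of time-varying delay the edge signals have to be compensated for before the distortion signal is formed.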
In practice, in accordance with the invention, the quality indicators are obtained from the luminance and chrominance representations of a colour video signal.
Heuristic optimisation has led to quality indicators obtained from smeared Sobel edge detection wherein for the luminance signals the constant part of the first normalisation factor is in a range between 15 and 30, preferably 20; the constant part of the second normalisation factor is in a range between 5 and 15, preferably 10; and the variable part of the first normalisation factor is in a range between 0.3 and 1, preferably 0.6 times the maximum edge values of the luminance signal of the input and output video signals. For the chrominance signals, the constant part of the first and second weighting factors is in a range between 5 and 15, preferably 10.
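The preferred luminance values above can be combined as in the following sketch. Two points are assumptions made for illustration: that the constant and variable parts of the first normalisation factor are added, and that "the maximum edge values" means the single largest edge value found in either the input or the output edge frame. The function name and parameter names are hypothetical.

```python
def luminance_normalisation_factors(edge_in, edge_out,
                                    const1=20.0, var1=0.6, const2=10.0):
    """Return (w1, w2) for the luminance edge signals.

    const1: constant part of the first factor (stated range 15 to 30).
    var1:   multiplier of the maximum edge value (stated range 0.3 to 1).
    const2: constant part of the second factor (stated range 5 to 15).
    """
    # Assumption: one peak value taken over both edge frames.
    peak = max(max(max(row) for row in edge_in),
               max(max(row) for row in edge_out))
    w1 = const1 + var1 * peak   # assumption: the two parts are summed
    w2 = const2
    return w1, w2


# Toy edge frames whose largest edge value is 100.
w1_example, w2_example = luminance_normalisation_factors(
    [[0.0, 100.0]], [[50.0, 0.0]])
```

With a peak edge value of 100 and the preferred settings, this gives a first factor of about 80 and a second factor of 10. For the chrominance signals the text gives only constant parts, so both factors would simply be set to the chosen constant, preferably 10.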
From the thus obtained first and second quality indicators of each of the luminance and chrominance signals, weighted quality indicators are obtained, for example using multiple linear regression techniques. For a Mean Opinion Score (MOS) calculated from the weighted quality indicators obtained from the above smeared Sobel filtering and preferred weighting factors, the correlation between the calculated MOS and the observed MOS from subjective measurements reaches a value above 0.9, which is required for making reliable predictions.
The best results are obtained from training the method on subjective reference quality data such that the normalisation factors and/or weighting of the quality indicators are optimised.
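A multiple linear regression of this kind can be sketched with ordinary least squares. The example below is illustrative only: the training values are synthetic numbers invented for the sketch (generated from a known linear rule so the fit can be checked), not measured data, and only two indicators per sequence are used to keep it short. The function names are hypothetical.

```python
def fit_linear(features, targets):
    """Least-squares fit of target = b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(f) for f in features]   # prepend the intercept column
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    c = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("indicators are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        c[col], c[pivot] = c[pivot], c[col]
        for k in range(col + 1, n):
            factor = a[k][col] / a[col][col]
            for j in range(col, n):
                a[k][j] -= factor * a[col][j]
            c[k] -= factor * c[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (c[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, indicators):
    """Predicted MOS for one sequence from its quality indicators."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], indicators))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# SYNTHETIC data: (introduced, omitted) indicators and MOS = 4.5 - 10*q1 - 5*q2.
train_q = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (0.2, 0.1), (0.05, 0.3)]
train_mos = [4.5, 3.5, 3.5, 2.0, 2.5]
weights = fit_linear(train_q, train_mos)
predicted = [predict(weights, q) for q in train_q]
correlation = pearson(predicted, train_mos)
```

Because the synthetic scores follow the linear rule exactly, the fit recovers the generating weights and the correlation is essentially 1; with real subjective scores the correlation would be lower and is the figure to compare against the 0.9 threshold mentioned above.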
The invention further provides an arrangement for obtaining quality indicators for an objective assessment of a degraded or output video signal with respect to a reference or input video signal by quantifying the strength of edges or signal transitions in both the input and the output video signals using edge or signal transition detection, which arrangement comprises means for generating image features of the input and output video signals and means for determining quality indicators from the generated image features, and for the definition of which arrangement the document U.S. Pat. No. 5,446,492 has been used. The arrangement according to the invention includes in the means for generating image features:
a) means for detecting edges in the input and the output video signals; and
b) means for calculating the edginess of the input and the output video signals, providing input and output edge signals; and in the means for determining quality indicators:
c) means for establishing introduced edges in the output edge signal by comparing the input and output edge signals of corresponding parts of the input and output video signals, introduced edges being edges which are present in the output edge signal and are absent at corresponding positions in the input edge signal;
d) means for establishing omitted edges in the output edge signal by comparing the input and output edge signals of corresponding parts of the input and output video signals, omitted edges being edges which are present in the input edge signal and are absent at corresponding positions in the output edge signal;
e) means for obtaining normalised values of the introduced edges relative to the output edge signal adjusted by a first normalisation factor;
f) means for obtaining normalised values of the omitted edges relative to the input edge signal adjusted by a second normalisation factor;
g) means for calculating a first quality indicator by averaging the values obtained by means e); and
h) means for calculating a second quality indicator by averaging the values obtained by means f).
In a preferred embodiment, the edge detection and calculation means comprise improved or smeared Sobel filter means.
Those skilled in the art will appreciate that the means mentioned above under a) and b) can be physically combined or provided by a single means for both the input and output video signal using appropriate multiplexing means, for example. Likewise, means c) and d), and/or means e) and f) as well as means g) and h) may be combined or separate.
The arrangement as a whole can be implemented in suitable digital processor means and incorporated in an Application Specific Integrated Circuit (ASIC), for use in measuring the quality of video codecs and the quality of video transmissions, for example.
The above and other features and advantages of the present invention will be readily apparent to one of ordinary skill in the art from the following written description when read in conjunction with the drawings in which like reference numerals refer to like elements.