1. Technical Field
This invention relates to the analysis of the quality of video signals. It has a number of applications in monitoring the performance of video transmission equipment, whether during development, under construction, or in service.
2. Related Art
As communications systems have increased in complexity, it has become increasingly difficult to measure their performance objectively. Modern communications links frequently use data compression techniques to reduce the bandwidth required for transmission. When signals are compressed for more efficient transmission, conventional engineering metrics, such as signal-to-noise ratio or bit error rate, are unreliable indicators of the performance experienced by the human being who ultimately receives the signal. For example, two systems having similar bit error rates may have markedly different effects on the quality of the data (sound or picture) presented to the end user, depending on which digital bits are lost. Other non-linear processes, such as echo cancellation, are also becoming increasingly common. The complexity of modern communications systems makes them unsuitable for analysis using conventional signal processing techniques. End-to-end assessment of network quality must be based on what the customer has, or would have, heard or seen.
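The point about bit error rates can be illustrated with a minimal sketch. It assumes uncompressed 8-bit samples and uses mean squared error purely as a crude stand-in for visible damage (it is not a perceptual measure); the function names and test data are hypothetical and are not taken from any of the cited specifications.

```python
def flip_bit(samples, positions, bit):
    """Return a copy of samples with the given bit flipped at each listed index."""
    out = list(samples)
    for i in positions:
        out[i] ^= 1 << bit
    return out


def bit_error_rate(a, b, bits_per_sample=8):
    """Fraction of transmitted bits that differ between a and b."""
    errors = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return errors / (len(a) * bits_per_sample)


def mean_squared_error(a, b):
    """Average squared difference between corresponding samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Arbitrary 8-bit test samples (an assumption for illustration only).
original = [(37 * i) % 256 for i in range(64)]
hit = range(0, 64, 8)  # corrupt every eighth sample: 8 bit errors in 512 bits

lsb_damaged = flip_bit(original, hit, bit=0)  # least significant bit lost
msb_damaged = flip_bit(original, hit, bit=7)  # most significant bit lost

ber_lsb = bit_error_rate(original, lsb_damaged)
ber_msb = bit_error_rate(original, msb_damaged)
mse_lsb = mean_squared_error(original, lsb_damaged)
mse_msb = mean_squared_error(original, msb_damaged)

print(f"BER: {ber_lsb} vs {ber_msb}")
print(f"MSE: {mse_lsb} vs {mse_msb}")
```

Both damaged signals have the same bit error rate, yet the sample-level error differs by several orders of magnitude because each flipped most significant bit changes a sample by 128 rather than by 1.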
The main benchmarks of viewer opinion are the subjective tests carried out to International Telecommunication Union standards P.800, “Methods for subjective determination of transmission quality”, 1996, and P.911, “Subjective audiovisual quality assessment methods for multimedia applications”, 1998. These measure perceived quality in controlled subjective experiments, in which several human subjects listen to each signal under test. This is impractical for use in the continuous monitoring of a network, and also compromises the privacy of the parties to the calls being monitored. To overcome these problems, auditory perceptual models such as those of the present applicant's International Patent Specifications WO94/00922, WO95/01011, WO95/15035, WO97/05730, WO97/32428, WO98/53589 and WO98/53590 are being developed for measuring telephone network quality. These are objective performance metrics, but are designed to relate directly to perceived signal quality, by producing quality scorings similar to those which would have been reported by human subjects.
The prior art systems referred to above measure the quality of sound (audio) signals. The present invention is concerned with the application of similar principles to video signals. The basic principle of emulating the human perceptual system (in this case the eye/brain system instead of the ear/brain system) is still used, but video signals and the human visual perceptual system are both much more complex, and raise new problems.
As with hearing, the human visual perception system has physiological properties that make some features present in visual stimuli very difficult or impossible to perceive. Compression processes, such as those established by JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group), rely on these properties to reduce the amount of information to be transmitted in video signals (moving or still). Two compression schemes may result in similar losses of information, but the perceived quality of a compressed version of a given image may be very different according to which scheme was used. The quality of the resulting images cannot therefore be evaluated by simple comparison of the original and final signals. The properties of human vision have to be included in the assessment of perceived quality.
It is problematic to try to extract information from an image by mathematical processing of pixel values. The pixel intensity level becomes meaningful only when processed by the human subject's visual knowledge of objects and shapes. In this invention, mathematical solutions are used to extract information resembling that used by the eye-brain system as closely as possible.
A number of different approaches to visual modelling have been reported. These are specialised to particular applications, or to particular types of video distortion. For example, the MPEG compression system seeks to code the differences between successive frames. At periods of overload, when there are many differences between successive frames, this process reduces the pixel resolution, causing blocks of uniform colour and luminance to be produced. Karunasekera, A. S., and Kingsbury, N. G., in “A distortion measure for blocking artefacts in images based on human visual sensitivity”, IEEE Transactions on Image Processing, Vol. 4, No. 6, pages 713-724, June 1995, propose a model which is especially designed to detect “blockiness” of this kind. However, such blockiness does not always signify an error, as the effect may have been introduced deliberately by the producer of the image, either for visual effect or to obliterate detail, such as the facial features of a person whose identity it is desired to conceal.
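To make the notion of “blockiness” concrete, the following minimal sketch computes a crude indicator: the mean luminance step across assumed 8-pixel block boundaries divided by the mean step elsewhere, in the horizontal direction only. This is not the Karunasekera and Kingsbury model, which incorporates human visual sensitivity; the function name, the block size and the synthetic test images are assumptions made purely for illustration.

```python
def blockiness(image, block=8):
    """Ratio of the mean absolute luminance step across block boundaries
    to the mean step between other horizontally adjacent pixels."""
    boundary, interior = [], []
    for row in image:
        for x in range(1, len(row)):
            step = abs(row[x] - row[x - 1])
            # A step lands on a block boundary when x is a multiple of `block`.
            (boundary if x % block == 0 else interior).append(step)
    mean_boundary = sum(boundary) / len(boundary)
    mean_interior = sum(interior) / len(interior)
    # The small constant only guards against division by zero.
    return mean_boundary / (mean_interior + 1e-9)


# Synthetic 16 x 32 luminance images (assumed data, not from the source).
# A smooth horizontal ramp: every step is the same size.
smooth = [[4 * x for x in range(32)] for _ in range(16)]
# The same ramp with each 8-pixel block flattened to nearly uniform luminance.
blocky = [[4 * (x - x % 8) + (x % 2) for x in range(32)] for _ in range(16)]

print(blockiness(smooth))  # close to 1: boundaries look like any other pixel pair
print(blockiness(blocky))  # much larger than 1: steps concentrate at block edges
```

A measure of this kind responds identically to coding overload and to pixelation applied deliberately by the producer of the image, which is the limitation noted above.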
If the requirements of a wide range of applications, from high definition television to video conferencing and virtual reality, are to be met, a more complex architecture has to be used.
Some existing visual models have an elementary emulation of perceptual characteristics, referred to herein as a “perceptual stage”. Examples are found in the Karunasekera reference already discussed, and in Lukas, X. J., and Budrikis, Z. L., “Picture Quality Prediction Based on a Visual Model”, IEEE Transactions on Communications, Vol. COM-30, No. 7, pages 1679-1692, July 1982, in which a simple perceptual stage is designed around the basic principle that large errors will dominate subjectivity. Other approaches have also been considered, such as a model of the temporal aggregation of errors described by Tan, K. T., Ghanbari, M. and Pearson, D. E., “A video distortion meter”, Informationstechnische Gesellschaft, Picture Coding Symposium, Berlin, September 1997. However, none of these approaches addresses the relative importance of all errors present in the image.