The present invention relates to the non-linear compression and expansion of video signals to better facilitate application of video information to a medium, which medium may be a storage medium or a transmission medium. The invention also relates to compression and expansion of video signals to minimize distortion effects introduced by bandwidth compression devices designed to minimize data to be transmitted or stored in memory.
When video signals or other signals are applied to a transmission or storage system, the transmission or storage system has the undesirable effect of introducing noise into the signal. In a transmission system or medium, the signal which is received is a combination of the transmitted signal and noise introduced by the process. In somewhat similar fashion, the signal which is recovered from a storage system is a combination of the signal which was applied to the storage system and noise introduced by the process. Various techniques for minimizing the effects of such noise have been used for the storage and/or transmission of different types of signals. Generally, such techniques try to maintain the signal-to-noise ratio (SNR) at a sufficiently high level that noise will not be noticeable.
One common technique to minimize noise problems from transmission or storage is frequency domain signal pre-emphasis applied to a signal prior to its transmission or storage. This technique, which is relatively common for audio signals and has also been used for video signals, takes advantage of the fact that the troublesome noise is usually at high frequencies. The pre-emphasis is an entirely linear method which may be accomplished purely with linear filter operations. The technique relies on the fact that the high frequency components of the video signal have lower amplitude than its lower frequency components. The signal is first passed through a filter whose transmission generally increases with an increase in frequency and the output of the filter is then subjected to the transmission or storage. Thus, the higher frequencies of the signal (audio, video, or other) are transmitted or stored at a higher amplitude than they would be without the pre-emphasis. After recovery of the signal from transmission or storage, the recovered signal is passed through a de-emphasis linear filter which has amplitude and phase characteristics opposite to those of the pre-emphasis linear filter. That is, the two filters in tandem yield an amplitude response which is uniform with frequency and a phase which is linear with frequency, meaning that all frequency components have equal time delay. The de-emphasis filter has the effect of reducing the high frequency noise which was added by the transmission or storage process since the high frequencies of both signal and noise are attenuated to return the signal to its former frequency-amplitude characteristic.
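The pre-emphasis and de-emphasis pair described above can be illustrated with a minimal sketch. The first-order filter, the coefficient `a = 0.9`, and the function names `pre_emphasis` and `de_emphasis` are assumptions chosen for illustration; they are not taken from any particular system.

```python
def pre_emphasis(x, a=0.9):
    """Hypothetical first-order pre-emphasis: y[n] = x[n] - a*x[n-1].

    The gain rises from (1 - a) at zero frequency to (1 + a) at the
    highest representable frequency, so high frequencies are boosted.
    """
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y


def de_emphasis(y, a=0.9):
    """Exact inverse of pre_emphasis: x[n] = y[n] + a*x[n-1].

    The gain falls with frequency, so high frequency noise added
    between the two filters is attenuated along with the boosted signal.
    """
    x, prev = [], 0.0
    for sample in y:
        prev = sample + a * prev
        x.append(prev)
    return x
```

With no channel noise, `de_emphasis(pre_emphasis(x))` returns the original samples. If noise alternating in sign from sample to sample (the highest frequency a sampled signal can carry) is added between the two filters, the de-emphasis filter reduces its amplitude by a factor approaching 1/(1 + a) once the start-up transient has decayed.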
The frequency domain pre-emphasis technique is widely used in audio and video systems, but has several limitations. The amount of pre-emphasis which can be used is usually limited by the increase in signal amplitude which can be tolerated by the transmission or storage system without clipping the signal at its furthest positive and negative signal values. In other words, extreme pre-emphasis would boost the high frequency signal components to such a magnitude that the transmission or storage system could not properly handle them. When the pre-emphasis technique is used for satellite video transmission systems, the amount of pre-emphasis is also limited by the amount of interference introduced into neighboring satellite communication systems.
Another analog technique which has been used is a method called coring. Unlike the frequency pre-emphasis technique, the coring technique does not pre-condition the signal prior to transmission or storage. Moreover, coring is not a linear technique. Coring is based on the principle that the noise added by a transmission or storage system is predominantly high frequency noise of low amplitude relative to the signal itself. Various types of coring techniques may be used, but the general principles will be described. Linear filtering to isolate the high and low frequency signal portions is applied to a signal which is recovered from storage or transmission. Next, a non-linear operation is applied to the separated high frequency signal. This non-linear operation highly attenuates the signal when the signal amplitude is very close to the average value of the high frequency signal, while passing the signal with a linear amplitude characteristic when the high frequency signal departs more than a pre-determined amount from its average value. The resulting modified high frequency signal is then recombined with the low frequency component of the video signal to result in a signal with low amplitude, high frequency components removed. Unfortunately, this coring technique removes high frequency low amplitude signal components in addition to the noise components.
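The coring steps described above can be sketched as follows. This is only an illustration under stated assumptions: a moving average stands in for the linear low-pass filter, the high frequency part is taken as the residual, its average value is assumed to be zero, and the names `core`, `window`, and `threshold` are hypothetical.

```python
def core(signal, window=5, threshold=0.05):
    """Hard-threshold coring sketch (assumed form, not a specific circuit)."""
    half = window // 2
    n = len(signal)
    out = []
    for i in range(n):
        start, stop = max(0, i - half), min(n, i + half + 1)
        # Low frequency part: moving average over the local window.
        low = sum(signal[start:stop]) / (stop - start)
        # High frequency part: whatever the average does not explain.
        high = signal[i] - low
        # Non-linear step: remove small excursions, pass large ones unchanged.
        if abs(high) < threshold:
            high = 0.0
        out.append(low + high)
    return out
```

A small ripple riding on a flat signal is largely removed, while a large step passes through unchanged. The same thresholding also discards genuine low amplitude detail, which is the drawback noted above.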
A technique of audio companding (from compress and expand) has been widely used in digital telephony. Audio signals have certain characteristics, some of which characteristics are not shared by video signals, facilitating the use of audio companding. An audio signal always has an average value which is 0. It is also a one dimensional signal whose instantaneous value is a function of time. Further, frequencies below 15 to 20 Hz are quite rare in most audio signals of interest and frequency content above a few kHz is of much lower value than the content in the range of about 300 to 3,000 Hz.
The usefulness of audio companding is based on human acoustic perception embodied in Weber's Law, which law is also applicable to visual perception. Weber's Law states that the just noticeable difference in amplitude of two similar signals is directly proportional to the amplitude of the signal. Mathematically, J/I is a constant over a wide range of amplitudes where J is the just noticeable difference in signal amplitude and I is the signal amplitude. In other words, as applied to human acoustic perception, a person would be able to hear a difference in sound between a sound of signal amplitude I and a sound of signal amplitude I+J. At least over a particular range of values for I, a person can hear smaller absolute differences in signal or sound volume for low volume sounds than the person can notice in high volume sounds. A person might hear a difference J_1 corresponding to the sound difference of two different sized pins being dropped. However, the person might not be able to perceive the difference in sound volume of two different bells being rung even though the absolute difference in volume between the two bells may be substantially greater than J_1.
Audio companding takes advantage of Weber's Law by first subjecting an audio signal to a near-logarithmic operation where both negative and positive portions are conditioned symmetrically. Commonly, the initial audio signal x(t) is processed to provide a signal y(t) for transmission as follows:

$$y(t) = X_{max} \, \frac{\ln\left(1 + \mu \, |x(t)| / X_{max}\right)}{\ln(1 + \mu)} \, \mathrm{sign}(x(t)) \qquad (1)$$

where $X_{max}$ is a positive maximum value which x(t) may reach, sign simply takes the sign or polarity of an instantaneous value, and $\mu$ is a constant selected for a particular application. For digital telephony, the value of $\mu$ may be 255.
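As a minimal sketch, the compression step can be written in the standard mu-law form, which is assumed here to be the near-logarithmic operation referred to above. The function name `mu_law_compress` and the default `x_max = 1.0` are illustrative assumptions; `mu = 255` is the telephony value mentioned in the text.

```python
import math


def mu_law_compress(x, x_max=1.0, mu=255.0):
    """Compress one sample x in [-x_max, x_max] with a mu-law curve."""
    sign = -1.0 if x < 0 else 1.0
    # Logarithmic in |x| for large values, nearly linear close to zero.
    magnitude = math.log(1.0 + mu * abs(x) / x_max) / math.log(1.0 + mu)
    return sign * x_max * magnitude
```

The curve is symmetric about zero and maps the full-scale values onto themselves, while small input amplitudes are expanded to occupy a much larger share of the output range.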
In a digital system wherein the audio signal x(t) might be expressed in a time series of 12 bit samples (16 to 20 bits for very high dynamic range and high quality systems), a companded signal y(t) might be expressible by only 8 bits per sample and allow recovery of a signal audibly nearly identical to the original. This would therefore provide significant audio data compression as a benefit.
After transmission or storage, the companded audio samples are individually subjected to a non-linear operation complementary to the operation described by equation one above. The recovered signal after performing the complementary operation has an amplitude that is virtually linearly related to the input signal prior to companding. The output samples do differ in value from the corresponding input samples, but only by an amount best expressed as a certain percentage of the input signal. Thus, higher amplitude signals will have absolute errors which are much larger than those of lower amplitude signals. However, recalling that Weber's Law indicates that the just noticeable difference of sounds heard by the human ear is proportional to the signal amplitude itself, the audio companding system may be designed such that errors introduced are kept below this just noticeable difference. This example exploits Weber's Law for the purpose of data compression without the amplitude compression associated with the companding adding unacceptable noise levels.
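The round trip through compression, coarse quantization, and the complementary expansion can be sketched as below. The standard mu-law form is assumed for equation one, and the function names, the 8 bit uniform quantizer, and `x_max = 1.0` are illustrative assumptions rather than details from the text.

```python
import math


def compress(x, x_max=1.0, mu=255.0):
    """Assumed mu-law compression of a sample in [-x_max, x_max]."""
    sign = -1.0 if x < 0 else 1.0
    return sign * x_max * math.log(1.0 + mu * abs(x) / x_max) / math.log(1.0 + mu)


def expand(y, x_max=1.0, mu=255.0):
    """Complementary operation: exact inverse of compress."""
    sign = -1.0 if y < 0 else 1.0
    return sign * (x_max / mu) * (math.pow(1.0 + mu, abs(y) / x_max) - 1.0)


def quantize(y, bits=8, x_max=1.0):
    """Uniform quantizer applied to the companded sample."""
    levels = 2 ** (bits - 1) - 1  # steps on each side of zero
    return round(y / x_max * levels) / levels * x_max


def round_trip(x):
    """Compress, quantize to 8 bits, then expand back to the linear domain."""
    return expand(quantize(compress(x)))
```

Because the quantizer acts on the companded value, the absolute error after expansion grows with the signal amplitude: it is small for a quiet sample and larger for a loud one, while staying at a few percent of the sample value over most of the range.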
Instead of using audio companding simply to realize data compression, such companding may also be used to minimize the perceived noise added to an audio signal by a noisy transmission channel or storage medium. In such a case, the non-linear operation of equation one can be performed prior to transmission, but without any quantization beyond that introduced by the analog-to-digital converter operating on the linear input audio signal. The signal can then be passed through a digital-to-analog converter to obtain an analog signal to apply to the transmission or storage medium. After reconstruction at a receiver, the added noise is less apparent in the lower amplitude signals. Although the noise has actually increased in the higher amplitude signals, the signal-to-noise ratio is still high enough that the signals are not perceived as noisy.
As mentioned above, Weber's Law also applies to human visual perception, such that the just noticeable difference in luminance between two signals is proportional to the amplitude of the luminance signal. For example, when viewing a video screen one may be able to perceive noise in a darker portion of the screen, whereas noise of greater amplitude may go undetected in a brighter portion of the particular video image. Video systems have exploited this aspect of Weber's Law for many years. Specifically, many electronic cameras already have a "taking" amplitude characteristic which is non-linear and approximated by either a logarithmic curve or a power-law curve wherein the power is a fraction less than unity such that the curve defined by the camera has a curvature roughly similar to the usable part of the logarithmic curve. The slope of this characteristic on a logarithmic scale is called the gamma of the transfer characteristic. In the early days of black and white television, this phenomenon was quite useful for two reasons.
First, the picture tube has a transfer characteristic from control voltage input to luminance output which is a power law with an exponent of about 2.2, and is thus roughly complementary to the taking characteristic of some cameras. The second useful aspect is that transmission noise added to the television luminance signal between camera and receiver display is modified by the non-linear power-law transfer characteristic of the picture tube, such that it is attenuated in the darker image portions and expanded in the brighter image portions. This causes the noise to be more equally perceived by a human viewer because, under Weber's Law, the just noticeable difference is of higher amplitude in brighter image areas and of lower amplitude in darker image areas. In other words, the non-linear characteristics tend to minimize the degree to which a human will perceive noise introduced in the video signal by its transmission.
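The effect of the picture tube characteristic on transmission noise can be illustrated with a short sketch. It assumes a camera taking characteristic with an exponent of exactly 1/2.2, luminance normalized to the range 0 to 1, and the hypothetical function names `camera_taking`, `picture_tube`, and `displayed_error`.

```python
def camera_taking(luminance, gamma=2.2):
    """Assumed camera characteristic: power law with exponent 1/gamma."""
    return luminance ** (1.0 / gamma)


def picture_tube(voltage, gamma=2.2):
    """Picture tube characteristic: luminance output = voltage ** gamma."""
    return max(voltage, 0.0) ** gamma


def displayed_error(luminance, noise, gamma=2.2):
    """Luminance error on screen when `noise` is added to the transmitted voltage."""
    voltage = camera_taking(luminance, gamma)
    return picture_tube(voltage + noise, gamma) - luminance
```

For the same added noise voltage, `displayed_error` is smaller for a dark scene value than for a bright one, which is the attenuation in darker portions and expansion in brighter portions described above.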
A further characteristic of human sensory perception of images is of interest. This characteristic may be roughly termed spatial masking and a description of it may be found in the book Two-Dimensional Signal and Image Processing by Jae S. Lim, published by Prentice-Hall, a division of Simon and Schuster, Englewood Cliffs, N.J. 07632. Reference is made to section 7.3 which, in addition to a discussion of Weber's Law relative to visual human perception, includes a section 7.3.4 which relates to spatial masking. Spatial masking is much more pronounced (i.e., has greater effects) than Weber's Law. Stated simply and generally, spatial masking is the characteristic of human visual perception whereby noise which is noticeable in a local spatial area must be of higher amplitude if the local scene contrast is high, and noticeable noise can be of much lower amplitude for a local scene or image portion where the local contrast is lower. Another way to describe spatial masking is that the signal-to-noise ratio in any local area must exceed some constant value for the noise not to be visible in that area of the image.
An example may be useful in describing spatial masking. If a video image contains a piece of paper having text typed in fine print across its top and being blank at its bottom half, noise is more likely to be noticeable in that part of the image corresponding to the bottom half of the piece of paper. The contrast associated with the text on the top half of the piece of paper tends to mask noise on that part of the image, whereas noise having a magnitude not noticeable on the top half of the piece of paper may be readily noticeable on the part of the image corresponding to the bottom half of the piece of paper.
In the Lim book, it is stated that spatial masking might be used to reduce background noise. In particular, it refers to attempting to reduce background noise by spatial filtering which may involve some level of image blurring. The book further indicates that in high contrast regions, where the effect of blurring due to spatial filtering is more likely to be pronounced, the noise is not as visible, so little spatial filtering may be needed. The book does not go into details, but apparently the spatial filtering would simply involve application of the spatial filtering to signals which are recovered from a transmission or storage medium. Under those circumstances then, the blurring would be caused by effectively discarding signal information in the lower contrast regions along with noise.
In the storage or transmission of signals, the ability to recover a signal very close to the original signal with minimal effects of noise is important. However, there are also other characteristics which are important in various situations such as the ability to transmit a relatively large amount of information within a relatively small bandwidth. In other words, it is advantageous if one can transmit or store information without requiring an especially high capacity transmission channel or high capacity storage medium. A further characteristic desirable for transmission and storage of signals is the ability of the transmission or storage system to handle signals over a relatively wide dynamic range.
Whereas the previous discussion has focused on the impairments to analog signals undergoing either transmission or storage wherein the undesired noise is contributed by the process or medium of transmission or storage, a second mechanism causing noise and distortion is that of digital data compression wherein the compression process itself is lossy and creates the unwanted noise. In this case the impairment is produced by the source in the attempt to reduce the data by factors over a range of five to a few hundred. The subsequent digital transmission or storage in this case is presumed to be without error.
In order to realize the goal of allowing transmission and storage without requiring especially large capacity (bandwidth or other measure of capacity) transmission channels or storage media, while still providing an acceptable dynamic range, various predictive coding, transformation coding, and quantization techniques have been used. Such transformation techniques generally are used to try to minimize the sending or storage of redundant information. Such quantization techniques limit the values which are transmitted or stored. Although such transform techniques (the discrete cosine transform is a common one) and quantization techniques are generally useful in reducing capacity requirements in transmission and storage of signals, they also introduce noise and distortion into the transmission and storage processes.
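The way transform coding followed by quantization reduces data while introducing distortion can be seen in a small one dimensional sketch. The orthonormal discrete cosine transform, the block of eight samples, the uniform quantizer step, and the function names `dct`, `idct`, and `quantize` are illustrative assumptions and do not describe any particular coder.

```python
import math


def _scale(k, n):
    """Normalization that makes the transform orthonormal."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    return [
        _scale(k, n)
        * sum(block[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


def quantize(coeffs, step):
    """Uniform quantization: each coefficient snaps to a multiple of step."""
    return [step * round(c / step) for c in coeffs]
```

For a smooth block such as a gentle ramp, most quantized coefficients become zero and need not be sent, which is the data reduction. The reconstruction `idct(quantize(dct(block), step))` then differs from the original block, and that difference is the noise and distortion contributed by the compression process itself.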