This invention relates to signal processing and more particularly to encoding of signals for efficient transmission and storage.
The processing of signals for transmission often includes sampling of the input signal, quantizing the samples and generating a set of codes that represent the quantized samples. Most signals of interest (e.g., speech or video signals) are highly correlated, which means that the signal can be thought of as comprising a predictable component and an unpredictable component. Coding compression is achieved by encoding essentially only the unpredictable component. Moreover, since these signals are often destined to be received and perceived by humans, concepts that relate to the human perception of the information received have been employed to further compress the coding of such signals and, consequently, the rate of the transmitted signals.
In connection with both speech and video signals, the prior art coding approaches that most closely relate to this invention are transform coding and linear predictive coding.
In a communications system utilizing transform coding, the signal is divided into segments. The segments are sampled and the samples of a segment are transformed into a set of frequency domain transform coefficients. The coefficient signals are then quantized and applied to the transmission channel. In systems that account for noise perception characteristics, the quantization mode applied to the coefficients is made to depend on the signal characteristics and on the sensitivity of the recipient to the resulting quantization noise, thereby achieving coding efficiency. Superimposed on those considerations is the limited bandwidth that is available. Bit allocation is one approach for handling the available bandwidth. In this approach, bits are allocated to the encoding of the transform coefficients in a manner that attempts to achieve a constant bandwidth. Examples of transform coding are found, among others, in U.S. Pat. No. 4,949,383, U.S. Pat. No. 4,184,049, and an article by J. D. Johnston titled "Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988.
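The transform coding steps described above (segment, transform, quantize, then invert at the receiver) can be illustrated with a minimal sketch. It assumes an orthonormal DCT-II as the transform and a uniform quantizer with one step size per coefficient; neither choice is taken from the cited references, and the function names are hypothetical.

```python
import math

def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient k of an n-point block.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct(segment):
    """Transform one segment of samples into frequency domain coefficients."""
    n = len(segment)
    return [
        _scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                           for i, x in enumerate(segment))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct(): recover the samples from the coefficients."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def encode_segment(segment, steps):
    """Quantize each coefficient with its own step size (assumed uniform quantizer)."""
    return [round(c / q) for c, q in zip(dct(segment), steps)]

def decode_segment(indices, steps):
    """Rescale the received indices and invert the transform."""
    return idct([i * q for i, q in zip(indices, steps)])
```

A larger entry in `steps` spends fewer bits on that coefficient and places more quantization noise at that frequency, which is where a perceptual criterion or a bit allocation rule would act.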
Linear predictive coding in the speech environment dates back to the mid-1960s. The article by B. S. Atal and M. R. Schroeder titled "Predictive Coding of Speech Signals", Proceedings of the 1967 Conference on Communications and Processing, Cambridge, Mass., pp 360-361, is an early example. It was later recognized that predictive coding may be improved by taking account of the limited ability of the human ear to perceive noise. For example, the article by M. R. Schroeder, B. S. Atal and J. L. Hall titled "Optimizing Digital Speech Coders by Exploiting Masking Properties of the Human Ear", Journal of the Acoustical Society of America, December 1979, pp 1647-1652, describes the benefits that may accrue from considering the perceptual characteristics of the human ear.
In Linear Predictive Coding (LPC) that accounts for the perception of noise, a signal segment is predicted from historical information, and an error signal is derived by subtracting the predicted signal from the actual signal. The error signal is typically transformed and weighted by a noise-perception frequency-sensitive function, to result in a modified transform. The modified transform is encoded and transmitted to the receiver.
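The prediction and subtraction step can be sketched as follows. This is a simplified illustration only: the predictor coefficients are supplied by the caller rather than estimated from the signal, samples before the start of the segment are assumed to be zero, and the transform and perceptual weighting of the error signal are omitted. All names are hypothetical.

```python
def predict(history, coeffs):
    """Predict the next sample from the most recent samples.

    coeffs[0] weights the latest sample, coeffs[1] the one before it, etc.
    Samples before the start of the signal are treated as zero (assumption).
    """
    total = 0
    for j, a in enumerate(coeffs):
        idx = len(history) - 1 - j
        if idx >= 0:
            total += a * history[idx]
    return total

def prediction_error(samples, coeffs):
    """Error signal: actual sample minus the sample predicted from history."""
    return [x - predict(samples[:n], coeffs) for n, x in enumerate(samples)]

def reconstruct(errors, coeffs):
    """Receiver side: add each error to the prediction made from decoded samples."""
    decoded = []
    for e in errors:
        decoded.append(e + predict(decoded, coeffs))
    return decoded

# A second-order predictor that extrapolates a straight line.
samples = [1, 2, 3, 4, 6, 8]
errors = prediction_error(samples, [2, -1])
```

For a well-predicted signal most error values are small, so the error signal is cheaper to encode than the samples themselves.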
In the area of video signals the situation is not dissimilar. For example, sub-band coding was applied to image signals by J. Woods and S. D. O'Neil in "Sub-Band Coding of Images", IEEE ASSP, Vol 34, No. 5, October 1986, pp 1278-1288. The arrangement proposed by Woods et al. divides the image into two-dimensional frequency bands and the signal of each band is compressed via DPCM. Two-dimensional frequency bands, in effect, measure the signal variability in the two dimensions that form the image. Vector quantization of video is described, for example, in "Sub-Band Coding of Images Using Vector Quantization" by P. H. Westerink et al., Proc. of Seventh Benelux Information Theory Symposium, pp. 143-150, 1986; and in U.S. Pat. No. 4,811,112 issued to C. W. Rutledge on Mar. 7, 1989. The "human visual system" (HVS) characteristics were incorporated by K. N. Ngan, et al. in an article titled "Cosine Transform Coding Incorporating Human Visual System Model", SPIE Vol 707, Visual Communications and Image Processing (1986) pp. 165-171. The system described by Ngan et al. basically executes a two-dimensional cosine transform on the source information and weights the derived coefficients in accordance with an HVS function. The weighted coefficients are then quantized, coded and sent to a buffer prior to being applied to the transmission medium. To ensure a desired global bit rate, a buffer fullness indication is fed back to the quantizer to control the number of bits that are generated by the quantizer. More recently, in a co-pending application Ser. No. 07/350435, filed May 4, 1989, J. D. Johnston and R. J. Safranek disclosed a sub-band analysis method where the quantization schema for each pixel is adapted so that the amount of quantizing noise that is produced is near, but below, the limit of perceptibility. By allowing the quantization noise to rise while still keeping it below perceptibility, greater compression of the signal is achieved.
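The buffer fullness feedback mentioned above can be illustrated with a rough sketch. Everything specific here is an assumption made for illustration and is not taken from Ngan et al.: the linear rule that widens the quantizer step as the buffer fills, the bit-cost model (magnitude bits plus a sign bit per coefficient), and the names `rate_controlled_steps`, `gain` and `drain_per_block`.

```python
def block_bits(indices):
    """Hypothetical bit cost: magnitude bits plus one sign bit per index."""
    return sum(abs(i).bit_length() + 1 for i in indices)

def rate_controlled_steps(blocks, base_step, capacity, drain_per_block, gain):
    """Quantize blocks of (already weighted) coefficients in sequence.

    The quantizer step grows with buffer fullness, so fewer bits are
    produced when the buffer is close to overflowing. Returns the step
    size used for each block.
    """
    buffer_bits = 0
    steps_used = []
    for block in blocks:
        fullness = min(buffer_bits / capacity, 1.0)
        step = base_step * (1.0 + gain * fullness)   # assumed feedback rule
        indices = [round(c / step) for c in block]
        # Bits enter the buffer; the channel drains it at a fixed rate.
        buffer_bits = max(buffer_bits + block_bits(indices) - drain_per_block, 0)
        steps_used.append(step)
    return steps_used

block = [100.0, -50.0, 25.0, 3.0]
steps_used = rate_controlled_steps([block, block], base_step=1.0,
                                   capacity=100, drain_per_block=10, gain=4.0)
```

The first block is quantized at the base step because the buffer starts empty; the second block sees a partly filled buffer and is quantized more coarsely.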
The above-described coding approaches operate with sampled and quantized signals. To achieve a more compressed code, prior art approaches typically transform the signal to the frequency domain and thereafter operate in that domain. Given a fixed bandwidth, they allocate the available bits between the different frequency components to do as good a job as possible on all of the frequency components or on a prespecified number of them. In other words, the decision that is made is how well to encode the frequency coefficients; not whether to encode them in the first instance. The result is an encoding schema that is more complex than necessary and, when the total bit rate is constrained, is perceptually suboptimal.