1. Field of the Invention
The present invention relates to image processing, and, in particular, to computer-implemented processes and apparatuses for encoding and/or decoding video signals for storage, transmission, and/or playback.
2. Description of the Related Art
Many conventional video codecs (i.e., encoders/decoders) employ a two-dimensional block transform, such as a discrete cosine transform or a discrete slant transform, as part of the video encoding process. These block transforms are typically used to transform pixels (or pixel differences) from a spatial domain into transformed coefficients in a spatial frequency domain. The resulting transformed coefficients may then be further processed (e.g., using quantization followed by run-length encoding followed by variable-length encoding) to generate an encoded bitstream that represents the original video signals in a compressed format.
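By way of illustration only, the conventional processing chain described above may be sketched as follows. The sketch assumes an orthonormal DCT-II as the block transform, a single uniform quantizer step of 16, and a simple raster scan ahead of run-length encoding; these choices, and every function name, are hypothetical and are not taken from any particular codec. Variable-length encoding is omitted.

```python
import math

N = 8  # block size assumed for illustration

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix C, where C[u][x] weights input x in output u."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def transform_2d(block):
    """Transform every row, then every column (spatial -> spatial frequency)."""
    n = len(block)
    c = dct_matrix(n)
    rows = [[sum(c[u][x] * block[r][x] for x in range(n)) for u in range(n)]
            for r in range(n)]
    return [[sum(c[v][r] * rows[r][u] for r in range(n)) for u in range(n)]
            for v in range(n)]

def quantize(coeffs, step=16):
    """Uniform quantization (assumed step size); returns a flat raster-scan list."""
    return [int(round(value / step)) for row in coeffs for value in row]

def run_length(levels):
    """Encode as (zeros_before, nonzero_value) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    pairs.append("EOB")
    return pairs

# A flat block has energy only in the lowest-frequency (DC) coefficient.
flat = [[100] * N for _ in range(N)]
encoded = run_length(quantize(transform_2d(flat)))
```

For the flat block, the 64 input pixels collapse to a single nonzero quantized coefficient, which suggests why the transform step is useful ahead of run-length encoding.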
Two-dimensional transforms are typically implemented in one of two ways. One way is to generate each output (i.e., each transform coefficient) as a function of all of the inputs. For example, for an (8×8) transform that transforms an (8×8) block of inputs into 64 transform coefficients, each of the 64 transform coefficients may be represented by a different function of the 64 inputs. In practice, the 64 different functions will typically share common subexpressions. To make the implementation more efficient, each shared subexpression may be computed once, stored in temporary storage, and then reused in each of the functions in which it appears. For discrete slant and cosine transforms, for example, there are six different levels of shared subexpressions (three as the inputs are transformed rowwise and three more as the inputs are transformed columnwise), each level having one or more different subexpressions. This method of performing transforms is computationally intensive. It also requires repeated storage and retrieval of the results of the different subexpressions.
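As a rough illustration of why sharing subexpressions helps, the following sketch compares evaluating each of the 64 outputs as an independent function of all 64 inputs against a row-column evaluation in which each rowwise result is computed once, stored, and reused. It assumes an orthonormal DCT-II basis and shows only the coarsest level of sharing, not the six butterfly-style levels mentioned above; the function names and the term counters are hypothetical.

```python
import math

N = 8  # block size assumed for illustration

def basis(u, x, n=N):
    """Orthonormal DCT-II weight of input sample x in output frequency u."""
    scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))

def direct_2d(block):
    """Every output is its own function of all N*N inputs; nothing is shared."""
    n = len(block)
    terms = 0
    out = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += basis(v, y) * basis(u, x) * block[y][x]
                    terms += 1
            out[v][u] = acc
    return out, terms

def row_column_2d(block):
    """Row results are shared subexpressions: computed once, stored, reused."""
    n = len(block)
    terms = 0
    rows = [[0.0] * n for _ in range(n)]  # temporary storage for shared results
    for y in range(n):
        for u in range(n):
            acc = 0.0
            for x in range(n):
                acc += basis(u, x) * block[y][x]
                terms += 1
            rows[y][u] = acc
    out = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            acc = 0.0
            for y in range(n):
                acc += basis(v, y) * rows[y][u]  # reuse of a stored row result
                terms += 1
            out[v][u] = acc
    return out, terms

block = [[(3 * y + 5 * x) % 17 for x in range(N)] for y in range(N)]
direct_out, direct_terms = direct_2d(block)
shared_out, shared_terms = row_column_2d(block)
```

In this sketch the direct form accumulates 4096 terms while the row-column form accumulates 1024, at the cost of writing and re-reading the 64 intermediate row results, which is the storage and retrieval overhead noted above.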
The other common implementation, known as a forward mapping transform, is to process each input completely by generating the contribution of that input to all of the outputs before considering the next input. This implementation requires the use of multiple registers to keep track of the partial outputs as the contributions from the different inputs are accumulated. For an (8×8) transform with 64 outputs, even assuming the use of pseudo-SIMD techniques (in which two or more outputs are accumulated in a single register), the forward mapping transform requires more registers than are available in many computer architectures. The alternative is to store these accumulated partial outputs to memory, but this results in excessive memory traffic, which reduces processing speed.
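The forward mapping approach may be sketched as follows, again assuming an orthonormal DCT-II basis purely for illustration. Each input is visited once and its contribution is added to every partial output before the next input is read. The list named partial is a hypothetical stand-in for the registers (or memory locations) that would hold the accumulated partial outputs.

```python
import math

N = 8  # block size assumed for illustration

def basis(u, x, n=N):
    """Orthonormal DCT-II weight of input sample x in output frequency u."""
    scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))

def forward_mapping_2d(block):
    """Scatter each input into all N*N partial outputs, one input at a time."""
    n = len(block)
    partial = [[0.0] * n for _ in range(n)]  # all partial outputs stay live
    for y in range(n):
        for x in range(n):
            value = block[y][x]          # the single input being processed
            for v in range(n):
                row_weight = basis(v, y) * value
                for u in range(n):
                    # Contribution of input (y, x) to output (v, u).
                    partial[v][u] += row_weight * basis(u, x)
    return partial

flat = [[100] * N for _ in range(N)]
coeffs = forward_mapping_2d(flat)
live_accumulators = sum(len(row) for row in coeffs)
```

Because none of the 64 accumulators is final until the last input has been processed, all of them must remain live throughout, which is the source of the register pressure described above.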
What is needed are video codecs that apply two-dimensional block transforms as part of their video compression processing without the problems of the known techniques. In particular, it is desirable to implement two-dimensional block transforms efficiently, where efficiently means achieving relatively high processing speed with low memory traffic, good memory cache behavior, and few registers.
It is therefore an object of the present invention to provide processes and apparatuses for encoding and/or decoding video images using two-dimensional block transforms without the disadvantages of the prior art.
Further objects and advantages of this invention will become apparent from the detailed description of a preferred embodiment which follows.