1. Field of the Invention
The present invention relates to techniques for performing a transform between spatial and frequency domains when processing video data. Such transforms are typically performed by both video encoders and video decoders, with a video encoder performing a forward transform to convert a video signal from the spatial domain to the frequency domain, and a video decoder performing a corresponding inverse transform in order to convert the encoded signal from the frequency domain back to the spatial domain.
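To make the forward/inverse relationship concrete, here is a minimal floating-point sketch of an orthonormal one dimensional DCT-II and its inverse. The function names `dct_ii` and `idct_ii` and the sample values are hypothetical and chosen only for illustration. Real video codecs use scaled integer approximations of these transforms rather than floating-point arithmetic.

```python
import math

def dct_ii(x):
    """Forward orthonormal 1-D DCT-II: spatial samples -> frequency coefficients."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out

def idct_ii(X):
    """Inverse of the orthonormal DCT-II (a DCT-III): coefficients -> samples."""
    n = len(X)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # one row of pixel values (illustrative)
coeffs = dct_ii(samples)                      # encoder side: forward transform
restored = idct_ii(coeffs)                    # decoder side: inverse transform
print([round(v) for v in restored])           # [52, 55, 61, 66, 70, 61, 64, 73]
```

Without quantisation the round trip is lossless up to floating-point error. In a real encoder the coefficients are quantised between the two steps.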
2. Description of the Prior Art
There are various known transforms for converting signals between the spatial and frequency domains. A commonly used transform is the discrete cosine transform. Contemporary video encoders and decoders may be required to perform video encoding and decoding operations in accordance with a number of video standards, such as MPEG2, MPEG4, H.263, H.264 high profile, VP8, VC-1 and so on. It is known that a particularly computationally intensive part of the video encoding and decoding process is the performance of the transform operation.
Video encoding and decoding have typically been performed on the basis of 8×8 blocks of pixel data, wherein four 8×8 blocks of luma (Y) data and two 8×8 blocks of chroma (Cb and Cr) data represent a given macroblock of the video data. The transform operations are performed on all six 8×8 blocks of each macroblock to produce six transformed 8×8 output blocks.
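The block layout described above can be sketched as follows. This sketch assumes 4:2:0 chroma sampling with a 16×16 macroblock, which is the arrangement that yields four 8×8 luma blocks and one 8×8 block each of Cb and Cr. The helper `split_macroblock` is hypothetical.

```python
def split_macroblock(y, cb, cr):
    """Split one 4:2:0 macroblock into its six 8x8 blocks.

    y is a 16x16 luma array; cb and cr are 8x8 chroma arrays (lists of rows).
    Returns [Y0, Y1, Y2, Y3, Cb, Cr], with the luma quadrants in raster order.
    """
    if len(y) != 16 or any(len(r) != 16 for r in y):
        raise ValueError("luma must be 16x16")
    if len(cb) != 8 or len(cr) != 8:
        raise ValueError("chroma must be 8x8")
    blocks = []
    for by in (0, 8):          # top half, then bottom half
        for bx in (0, 8):      # left half, then right half
            blocks.append([row[bx:bx + 8] for row in y[by:by + 8]])
    blocks.append(cb)
    blocks.append(cr)
    return blocks

# Illustrative data: each luma sample encodes its own (row, col) position.
y = [[16 * r + c for c in range(16)] for r in range(16)]
cb = [[128] * 8 for _ in range(8)]
cr = [[128] * 8 for _ in range(8)]
blocks = split_macroblock(y, cb, cr)
print(len(blocks))          # 6
print(blocks[1][0][0])      # 8: top-left sample of the top-right luma block
```

Each of the six returned blocks would then be passed through the same 8×8 transform.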
Until recently, only relatively small transform operations have been needed, such as the 8×8 transforms in the above mentioned examples. However, with the introduction of high definition video, newer video standards are emerging, such as the HEVC standard, which require transform operations to be performed on larger arrays, for example 16×16 and 32×32. Many of the techniques developed to perform the smaller transforms efficiently have been found not to scale to such larger transforms.
Considering specifically the example of a discrete cosine transform (DCT), various papers have studied larger DCTs, and techniques have been developed for enabling such large DCTs to be efficiently implemented by Fast Fourier Transform (FFT) style methods when repeated multiplications are permitted (i.e. the result of one multiplication is fed as an input to a further multiplication). For example the two papers by Feig & Winograd entitled “On the Multiplicative Complexity of Discrete Cosine Transforms”, IEEE Trans Information Theory, Volume 38, No. 4, July 1992, and “Fast Algorithms for the Discrete Cosine Transform”, IEEE Trans Signal Processing, Volume 40, No. 9, September 1992, discuss possible algorithms for optimising DCTs which reduce the number of multiplication operations required. However, generally these techniques require the earlier mentioned repeated multiplications, particularly for the larger transform sizes.
In video standards, however, there is often a requirement for the outputs of at least the decoding operation to be bit exact. This is because the contents of certain pictures are predicted from a previously decoded picture, so any mismatch between the encoder's and decoder's reconstructions would propagate into subsequent predicted pictures. Taking the specific example of the HEVC standard, the inverse transform operation performed during decoding must be implemented to exactly match the output of a reference fixed-point version of the transform using integer multiplies. As a result, the known optimisation techniques that use repeated multiplications (typically in combination with shift operations) cannot be used, because the rounding introduced at each intermediate multiplication causes the results to deviate from the reference.
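The rounding problem can be illustrated with a toy fixed-point example. The sketch below assumes Q8 fixed-point arithmetic, a hypothetical "reference" that multiplies by 0.5 (128/256) in a single integer multiply, and a hypothetical "fast" factorisation of 0.5 as (1/√2)·(1/√2), with 1/√2 approximated as 181/256. The constants are illustrative and are not taken from any standard.

```python
FRAC_BITS = 8
ROUND = 1 << (FRAC_BITS - 1)

def fixed_mul(x, c):
    """Multiply integer x by a Q8 fixed-point constant c, rounding to nearest."""
    return (x * c + ROUND) >> FRAC_BITS

def reference(x):
    # Hypothetical reference: a single integer multiply by 0.5 (128 in Q8).
    return fixed_mul(x, 128)

def cascaded(x):
    # Hypothetical repeated multiplication: the rounded result of the first
    # multiply by ~1/sqrt(2) is fed into a second multiply by ~1/sqrt(2).
    return fixed_mul(fixed_mul(x, 181), 181)

mismatches = [x for x in range(256) if reference(x) != cascaded(x)]
print(reference(3), cascaded(3))   # 2 1
print(len(mismatches) > 0)         # True
```

Although the two computations are mathematically equivalent up to the precision of the constants, the intermediate rounding makes the cascaded version differ from the reference for some inputs. A bit-exact decoder cannot tolerate this.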
A known technique that avoids the need for such repeated multiplications, and can therefore be used when bit exact results are required, uses repeated (A+B, A-B) butterflies to reduce the number of multiply operations required. Taking the example of a 32×32 transform, each one dimensional transform would, without any optimisation, require 32×32, i.e. 1024, multiplications. Using such known butterfly techniques, the number of multiplications in that specific scenario can be reduced to 342.
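The butterfly principle can be shown at a much smaller size. The sketch below uses the 4-point integer matrix of the HEVC core transform and compares a direct matrix-vector product (16 multiplications) with a version built from (A+B, A-B) butterflies (6 multiplications). It does not reproduce the 32-point figure of 342. The functions `direct` and `butterfly` are hypothetical names, and this particular decomposition is one possible arrangement, not necessarily the one used in any reference software.

```python
import random

# 4-point integer DCT matrix of the HEVC core transform.
M4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]

def direct(x):
    """Plain matrix-vector product: 4 x 4 = 16 multiplications."""
    return [sum(M4[k][i] * x[i] for i in range(4)) for k in range(4)]

def butterfly(x):
    """Repeated (A+B, A-B) butterflies: 6 multiplications, none cascaded."""
    e0, o0 = x[0] + x[3], x[0] - x[3]     # first butterfly stage
    e1, o1 = x[1] + x[2], x[1] - x[2]
    ee, eo = e0 + e1, e0 - e1             # second butterfly on the even half
    return [
        64 * ee,                          # 1 multiply
        83 * o0 + 36 * o1,                # 2 multiplies
        64 * eo,                          # 1 multiply
        36 * o0 - 83 * o1,                # 2 multiplies
    ]

rng = random.Random(0)
for _ in range(1000):
    x = [rng.randint(-256, 255) for _ in range(4)]
    assert direct(x) == butterfly(x)      # bit exact: integer adds and multiplies only
print("butterfly matches direct product")
```

Every multiplication operand is a sum or difference of the inputs, never the result of another multiplication. This is why the butterfly approach remains bit exact.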
Nevertheless, this is still a significant number of multiplications, and it must be incurred for every one dimensional transform. Video encoding and decoding typically use two dimensional DCTs; taking the HEVC standard as an example, each block of video data to be processed may consist of an array of 32×32 data values. The two dimensional discrete cosine transform is typically implemented as a series of one dimensional transforms applied to each row and then to each column of the array. In this example, that means 32 one dimensional transforms covering the rows, followed by 32 one dimensional transforms covering the columns. Hence 64 one dimensional transforms are required for each block of video data. With the butterfly technique discussed earlier, each one requires 342 multiplications, giving 21,888 multiplications per block.
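The row-column decomposition and the resulting multiplication counts can be sketched as follows. The helpers `transform_2d`, `multiplies_per_block` and the toy 2-point transform `butterfly2` are hypothetical. Any 1-D transform could be passed in, and the per-transform costs are the figures quoted in the text.

```python
def transform_2d(block, transform_1d):
    """Separable 2-D transform: 1-D transform on every row, then on every column."""
    rows = [transform_1d(r) for r in block]
    cols = [transform_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to row-major order

def multiplies_per_block(n, mults_per_1d):
    """n row transforms plus n column transforms, each costing mults_per_1d."""
    return 2 * n * mults_per_1d

def butterfly2(v):
    """Toy 2-point transform (A+B, A-B), used only to exercise transform_2d."""
    return [v[0] + v[1], v[0] - v[1]]

print(transform_2d([[1, 2], [3, 4]], butterfly2))  # [[10, -2], [-4, 0]]
print(multiplies_per_block(32, 32 * 32))           # 65536 (direct matrix product)
print(multiplies_per_block(32, 342))               # 21888 (butterfly figure)
```

The quadratic growth is visible in the first count: doubling the block dimension quadruples the cost of each unoptimised 1-D transform and doubles the number of 1-D transforms per block.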
There is a continual desire to provide higher performance and lower area cost video encoders and decoders, and accordingly it would be desirable to reduce the number of multiplications required during performance of forward and inverse transform operations on video data. This desire is becoming increasingly acute as the size of the transforms to be supported increases in accordance with newer video standards such as the HEVC standard.