1. Field of the Invention
This invention relates to methods and apparatus for coding and decoding of video signals. Such methods and apparatus may be applied, for example, in digital video compression and decompression processes.
2. Description of the Prior Art
Known video compression techniques may involve spatial (i.e. intra-image) and/or temporal (i.e. inter-image) coding of video signals. One known method of spatial coding involves sub-band frequency separation of video images (i.e. fields or frames of the video signal) in the two-dimensional spatial frequency domain. Frequency separation into the desired number of sub-bands can be achieved using a filter system where filtering in the horizontal image direction is performed independently of filtering in the vertical image direction. For each of the horizontal and vertical filtering processes, the filter system is formed of a series of filter stages arranged in a tree structure, each filter stage comprising complementary low and high pass filters. A video image supplied to the low and high pass filters at the first filter stage of the horizontal filter system is split into two horizontal frequency bands, a low pass band and a high pass band, which are decimated separately at the output of each filter. Each of these bands may then be supplied to further similar filter stages for further sub-division into low and high frequency components, and the process repeated until the desired number of horizontal sub-bands is obtained. Each sub-band component can then be applied to the vertical filter system for further sub-division into vertical bands, so that eventually the input image is decorrelated into the desired number of spatial frequency components within the two-dimensional spatial frequency domain.
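By way of illustration only, a single horizontal filter stage followed by a single vertical filter stage may be sketched as follows. The sketch assumes the two-tap Haar filter pair, which is not named in the description above, and all function names and the band labels (first letter horizontal, second letter vertical) are hypothetical.

```python
def split_band(signal):
    """Split an even-length 1-D sequence into decimated low and high bands.

    Assumption: the two-tap Haar pair stands in for the complementary
    low and high pass filters; each output runs at half the input rate.
    """
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def merge_band(low, high):
    """Inverse of split_band: interleave the reconstructed sample pairs."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


def transpose(image):
    """Swap rows and columns so that row filtering acts vertically."""
    return [list(col) for col in zip(*image)]


def split_rows(image):
    """Apply one filter stage along every row of the image."""
    pairs = [split_band(row) for row in image]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def split_image(image):
    """One horizontal stage, then one vertical stage, giving four sub-bands."""
    h_low, h_high = split_rows(image)
    bands = {}
    for name, band in (("L", h_low), ("H", h_high)):
        v_low, v_high = split_rows(transpose(band))
        bands[name + "L"] = transpose(v_low)
        bands[name + "H"] = transpose(v_high)
    return bands
```

Applying split_image again to any of the returned bands corresponds to adding a further stage to the tree. Re-splitting only the "LL" band would give sub-bands of non-uniform size.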
Depending on the arrangement of the filter system, the image may be separated into sub-band components of uniform size or non-uniform size, the frequency separation process in the latter case being known as logarithmic or "wavelet" coding. In either case, the spatial frequency component data may then be subjected to a quantization process where the degree of quantization for the different spatial frequency components depends on the human visual responsiveness at different spatial frequencies, and the resulting quantized data may then be entropy encoded prior to onward transmission, storage, etc. of the compressed data. The decompression process reverses the various stages of the compression process to reconstruct the original image data.
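A band-dependent quantization of the kind referred to above may be sketched as follows. The step sizes are purely hypothetical values chosen for illustration, on the assumption that coarser steps are tolerated in the higher frequency bands; they are not taken from any standard or from the description above.

```python
# Hypothetical step sizes per sub-band (first letter horizontal, second vertical).
# Larger steps discard more detail in bands assumed to be less visible.
STEP = {"LL": 1, "LH": 4, "HL": 4, "HH": 8}


def quantize(band, step):
    """Map each coefficient to the index of its nearest multiple of step."""
    return [[round(v / step) for v in row] for row in band]


def dequantize(band, step):
    """Reconstruct approximate coefficients from the quantizer indices."""
    return [[q * step for q in row] for row in band]
```

The quantizer indices, rather than the coefficients themselves, would then be passed to the entropy encoder.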
The frequency separation techniques described above are purely spatial coding processes, no account being taken of any redundancy along the temporal axis due to similarities between successive video images. The MPEG (Moving Picture Experts Group) standard developed by the International Organisation for Standardisation defines a standard for, inter alia, video compression and proposes a system involving both spatial and temporal processing. The MPEG standard is summarised in Communications of the ACM, April 1991, Volume 34, No. 4, Pages 47 to 58.
One MPEG scheme uses a frame-based DPCM loop as illustrated schematically in FIG. 1 of the accompanying drawings. According to this scheme, the difference between successive video images is calculated to eliminate some temporal redundancy, and the difference signal is subjected to a spatial coding process. To ensure low entropy in the difference signal, the system utilises motion vector estimation to derive motion vectors representing the motion of the image content of blocks of pixels, or "macroblocks", between successive video images, and the difference is calculated on a block by block basis along the direction of the motion vectors.
Referring to the block diagram of FIG. 1, a pair of successive video frames is supplied to a motion vector generator 1 which, using known motion vector estimation techniques, derives for each macroblock (here an array of 16×16 pixels) of the second input frame a motion vector representing the motion (in terms of numbers of pixels in the horizontal and vertical image directions) of that macroblock between the second frame and the first frame. The first frame is then output via a subtracter 2 to a spatial coder 3 which performs a spatial encoding process, the spatially encoded frame being supplied to an output 4a for storage, transmission, further processing, etc. The encoded first frame is also fed back via a decoder 5, which reverses the spatial coding process performed by the coder 3, to a motion compensator 6 which also receives the motion vectors derived by the motion vector generator 1 and supplied to a vector output 4b.
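The description above does not specify which motion vector estimation technique is used by the motion vector generator 1. One commonly used possibility, assumed here purely for illustration, is an exhaustive block-matching search which minimises the sum of absolute differences. The function names, the search range parameter and the vector convention (a displacement from the block position in the second frame to the matching block in the first frame) are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_vector(cur, ref, bx, by, size, search):
    """Return the (dx, dy) within +/- search pixels giving the lowest SAD.

    Candidates whose displaced block would fall outside ref are skipped.
    The zero vector is the starting candidate and is kept on ties.
    """
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside_x = 0 <= bx + dx and bx + dx + size <= width
            inside_y = 0 <= by + dy and by + dy + size <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

For a 16×16 macroblock, size would be 16. The exhaustive search is simple but costly, since its cost grows with the square of the search range.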
The second frame is to be output next, on a block by block basis, by the motion vector generator 1, and the difference is to be calculated between each block of the second frame and the block in the first frame at the location "pointed to" by the motion vector for that block of the second frame. Thus, for each successive block of the second frame, the motion compensator 6 uses the motion vector supplied for that block to identify the appropriate block of the first frame which is then output to a compensating delay 7a. Then, as each block of the second frame is output to the subtracter 2, the block of pixels in the first frame at the location pointed to by the motion vector for that block is output by the delay 7a to the negative input of the subtracter 2. These pixels are then subtracted from the corresponding pixels in the macroblock of the second frame supplied to the positive input of the subtracter 2, and the resulting difference values are output to the spatial coder 3. This process is repeated for all macroblocks of the second frame output by the motion vector generator 1. Thus, the difference between the first and second frames is calculated on a block by block basis along the direction of the motion vectors, and the difference values (after spatial coding) are supplied to the output 4a and are also fed back round the loop to one input of an adder 7b. The blocks of pixels of the first frame output by the delay 7a to the subtracter 2 are also supplied to the other input of the adder 7b. Thus, the blocks of pixels of the first frame are added to the difference values for the second frame, thereby reconstructing the second frame which is supplied to the motion compensator 6. The process can then be repeated to obtain the difference between the third frame and the second frame, and so on.
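The loop described above may be sketched as follows for a single frame. This is a simplified illustration under stated assumptions: a uniform quantizer stands in for the spatial coder 3 and its inverse for the decoder 5, the motion vectors are supplied ready-made as a dictionary keyed by block position, every vector is assumed to point inside the reference frame, and all function names are hypothetical.

```python
def fetch_block(frame, x, y, size):
    """Copy the size x size block whose top-left pixel is at (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def code(block, step):
    """Stand-in for spatial coder 3 (assumption: a plain uniform quantizer)."""
    return [[round(v / step) for v in row] for row in block]


def decode(block, step):
    """Stand-in for decoder 5, reversing code() up to quantization error."""
    return [[q * step for q in row] for row in block]


def encode_frame(frame, reference, vectors, size, step):
    """Return (coded differences per block, reconstructed frame).

    reference is the previously reconstructed frame held by the motion
    compensator; vectors maps (bx, by) to the (dx, dy) for that block.
    """
    height, width = len(frame), len(frame[0])
    recon = [[0] * width for _ in range(height)]
    coded = {}
    for by in range(0, height, size):
        for bx in range(0, width, size):
            dx, dy = vectors[(bx, by)]
            # Motion compensator 6: block of the reference pointed to by the vector.
            pred = fetch_block(reference, bx + dx, by + dy, size)
            cur = fetch_block(frame, bx, by, size)
            # Subtracter 2: current block minus motion-compensated prediction.
            diff = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]
            coded[(bx, by)] = code(diff, step)
            # Adder 7b: prediction plus decoded difference rebuilds the block.
            back = decode(coded[(bx, by)], step)
            for y in range(size):
                for x in range(size):
                    recon[by + y][bx + x] = pred[y][x] + back[y][x]
    return coded, recon
```

The first frame can be handled by the same function by passing an all-zero reference and zero vectors, so that nothing is effectively subtracted. The returned reconstruction then serves as the reference for the next frame.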
While the MPEG scheme described above achieves low entropy through use of motion compensation in the temporal coding process, there are a number of disadvantages with this scheme. Firstly, the DPCM process is asymmetric, i.e. the process is a causal one, the second output "frame" being calculated from the first, the third from the second, and so on. Errors introduced at any stage of the process will therefore be propagated through all subsequent stages. Further, the DPCM process does not separate frames into temporal frequency bands, so advantage cannot be taken of the temporal characteristics of the human visual system, for example in a quantization process.
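The propagation of errors through a causal DPCM chain can be illustrated with a deliberately reduced sketch in which each frame is represented by a single number and motion compensation is omitted. Both simplifications are assumptions made only to keep the example short, and the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs


def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences in order."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out


samples = [10, 12, 15, 15, 14]
diffs = dpcm_encode(samples)
diffs[1] += 5  # a single error introduced at the second stage
errors = [d - s for d, s in zip(dpcm_decode(diffs), samples)]
```

Here errors is zero for the first value and non-zero for every value from the second onwards, since each reconstructed value is built on the one before it.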