The present invention relates generally to the processing and storage of video images where sequences of video frames must be processed with relatively little memory. More particularly, it relates to a system and method that apply a wavelet or wavelet-like spatial decomposition transform to each frame in a stream of video frames, and then apply a time domain wavelet or wavelet-like transform to at least the lower spatial frequency coefficients of groups of the decomposed video frames. The time domain (temporal) transforms are performed using an asymmetric, memory-efficient transform that does not generate undesirable border effects.
The digital video data stream for even a few minutes of a video program, if uncompressed, occupies a very large amount of memory. Numerous video data compression methods have been used, and many others have been described in publications and elsewhere.
The present invention uses the well-known data compression capabilities of decomposition transforms, such as wavelet and wavelet-like transforms, in a new way to improve compression of video data streams. In particular, the inventor has found that the low spatial frequency components of a sequence of video frames are highly compressible using a temporal transform, such as a wavelet or wavelet-like transform. However, the inventor has also found that the high spatial frequency components of a typical sequence of video frames are often much less compressible than the low frequency components. The present invention is designed to exploit these properties of typical video data streams so as to achieve very good data compression while using reasonable computational resources.
Another aspect of the present invention concerns how best to perform a temporal wavelet or wavelet-like transform on a sequence of video frames. It is not practical to perform a temporal transform on a sequence of video frames of unlimited length because of the working memory required to store the video frames. Thus, the video frames must be processed in batches or blocks, such as blocks of 4, 8, 16, 32 or 64 sequential frames. A sequence of N/2 interleaved frames may be treated as a sequence of N frames, with the odd and even lines of each interleaved frame treated as two sequential frames. However, to take full advantage of the temporal properties of a sequence of video frames, the inventor has found that, while processing one block of video frames, it is advantageous to take into account low spatial frequency properties of the previous block of video frames that continue into the current block. Also, the temporal transform should ideally be performed so that, upon reconstruction of the video frames, abrupt discontinuities between neighboring video frames (i.e., discontinuities not present in the original video frames) are avoided, without incurring the working memory requirements of processing a sequence of video frames much longer than any one block.
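The field splitting and blocking described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names `split_fields`, `frames_to_fields` and `make_blocks` are hypothetical, frames are assumed to be lists of pixel rows, and the even-line field is assumed to precede the odd-line field in time (the actual field order depends on the source material).

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # a frame stored as a list of pixel rows

def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Split an interleaved frame into its even-line and odd-line fields."""
    return frame[0::2], frame[1::2]

def frames_to_fields(frames: Sequence[Frame]) -> List[Frame]:
    """Treat N/2 interleaved frames as N sequential fields.

    Assumption: the even-line field precedes the odd-line field in time.
    """
    fields: List[Frame] = []
    for frame in frames:
        even, odd = split_fields(frame)
        fields.extend([even, odd])
    return fields

def make_blocks(frames: Sequence[Frame], block_size: int) -> List[List[Frame]]:
    """Group a frame sequence into consecutive blocks of block_size frames.

    The final block is shorter if len(frames) is not a multiple of block_size.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [list(frames[i:i + block_size]) for i in range(0, len(frames), block_size)]

# Usage: two 4-line interleaved frames become four 2-line fields, then two blocks.
frames = [[[0], [1], [2], [3]], [[4], [5], [6], [7]]]
fields = frames_to_fields(frames)
blocks = make_blocks(fields, block_size=2)
```

Only one block needs to be resident in working memory at a time, which is the memory constraint that motivates block-wise temporal processing.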
Further, it would be advantageous for the temporal decomposition transform to be compatible with a variety of different spatial decomposition transforms applied to the individual video frames. In other words, regardless of whether the individual video frames are decomposed using a DCT or a wavelet or wavelet-like transform, the temporal decomposition transform should improve data compression.
In summary, the present invention is a system and method for compressing and encoding a stream of digital video frames. The system and method receive a sequence of video frames, each video frame containing an array of image data representing an image. A spatial transform module performs a spatial decomposition transform on the individual video frames to generate spatially transformed video frames. Each of the spatially transformed video frames includes a plurality of subbands of data, including at least one low spatial frequency subband of data.
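As one concrete example of a spatial decomposition transform that yields a low spatial frequency subband, the sketch below performs a single level of a 2-D Haar wavelet decomposition. The Haar filter is chosen here only for brevity; the text above allows any wavelet, wavelet-like or DCT-based spatial transform. The names `haar_2d`, `_pairs_along_rows` and `_transpose` are hypothetical, frame height and width are assumed even, and the subband labels list the horizontal filter first (labeling conventions differ between texts).

```python
from typing import Dict, List, Tuple

Array2D = List[List[float]]

def _pairs_along_rows(rows: Array2D) -> Tuple[Array2D, Array2D]:
    """Haar average and half-difference of adjacent column pairs in each row."""
    lo = [[(r[2 * j] + r[2 * j + 1]) / 2 for j in range(len(r) // 2)] for r in rows]
    hi = [[(r[2 * j] - r[2 * j + 1]) / 2 for j in range(len(r) // 2)] for r in rows]
    return lo, hi

def _transpose(a: Array2D) -> Array2D:
    return [list(col) for col in zip(*a)]

def haar_2d(frame: Array2D) -> Dict[str, Array2D]:
    """One level of a 2-D Haar decomposition into four subbands.

    Assumes even height and width. "LH" means horizontal low-pass followed
    by vertical high-pass, and so on.
    """
    lo, hi = _pairs_along_rows(frame)               # horizontal pass
    ll_t, lh_t = _pairs_along_rows(_transpose(lo))  # vertical pass on the low half
    hl_t, hh_t = _pairs_along_rows(_transpose(hi))  # vertical pass on the high half
    return {
        "LL": _transpose(ll_t),  # the low spatial frequency subband
        "LH": _transpose(lh_t),
        "HL": _transpose(hl_t),
        "HH": _transpose(hh_t),
    }

subbands = haar_2d([[1, 2], [3, 4]])
```

Further decomposition levels would typically be obtained by applying `haar_2d` again to the `"LL"` subband; that recursion is omitted to keep the sketch short.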
A temporal transform module performs a temporal decomposition transform on blocks of the spatially transformed video frames. Each block contains a predefined number of the spatially transformed video frames in a sequence corresponding to the sequence of the corresponding video frames. The temporal transform module applies a temporal decomposition transform to at least one low spatial frequency subband of data in the spatially transformed video frames so as to generate temporally transformed video data. The temporal decomposition transform is an asymmetric transform that extends beyond a current block of spatially transformed video frames to a trailing edge of a previous block of spatially transformed video frames, but does not extend beyond the current block of spatially transformed video frames into a next block of spatially transformed video frames.
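The asymmetry can be illustrated with a single integer lifting layer applied to one coefficient position over time. The filter below is a hypothetical choice made for illustration, not the transform defined by this invention. Its predict step uses only the current pair of samples, and its update step uses the high-pass values d[k-1] and d[k]. For the first pair in a block, d[k-1] comes from the trailing edge of the previous block, and no step ever reads d[k+1] from the next block. The names `forward_layer`, `inverse_layer`, `encode_stream` and `decode_stream`, and the zero edge value assumed before the first block, are all hypothetical.

```python
from typing import List, Tuple

def forward_layer(block: List[int], prev_d: int) -> Tuple[List[int], List[int]]:
    """One asymmetric integer lifting layer over an even-length block.

    prev_d is the last high-pass value of the previous block (its trailing edge).
    """
    if len(block) % 2:
        raise ValueError("block length must be even")
    # Predict: each high-pass value uses only its own pair of samples.
    d = [block[2 * k + 1] - block[2 * k] for k in range(len(block) // 2)]
    # Update: uses d[k-1] and d[k]; for k == 0 it reaches back into the
    # previous block, but it never looks forward to d[k+1].
    s = []
    for k in range(len(d)):
        left = d[k - 1] if k > 0 else prev_d
        s.append(block[2 * k] + ((left + d[k]) >> 2))
    return s, d

def inverse_layer(s: List[int], d: List[int], prev_d: int) -> List[int]:
    """Exact inverse of forward_layer, given the same prev_d."""
    x: List[int] = []
    for k in range(len(d)):
        left = d[k - 1] if k > 0 else prev_d
        even = s[k] - ((left + d[k]) >> 2)
        x.extend([even, even + d[k]])
    return x

def encode_stream(samples: List[int], block_size: int):
    """Transform a sequence block by block, carrying one edge value forward."""
    prev_d, coded = 0, []  # assumption: zero edge before the first block
    for start in range(0, len(samples), block_size):
        s, d = forward_layer(samples[start:start + block_size], prev_d)
        coded.append((s, d))
        prev_d = d[-1]
    return coded

def decode_stream(coded) -> List[int]:
    """Reconstruct the samples; each block needs only the previous block's edge."""
    prev_d, out = 0, []
    for s, d in coded:
        out.extend(inverse_layer(s, d, prev_d))
        prev_d = d[-1]
    return out

samples = [3, 7, 2, 9, 4, 4, 8, 1, 0, 5, 6, 6]
restored = decode_stream(encode_stream(samples, 4))
```

Because the update step smooths each block's first low-pass value using the previous block's last high-pass value, the low-pass sequence stays coupled across block boundaries, while an encoder never has to wait for, or buffer, the next block.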
A data encoder encodes, for each block of video frames, the temporally transformed video data and the subbands of data, if any, of the spatially transformed video frames in the block to which the temporal decomposition transform was not applied.
In a preferred embodiment, the temporal decomposition transform is a wavelet or wavelet-like decomposition transform. The at least one low spatial frequency subband includes, for each video frame, a plurality of coefficients at positions (i,j). The temporal decomposition transform includes a plurality of transform layers, including first, second and last transform layers. Each of the transform layers other than the last transform layer produces intermediate coefficients for input to the next transform layer.
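A multi-layer version can be sketched by feeding the low-pass (intermediate) output of each layer into the next, for a single coefficient position (i,j). As before, the lifting filter is a hypothetical asymmetric choice made for illustration, and the names `lift`, `unlift`, `temporal_forward` and `temporal_inverse` are assumptions. Each layer consumes one carried-over value from the previous block and emits one for the next block, and the block length is assumed to be a multiple of 2 raised to the number of layers.

```python
from typing import List, Tuple

def lift(x: List[int], prev_d: int) -> Tuple[List[int], List[int]]:
    """Hypothetical asymmetric lifting layer (backward-looking update only)."""
    d = [x[2 * k + 1] - x[2 * k] for k in range(len(x) // 2)]
    s = [x[2 * k] + (((d[k - 1] if k else prev_d) + d[k]) >> 2) for k in range(len(d))]
    return s, d

def unlift(s: List[int], d: List[int], prev_d: int) -> List[int]:
    """Inverse of lift for the same prev_d."""
    x: List[int] = []
    for k in range(len(d)):
        even = s[k] - (((d[k - 1] if k else prev_d) + d[k]) >> 2)
        x.extend([even, even + d[k]])
    return x

def temporal_forward(block: List[int], edges: List[int]):
    """Multi-layer temporal transform of one coefficient position (i, j).

    block: values of coefficient (i, j) across the frames of one block.
    edges: one carried-over value per layer from the previous block.
    Returns (final low-pass, high-pass list per layer, edges for the next block).
    """
    layers = len(edges)
    if not block or len(block) % (2 ** layers):
        raise ValueError("block length must be a positive multiple of 2**layers")
    x, highs, new_edges = block, [], []
    for layer in range(layers):
        s, d = lift(x, edges[layer])
        highs.append(d)          # high-pass output of this layer
        new_edges.append(d[-1])  # trailing-edge value saved for the next block
        x = s                    # intermediate coefficients feed the next layer
    return x, highs, new_edges

def temporal_inverse(low: List[int], highs: List[List[int]], edges: List[int]) -> List[int]:
    """Undo temporal_forward, given the edges used on the forward pass."""
    x = low
    for layer in reversed(range(len(edges))):
        x = unlift(x, highs[layer], edges[layer])
    return x

low, highs, next_edges = temporal_forward(list(range(8)), [0, 0, 0])
```

With three layers, a block of eight values collapses to one final low-pass value plus four, two and one high-pass values, which is where most of the energy compaction for slowly varying low spatial frequency coefficients would come from.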
An edge data buffer is used to store, for each coefficient in the at least one low spatial frequency subband, at least one intermediate coefficient generated by the temporal decomposition transform when applied to the previous block of video frames. When the temporal decomposition transform is applied to the current block of video frames, it uses the at least one intermediate coefficient stored in the edge data buffer for each coefficient in the at least one low spatial frequency subband as input to at least one of its transform layers.
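An edge data buffer of this kind can be sketched as a small per-coefficient table that is read at the start of each block and overwritten at its end. In this sketch, which is an illustration under stated assumptions and not the patented design, the stored value for each layer is the last high-pass value that layer produced for the previous block; the class name `EdgeBuffer`, the helper `_lift`, the filter itself and the zero initial edges are all hypothetical.

```python
from typing import Dict, List, Tuple

Subband = List[List[int]]  # low spatial frequency subband of one frame, indexed [i][j]

def _lift(x: List[int], prev_d: int) -> Tuple[List[int], List[int]]:
    """Hypothetical asymmetric lifting layer; prev_d comes from the edge buffer."""
    d = [x[2 * k + 1] - x[2 * k] for k in range(len(x) // 2)]
    s = [x[2 * k] + (((d[k - 1] if k else prev_d) + d[k]) >> 2) for k in range(len(d))]
    return s, d

class EdgeBuffer:
    """Stores, for every coefficient (i, j), one carried-over value per layer."""

    def __init__(self, height: int, width: int, layers: int) -> None:
        self.layers = layers
        # Assumption: all edges start at zero before the first block.
        self.edges = [[[0] * layers for _ in range(width)] for _ in range(height)]

    def transform_block(
        self, block: List[Subband]
    ) -> Dict[Tuple[int, int], Tuple[List[int], List[List[int]]]]:
        """Temporally transform one block of subbands, updating the buffer in place."""
        if not block or len(block) % (2 ** self.layers):
            raise ValueError("block length must be a positive multiple of 2**layers")
        out = {}
        for i, row in enumerate(self.edges):
            for j, cell in enumerate(row):
                x = [frame[i][j] for frame in block]  # coefficient (i, j) over time
                highs = []
                for layer in range(self.layers):
                    s, d = _lift(x, cell[layer])  # read the previous block's edge
                    cell[layer] = d[-1]           # overwrite with this block's edge
                    highs.append(d)
                    x = s
                out[(i, j)] = (x, highs)
        return out

buf = EdgeBuffer(height=2, width=2, layers=2)
first = buf.transform_block([[[t, t], [t, t]] for t in range(4)])
second = buf.transform_block([[[t, t], [t, t]] for t in range(4, 8)])
```

The buffer holds only `height * width * layers` values, independent of the block length, so the memory cost of looking back into the previous block stays small compared with buffering whole frames. A decoder would presumably keep a mirror-image buffer so that it can invert each block using the same carried-over values.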