1. Field of the Invention
The present invention relates to an inverse transform and sub-sampling method that simplifies a complicated inverse transform process while performing sub-sampling simultaneously. In particular, it relates to a low-complexity method applied to video decompression and frame-size reduction, which simplifies the process of inversely transforming frequency-domain coefficients into time-domain data and downsizing the inversely transformed data, and which uses a coefficient compensation technique to overcome the distortion caused by the simplified process.
2. The Prior Art
In general, multimedia data are compressed by means of a transform from a spatial domain to a frequency domain, followed by quantization and variable-length coding. The transforms commonly used between a spatial domain and a frequency domain include the discrete wavelet transform, the discrete sine transform, the Discrete Cosine Transform (DCT), and the discrete Fourier transform. For videos or images, the DCT or the discrete wavelet transform is usually adopted and is applied block by block, such that each transformed block has its information concentrated in its low-frequency part, which facilitates the subsequent quantization and variable-length encoding. The transform between a spatial domain and a frequency domain usually requires an enormous amount of computation in both the encoding and decoding processes. In a decoding process where decoding speed is essential, the inverse transform from a frequency domain to a spatial domain places a burden on the decoder. Therefore, considerable research has been conducted on this subject, with emphasis on how the inverse transform is computed. In this respect, various fast inverse transform computations have been proposed, such as U.S. Pat. No. 5,452,466 (reference document 1), U.S. Pat. No. 5,590,066 (reference document 2) and U.S. Pat. No. 5,596,517 (reference document 3).
In addition, certain documents and inventions that consider the distribution of the frequency-domain coefficients have proposed various methods to reduce the computational complexity of the decoder's inverse transform from a frequency domain to a spatial domain. After a transform from a spatial domain to a frequency domain, the energy of the transformed coefficients is concentrated in the low-frequency portion. Owing to this characteristic, U.S. Pat. No. 5,883,823 (reference document 4) proposed a simplified Inverse Discrete Cosine Transform (IDCT) process for an 8×8-point block, in which the IDCT is performed only for the 16 low-frequency coefficients in the upper left corner of the frequency-domain coefficients and for the non-zero coefficients among the remaining 48 coefficients, thus reducing the computational complexity of the IDCT.
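The coefficient selection described for reference document 4 can be illustrated with a minimal Python sketch. It assumes an orthonormal 8×8 DCT basis and a direct (non-fast) accumulation; the names `BASIS`, `selected_positions`, `simplified_idct` and `full_idct` are hypothetical and are not taken from the cited patent, whose actual implementation is not described here.

```python
import math

N = 8  # block size used in the text

# BASIS[k][n]: orthonormal DCT-II basis value for frequency k at sample n
# (the normalization is an assumption of this sketch)
BASIS = [[math.sqrt((1.0 if k == 0 else 2.0) / N)
          * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]


def idct_from_positions(coeffs, positions):
    """Accumulate the 2-D IDCT using only the listed (u, v) positions."""
    out = [[0.0] * N for _ in range(N)]
    for u, v in positions:
        for i in range(N):
            for j in range(N):
                out[i][j] += coeffs[u][v] * BASIS[u][i] * BASIS[v][j]
    return out


def selected_positions(coeffs, low=4):
    """Upper-left low x low corner, plus non-zero coefficients elsewhere."""
    return [(u, v) for u in range(N) for v in range(N)
            if (u < low and v < low) or coeffs[u][v] != 0]


def simplified_idct(coeffs):
    """IDCT restricted to the 16 corner coefficients and non-zero others."""
    return idct_from_positions(coeffs, selected_positions(coeffs))


def full_idct(coeffs):
    """Reference IDCT that visits all 64 coefficients."""
    every = [(u, v) for u in range(N) for v in range(N)]
    return idct_from_positions(coeffs, every)
```

Because every skipped coefficient is zero, `simplified_idct` returns the same block as `full_idct`; the saving comes only from visiting fewer positions, at the cost of testing each of the remaining 48 coefficients for zero.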
In U.S. Pat. No. 6,717,988 (reference document 5), a device is utilized to determine the distribution of non-zero coefficients among the frequency-domain coefficients received by a decoder. The wider the non-zero coefficients are spread within a block, the more complex the computation of the IDCT becomes. Therefore, the IDCT computation of the decoder can be adjusted according to the result of this determination obtained in advance: the inverse transform is performed only for coefficients within a specific range, and the inverse transform operations associated with coefficients outside this pre-determined range are omitted. However, compared with the method of U.S. Pat. No. 5,883,823 (reference document 4), this method loses some of the information associated with non-zero high-frequency coefficients.
In U.S. Pat. No. 7,142,598 (reference document 6), frame energy is used to determine the process flow of an IDCT. The frame energy is defined as the mean of the summed differences between the value of each pixel and the average value of all the pixels in an entire frame; similarly, the block energy is defined as the mean of the summed differences between the value of each pixel in a block and the average value of all the pixels in that block. Before an IDCT is performed for each block, the block energy and the frame energy are compared. If the block energy is larger than the frame energy, the IDCT is performed for the 16 low-frequency coefficients in the upper left corner of the block; if the block energy is positive and less than the frame energy, the IDCT is performed for the 9 low-frequency coefficients in the upper left corner of the block; and if the block energy is negative and less than the frame energy, the IDCT is performed for 1 low-frequency coefficient in the upper left corner of the block. In this method, likewise, the information associated with high-frequency coefficients tends to be lost.
In U.S. Pat. No. 7,366,236 (reference document 7), the End-Of-Block (EOB) position (the position of the last non-zero coefficient in a block under a zig-zag scan) is used to characterize the coefficient distribution in a block, and an IDCT is performed for only a portion of the coefficients. Specifically, when the EOB point is equal to zero, the IDCT is performed only for the Direct Current (DC) coefficient; when the EOB point is not equal to zero and only one of the 64 coefficients is non-zero, a table lookup and an IDCT are performed for that non-zero coefficient; when the EOB point is less than or equal to 14, the IDCT is performed for the 20 low-frequency coefficients in the upper left corner of the block; when the EOB point is less than or equal to 25, the IDCT is performed only for the 42 low-frequency coefficients in the upper left corner of the block; and for any other EOB point, the IDCT is performed for all 64 coefficients in the block. With this method, neither high-frequency nor low-frequency information is lost.
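The EOB-based selection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the zig-zag order is the conventional 8×8 one, an all-zero block is treated as EOB = 0, and the function names and mode labels (`eob_position`, `select_idct_mode`, `"dc_only"`, and so on) are hypothetical. The sketch returns only how many coefficients would be processed, since the text does not specify which 20 or 42 positions are used.

```python
N = 8


def zigzag_order(n=N):
    """Return (row, col) pairs in the conventional zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even anti-diagonals run from bottom-left to top-right,
        # odd ones from top-right to bottom-left.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order


ZIGZAG = zigzag_order()


def eob_position(block):
    """Zig-zag index of the last non-zero coefficient."""
    last = 0  # assumption: an all-zero block is handled like EOB = 0
    for index, (r, c) in enumerate(ZIGZAG):
        if block[r][c] != 0:
            last = index
    return last


def select_idct_mode(block):
    """Return (mode label, number of coefficients to inverse-transform)."""
    eob = eob_position(block)
    nonzero = sum(1 for row in block for value in row if value != 0)
    if eob == 0:
        return "dc_only", 1
    if nonzero == 1:
        return "single_coefficient_lookup", 1
    if eob <= 14:
        return "low_frequency", 20
    if eob <= 25:
        return "low_frequency", 42
    return "full", 64
```

For example, a block whose only non-zero entries are the DC coefficient and the coefficient at row 3, column 3 has an EOB point of 24, so the 42-coefficient path would be chosen.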
In U.S. Pat. No. 7,129,962 (reference document 8), the IDCT is not implemented through matrix computation. Instead, for each spatial-domain datum, all non-zero frequency-domain coefficients are used to look up a coefficient table, and additions and multiplications are then performed to reconstruct each spatial-domain datum. The coefficient table is composed of the numeric values by which the non-zero frequency-domain coefficients are multiplied when the IDCT is performed. Every eight spatial-domain data are taken as a unit for parallel processing to speed up the operation. Moreover, the operations associated with zero-valued frequency-domain coefficients can be omitted. To compare reference documents 4 to 8, Table 1 below summarizes them according to the frequency-domain coefficient distributions used for the simplified inverse transform processes.
TABLE 1
Characteristics of inverse transforms from a frequency domain to a spatial domain for reference documents 4-8.

Ref. Doc. | Implementation means | Characteristics (advantages & shortcomings)
4 | Performing the inverse transform from a frequency domain to a spatial domain only for 16 low-frequency coefficients and non-zero high-frequency coefficients in the frequency domain. | Conserving computation amount of the inverse transform while keeping all the transformation information; yet may need additional computation amount in determining coefficient value.
5 | Determining the manners of the inverse transform from a frequency domain to a spatial domain based on the coefficient distribution in the frequency domain. | May lose a portion of non-zero coefficients during the inverse transform process, and result in losing part of information.
6 | Determining whether to perform a complete inverse transform process based on block energy. | Reducing computational complexity of the inverse transform, yet the simplified process may result in losing information.
7 | Determining the manners of the inverse transform based on the EOB point. | Saving part of computation time, yet still performing the inverse transform for lots of zero coefficients, increasing computation amount of the inverse transform.
8 | Completing the inverse transform process in parallel processing. | Keeping complete transformation information, shortening computation time of the inverse transform, yet reduction of computation amount being not enough.
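The table-lookup reconstruction described for reference document 8 can be sketched in Python as follows. This is a minimal illustration, assuming an orthonormal 8×8 DCT basis; the names `TABLE` and `idct_by_lookup` are hypothetical, and the row-at-a-time loop only loosely mirrors the eight-datum parallel unit, since no actual parallelism is modeled.

```python
import math

N = 8


def basis(k, n):
    """Orthonormal DCT-II basis value (normalization assumed for this sketch)."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


# TABLE[(u, v)][i][j]: multiplier applied to coefficient (u, v) when
# reconstructing spatial sample (i, j)
TABLE = {(u, v): [[basis(u, i) * basis(v, j) for j in range(N)]
                  for i in range(N)]
         for u in range(N) for v in range(N)}


def idct_by_lookup(coeffs):
    """Rebuild each spatial sample from the non-zero coefficients only."""
    nonzero = [(u, v, coeffs[u][v])
               for u in range(N) for v in range(N) if coeffs[u][v] != 0]
    out = []
    for i in range(N):
        # One row of eight samples is handled as a unit.
        row = [sum(value * TABLE[(u, v)][i][j] for u, v, value in nonzero)
               for j in range(N)]
        out.append(row)
    return out
```

The work per sample is proportional to the number of non-zero coefficients, so sparse blocks are cheap, while a dense block still costs up to 64 multiply-adds per sample.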
To meet display requirements, a video or image decoder is usually accompanied by a frame-size reduction and enlargement function, which requires a very complicated computation process to obtain the data of the reduced or enlarged frame. Therefore, considerable research has been conducted in this field, with emphasis on reducing the computational complexity of frame-size conversion. A video codec is taken here as an example. In a video encoder, a DCT is usually utilized to transform spatial-domain data into frequency-domain data, which are subsequently processed by quantization and variable-length encoding to achieve data compression. In a video decoder, the data before the IDCT are usually referred to as frequency-domain coefficients, and the data after the IDCT are referred to as spatial-domain data. Frame-size conversion can be classified into two categories depending on the type of data processed: if the frame-size conversion is arranged before the IDCT, the data processed are frequency-domain coefficients, and the conversion is referred to as a frequency-domain frame-size conversion; if the frame-size conversion is arranged after the IDCT, the conversion is referred to as a spatial-domain frame-size conversion. Moreover, in video decoding, when the frame size is reduced, the capacity of the reference frame memory used by motion compensation can be adjusted according to the reduced frame size, so as to reduce the memory size required.
The spatial-domain frame-size reduction approach sub-samples the spatial-domain data after the inverse transform, whereas the frequency-domain frame-size reduction approach sub-samples the frequency-domain coefficients before the inverse transform. Therefore, the frequency-domain frame-size reduction approach can reduce the amount of data processed by the IDCT, thereby reducing the overall computation of the decoder.
In addition, reference document 9 (Huifang Sun, “Hierarchical decoder for MPEG compressed video data” IEEE Transactions on Consumer Electronics, vol. 39, no. 3, pp. 559-564, August 1993) describes a frequency-domain frame-size reduction method. In this method, the basic unit of frame reduction is a block of 64 frequency-domain coefficients; the 16 low-frequency coefficients in the upper left corner of the block are preserved and inversely transformed from the frequency domain to the spatial domain, thus obtaining spatial-domain data with a size of 4×4. When spatial-domain data are transformed into frequency-domain coefficients, the energy of the data in a block is concentrated in the low-frequency portion in the upper left corner of the block. Therefore, preserving the low-frequency coefficients for the inverse transform during frame-size reduction preserves most of the information associated with the original spatial-domain data. However, the remaining frequency-domain coefficients located in the high-frequency portion of the block are not necessarily zero. Therefore, when only the low-frequency coefficients are preserved, part of the high-frequency information may be lost.
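The frequency-domain frame-size reduction described above can be sketched in Python as follows. This is a minimal illustration, not the procedure of reference document 9: it assumes orthonormal DCT and IDCT definitions, and the gain of 4/8 applied after the 4-point inverse transform is an assumption introduced here so that a flat block keeps its brightness. The function names are hypothetical.

```python
import math


def basis(k, n, size):
    """Orthonormal DCT-II basis value for a transform of the given size."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / size)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * size))


def dct2(block):
    """Forward 2-D DCT of a square block (direct, non-fast form)."""
    size = len(block)
    return [[sum(block[i][j] * basis(u, i, size) * basis(v, j, size)
                 for i in range(size) for j in range(size))
             for v in range(size)] for u in range(size)]


def idct2(coeffs):
    """Inverse 2-D DCT of a square coefficient block."""
    size = len(coeffs)
    return [[sum(coeffs[u][v] * basis(u, i, size) * basis(v, j, size)
                 for u in range(size) for v in range(size))
             for j in range(size)] for i in range(size)]


def downsize_in_frequency_domain(coeffs8, keep=4):
    """Keep the upper-left keep x keep coefficients and invert at that size."""
    low = [row[:keep] for row in coeffs8[:keep]]
    # Assumed normalization: with orthonormal transforms the DC term of an
    # 8x8 block is 8 * mean, while a 4-point inverse divides it by 4.
    gain = keep / len(coeffs8)
    return [[gain * value for value in row] for row in idct2(low)]
```

Only 16 coefficients enter the inverse transform, which is where the computational saving comes from; the 48 discarded coefficients are simply ignored, so any non-zero high-frequency content among them is lost, as the text notes.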
In view of the shortcomings and drawbacks of the prior art, the present invention proposes an inverse transform and sub-sampling method having low computational complexity, so as to overcome the shortcomings of the prior art.