Conventional standard moving image encoding methods, such as the ITU-T H.26x and MPEG series, use the DCT (discrete cosine transform) as a means of reducing spatial redundancy. Generally, when an image signal is represented in the spatial frequency domain, its power tends to concentrate in the low frequency components. The DCT performs an orthogonal transform on a block of 8×8 pixels in the image signal space, decomposes the image signal of the source image into a predetermined combination of bases, and obtains the coefficients of those bases. The DCT has the property of increasing the bias of the coefficient values, that is, the degree to which they concentrate in particular frequency components. Because the DCT concentrates this bias especially in the low frequency components, which play an important role in vision, compression efficiency can be enhanced by adaptive bit allocation.
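The concentration of power in low frequency coefficients can be illustrated with a minimal sketch. It assumes an orthonormal 2-D DCT-II computed directly from its definition, and a hypothetical smooth 8×8 block; the names `dct_2d`, `low_frequency_energy_fraction`, and `smooth_block` are illustrative and do not come from any standard.

```python
import math

N = 8  # block size used by the DCT in the H.26x and MPEG series


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def low_frequency_energy_fraction(coeffs, size=2):
    """Share of total coefficient energy held by the size x size lowest frequencies."""
    total = sum(c * c for row in coeffs for c in row)
    low = sum(coeffs[u][v] ** 2 for u in range(size) for v in range(size))
    return low / total


# Hypothetical smooth block: a gentle luminance ramp (assumed test data).
smooth_block = [[100 + 4 * x + 2 * y for y in range(N)] for x in range(N)]
coeffs = dct_2d(smooth_block)
fraction = low_frequency_energy_fraction(coeffs)
print(f"energy in the 2x2 lowest-frequency coefficients: {fraction:.4f}")
```

For a smooth block like this one, nearly all of the energy falls into a handful of low frequency coefficients, which is what makes adaptive bit allocation effective.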
On the other hand, when encoding is performed at an extremely low bit rate, the resulting coarse quantization degrades the reconstruction of the coefficients. Consequently, bases that are important for representing the original signal cannot be reconstructed. Also, since the DCT is a process closed within each 8×8 image block, distortion caused by quantization tends to appear noticeably at block boundaries. This produces block distortion, which introduces into the image an element that the original signal does not contain and which is perceived as seriously noticeable noise.
A large number of bases is required to faithfully reconstruct a steep luminance change such as a step edge, or a portion of a waveform having a random pattern. In general, when visual weighting is taken into account, fewer code bits are assigned to coefficients in the high frequency range than to those in the low frequency range. As a result, high frequency coefficients that play an important role in reconstructing the waveform are lost. The loss of these coefficients causes noise peculiar to the DCT and results in image quality degradation.
To overcome these problems that the DCT entails at high compression, methods whose code representation is free from a block structure have been proposed. For example, the paper “Very Low Bit-rate Video Coding Based on Matching Pursuits” (R. Neff et al., IEEE Trans. on CSVT, vol. 7, pp. 158–171, February 1997) discloses a technique called “Matching Pursuits” (pattern matching), which expands an inter-frame prediction error signal as a linear combination of elements of an over-complete basis set. Since this technique has a larger number of bases (basic signal patterns) available than the DCT, and since the unit of basis representation is not limited to a block, it can obtain visually superior image quality at low encoding rates compared to DCT encoding.
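The greedy expansion over an over-complete basis set can be sketched in one dimension as follows. This is an illustrative simplification, not the dictionary or search procedure of the cited paper: the dictionary (unit spikes plus normalized cosine patterns), the test signal, and the names `build_dictionary` and `matching_pursuit` are all assumptions made for the example.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_dictionary(n):
    """Hypothetical over-complete dictionary: n unit spikes plus n cosine patterns."""
    atoms = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        atoms.append(normalize(
            [math.cos((2 * j + 1) * k * math.pi / (2 * n)) for j in range(n)]))
    return atoms


def matching_pursuit(signal, atoms, n_iter):
    """Greedily pick the atom with the largest |inner product| with the residual."""
    residual = list(signal)
    chosen = []  # (atom index, coefficient) pairs: what an encoder would transmit
    energies = [sum(r * r for r in residual)]
    for _ in range(n_iter):
        products = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(products[i]))
        coeff = products[best]
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        chosen.append((best, coeff))
        energies.append(sum(r * r for r in residual))
    return chosen, residual, energies


n = 16
atoms = build_dictionary(n)
# Assumed test signal: one smooth cosine pattern plus an isolated spike at position 10.
signal = [3.0 * c + (5.0 if j == 10 else 0.0) for j, c in enumerate(atoms[n + 2])]
chosen, residual, energies = matching_pursuit(signal, atoms, n_iter=4)
print(chosen)
print(energies)
```

Each iteration evaluates an inner product against every atom in the dictionary, so the search cost grows with both the dictionary size and the number of candidate positions.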
To take advantage of the “Matching Pursuits” encoding technique, however, the implementation burden has been pointed out as a problem, for example the number of operations that the encoding side needs to perform the basis search. It is also necessary to represent position information efficiently, because a basis found by the search may be located at an arbitrary pixel position on the image plane.
On the other hand, there is an approach in which encoding distortion is eliminated by using hierarchical encoding. The SNR Scalability mode of MPEG-2 (ISO/IEC 13818-2) and the MPEG-4 Fine Granularity Scalability (FGS) mode (ISO/IEC JTC1/SC29/WG11/N3908) follow this approach. Hereinafter, hierarchical encoding that aims at compensating for such encoding distortion is called “quality hierarchical encoding”. In quality hierarchical encoding, the encoding distortion generated in the encoded picture of the base layer is separately encoded as an enhance layer, and the decoding side sums the signals obtained by decoding the individual layers so as to enhance the quality of the decoded image. With this technique, the number of transmission bits required increases by the amount of encoded data in the enhance layer. However, since the semantic content of a picture can be transmitted by the base layer alone, the technique is well suited to picture transmission that must adapt flexibly to networks, such as the Internet and wireless networks, whose transmission conditions (bit rate, packet loss probability, error rate and so on) vary over time.
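The layering described above can be sketched with two uniform quantizers: a coarse one for the base layer and a finer one applied to the base layer's encoding distortion. This is a simplified illustration under assumed step sizes and sample values. It omits the transform, prediction, and entropy coding that the actual MPEG-2 and MPEG-4 modes use, and the function names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantization to integer levels."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Map integer levels back to reconstructed values."""
    return [q * step for q in levels]


def encode_layers(signal, base_step, enhance_step):
    """Encode the signal coarsely, then encode the base layer's distortion separately."""
    base_levels = quantize(signal, base_step)
    base_recon = dequantize(base_levels, base_step)
    distortion = [s - b for s, b in zip(signal, base_recon)]
    enhance_levels = quantize(distortion, enhance_step)
    return base_levels, enhance_levels


def decode_layers(base_levels, base_step, enhance_levels=None, enhance_step=None):
    """Decode the base layer alone, or sum it with the decoded enhance layer."""
    recon = dequantize(base_levels, base_step)
    if enhance_levels is not None:
        refinement = dequantize(enhance_levels, enhance_step)
        recon = [b + e for b, e in zip(recon, refinement)]
    return recon


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Assumed sample values and step sizes, chosen only for illustration.
signal = [12.0, 47.0, 93.0, 130.0, 201.0, 88.0, 55.0, 19.0]
base_levels, enhance_levels = encode_layers(signal, base_step=32, enhance_step=4)
base_only = decode_layers(base_levels, 32)
both_layers = decode_layers(base_levels, 32, enhance_levels, 4)
print("base layer only MSE:", mse(signal, base_only))
print("base + enhance MSE: ", mse(signal, both_layers))
```

A decoder that receives only the base layer still obtains a usable, if coarse, reconstruction, and one that also receives the enhance layer sums the two decoded signals to reduce the distortion.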
In MPEG-4 FGS, the DCT is applied again to the encoding error signal in the enhance layer and the resulting coefficients are transmitted bit plane by bit plane, so that, as the name suggests, the picture quality at the decoding side improves gradually as more data is received. However, the enhance layer still depends on the DCT and its block structure, and the distortion component that arises from the block structure and appears in the encoding distortion of the base layer generates high-order DCT coefficients. As a result, when only a small amount of information can be used in the enhance layer, the encoding does not work efficiently.
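The gradual refinement obtained from per-bit-plane transmission can be sketched as follows. The sketch assumes integer coefficients whose signs are sent separately and whose magnitudes are sent most significant plane first. It leaves out the run-length and variable-length coding that the actual FGS bitstream applies, and the coefficient values and function names are hypothetical.

```python
def to_bit_planes(coeffs, n_planes):
    """Split integer coefficient magnitudes into bit planes, most significant first."""
    signs = [-1 if c < 0 else 1 for c in coeffs]
    planes = []
    for bit in range(n_planes - 1, -1, -1):
        planes.append([(abs(c) >> bit) & 1 for c in coeffs])
    return signs, planes


def from_bit_planes(signs, planes, n_planes):
    """Rebuild coefficients from however many leading planes were received."""
    mags = [0] * len(signs)
    for index, plane in enumerate(planes):
        bit = n_planes - 1 - index
        mags = [m | (b << bit) for m, b in zip(mags, plane)]
    return [s * m for s, m in zip(signs, mags)]


# Assumed enhance layer coefficients; 6 planes cover magnitudes below 64.
coeffs = [37, -12, 5, 0, -3, 1, 0, 2]
n_planes = 6
signs, planes = to_bit_planes(coeffs, n_planes)

errors = []
for received in range(n_planes + 1):
    approx = from_bit_planes(signs, planes[:received], n_planes)
    errors.append(sum(abs(c - a) for c, a in zip(coeffs, approx)))
print(errors)
```

Truncating the stream after any plane still yields a valid approximation, and the error shrinks as more planes arrive. In this sketch a large high-order coefficient occupies the most significant planes just as a low-order one does.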