Recently, apparatuses complying with MPEG or the like have come into wider use both for information distribution from broadcast stations and for information reception in general households. Such an apparatus manipulates image information in the form of digital data and compresses it by an orthogonal transform such as DCT and by motion compensation, exploiting the redundancy unique to image information, in order to attain highly efficient transmission and storage of the image information.
Among others, MPEG-2 (ISO/IEC 13818-2) is well known as a versatile image encoding system applicable to both interlaced and sequentially scanned images, as well as to both standard-resolution and high-definition images. It is expected to remain in wide use in both professional and consumer applications. Using the MPEG-2 compression system, it is possible to attain both a high data compression ratio and high image quality by allocating a bit rate of 4 to 8 Mbps to a standard-resolution interlaced image of 720×480 pixels, for example, and a bit rate of 18 to 22 Mbps to a high-definition interlaced image of 1920×1088 pixels.
MPEG-2 is intended primarily for high image-quality encoding for broadcasting, and it does not support bit rates lower than those of MPEG-1, namely, encoding at a higher compression rate. As mobile terminals have become widely used, however, it is believed that such encoding at a higher compression rate, for which MPEG-2 is not intended, will be demanded more and more. In these circumstances, the MPEG-4 encoding system was standardized. For image encoding, MPEG-4 was approved as the international standard ISO/IEC 14496-2 in December 1998.
Recently, H.26L (ITU-T Q6/16 VCEG) has been under standardization, initially for teleconference-oriented image encoding. H.26L is known to attain a higher efficiency of encoding than conventional encoding systems such as MPEG-2 and MPEG-4, although it requires more operations for encoding and decoding of image information. A system based on H.26L and covering functions not supported by H.26L is under standardization as “Joint Model of Enhanced-Compression Video Coding” for a still higher efficiency of encoding. This standardization is a part of the MPEG-4 activities.
FIG. 1 schematically illustrates the construction of a conventional image information encoder which compresses an image by an orthogonal transform such as DCT (discrete cosine transform) or Karhunen-Loeve transform (KLT) and by motion compensation. The image information encoder is generally indicated with a reference numeral 100. As shown in FIG. 1, the image information encoder 100 includes an A-D (analog-digital) converter 101, frame rearrange buffer 102, adder 103, orthogonal transform unit 104, quantizer 105, reversible encoder 106, storage buffer 107, dequantizer 108, inverse orthogonal transform unit 109, frame memory 110, motion estimate/compensate unit 111, and a rate controller 112.
As shown in FIG. 1, the A-D converter 101 converts an input image signal into a digital signal. The frame rearrange buffer 102 rearranges the frames according to the GOP (group of pictures) configuration of the compressed image information output from the image information encoder 100. At this time, for a picture to be intra-frame encoded, the frame rearrange buffer 102 supplies image information on the entire frame to the orthogonal transform unit 104. The orthogonal transform unit 104 performs an orthogonal transform such as DCT or KLT on the image information and supplies a conversion factor (that is, a transform coefficient) to the quantizer 105. The quantizer 105 quantizes the conversion factor supplied from the orthogonal transform unit 104.
The reversible encoder 106 performs reversible encoding, such as variable-length encoding or arithmetic encoding, on the quantized conversion factor, and supplies the encoded conversion factor to the storage buffer 107, where it is stored. The encoded conversion factor is output as compressed image information.
The behavior of the quantizer 105 is controlled by the rate controller 112. Also, the quantizer 105 supplies the quantized conversion factor to the dequantizer 108, which dequantizes the supplied conversion factor. The inverse orthogonal transform unit 109 performs an inverse orthogonal transform on the dequantized conversion factor to generate decoded image information and supplies the information to the frame memory 110.
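The intra-frame path through units 104, 105, 108 and 109 can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder of FIG. 1 itself: it uses a one-dimensional 8-sample DCT and a single uniform quantizer step, and the names `dct`, `idct`, `quantize`, `dequantize`, `local_decode` and `qstep` are hypothetical.

```python
import math

N = 8  # block length; a hypothetical choice for this sketch


def _scale(k):
    """Normalization factor that makes the DCT orthonormal."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct(samples):
    """Orthonormal 1-D DCT-II (stands in for orthogonal transform unit 104)."""
    return [
        _scale(k) * sum(samples[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n in range(N))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct() (stands in for inverse orthogonal transform unit 109)."""
    return [
        sum(_scale(k) * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]


def quantize(coeffs, qstep):
    """Uniform quantizer (assumption: one step size for all coefficients)."""
    return [round(c / qstep) for c in coeffs]


def dequantize(levels, qstep):
    """Map quantized levels back to approximate coefficient values."""
    return [level * qstep for level in levels]


def local_decode(samples, qstep):
    """Return the quantized levels and the locally decoded samples."""
    levels = quantize(dct(samples), qstep)
    return levels, idct(dequantize(levels, qstep))
```

For a flat block such as `[100] * 8` with `qstep = 16`, only the first (DC) level is non-zero, and the locally decoded samples differ slightly from the input because of the quantization. That decoded version, rather than the original, is what the frame memory would hold as reference data.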
On the other hand, for a picture to be inter-frame encoded, the frame rearrange buffer 102 supplies image information to the motion estimate/compensate unit 111. At the same time, the motion estimate/compensate unit 111 takes out reference image information from the frame memory 110 and performs motion estimation and compensation on it to generate reference image information. The motion estimate/compensate unit 111 supplies the reference image information to the adder 103, which converts it into a signal indicative of the difference between the reference image information and the original image information. Also at the same time, the motion estimate/compensate unit 111 supplies motion vector information to the reversible encoder 106.
The reversible encoder 106 performs reversible encoding, such as variable-length encoding or arithmetic encoding, on the motion vector information to form information which is to be inserted into a header of the compressed image information. It should be noted that the other processes are the same as for image information which is to be intra-frame encoded, and so will not be described further herein.
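The difference signal formed at the adder 103 can be illustrated with a short sketch. It assumes a whole-pixel motion vector, frames held as lists of rows, and a displaced block that lies entirely inside the reference frame. The function names are hypothetical.

```python
def predict_block(reference, top, left, mv, size):
    """Copy the size x size block of the reference frame that the motion
    vector points at. mv = (dy, dx) in whole pixels (assumption); the
    displaced block is assumed to lie entirely inside the frame."""
    dy, dx = mv
    rows = reference[top + dy: top + dy + size]
    return [row[left + dx: left + dx + size] for row in rows]


def residual_block(current, reference, top, left, mv, size):
    """Difference between the original block and its motion-compensated
    prediction, i.e. the signal that goes on to the orthogonal transform."""
    pred = predict_block(reference, top, left, mv, size)
    return [
        [current[top + r][left + c] - pred[r][c] for c in range(size)]
        for r in range(size)
    ]
```

When the motion vector matches the true displacement, the residual is close to zero, so it costs far fewer bits after transform and quantization than the block itself.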
FIG. 2 schematically illustrates the construction of a conventional image information decoder corresponding to the aforementioned image information encoder 100. The image information decoder is generally indicated with a reference numeral 120. As shown in FIG. 2, the image information decoder 120 includes a storage buffer 121, reversible decoder 122, dequantizer 123, inverse orthogonal transform unit 124, adder 125, frame rearrange buffer 126, D-A converter 127, motion estimate/compensate unit 128, and a frame memory 129.
As shown in FIG. 2, the storage buffer 121 provisionally stores input compressed image information and then transfers it to the reversible decoder 122. The reversible decoder 122 performs variable-length decoding or arithmetic decoding on the compressed image information on the basis of a predetermined compressed image information format, and supplies the quantized conversion factor to the dequantizer 123. Also, when the frame is one that has been inter-frame encoded, the reversible decoder 122 also decodes the motion vector information inserted in a header of the compressed image information and supplies the information to the motion estimate/compensate unit 128.
The dequantizer 123 dequantizes the quantized conversion factor supplied from the reversible decoder 122, and supplies the conversion factor to the inverse orthogonal transform unit 124. The inverse orthogonal transform unit 124 performs an inverse orthogonal transform, such as an inverse discrete cosine transform (inverse DCT) or inverse Karhunen-Loeve transform (inverse KLT), on the conversion factor on the basis of the predetermined compressed image information format.
Note that in case the frame is one that has been intra-frame encoded, the inversely orthogonal-transformed image information is stored into the frame rearrange buffer 126, subjected to D-A conversion in the D-A converter 127, and then output.
On the other hand, in case the frame is one that has been inter-frame encoded, a reference image is generated based on the reversibly decoded motion vector information and the image information stored in the frame memory 129, and the reference image and the output from the inverse orthogonal transform unit 124 are combined in the adder 125. It should be noted that the other processes are the same as for the intra-frame encoded frame and so will not be described further.
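The combination performed in the adder 125 can be sketched as a sample-wise addition. Clipping the result to an 8-bit range is an assumption of this sketch and is not stated in the text. The name `reconstruct_block` is hypothetical.

```python
def reconstruct_block(prediction, residual, max_value=255):
    """Add the decoded residual to the motion-compensated reference block.
    Clipping to 0..max_value assumes 8-bit samples (an assumption of this
    sketch, not taken from the text)."""
    return [
        [min(max(p + d, 0), max_value) for p, d in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```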
Note that the YUV format is widely used as the color information format of a picture signal, and MPEG-2 supports the 4:2:0 format. FIG. 3 shows the phase relation between the brightness and color-difference signals when the picture signal relates to an interlaced image. As shown in FIG. 3, MPEG-2 defines that, taking the sampling period of the brightness signal as one phase, the color-difference signal should exist at a quarter of the phase in the first field and at three fourths of the phase in the second field.
In MPEG-2, there are defined two motion estimate/compensate modes: a field motion estimate/compensate mode and frame motion estimate/compensate mode. These modes will be described herebelow with reference to the accompanying drawings.
A frame motion estimate/compensate mode is shown in FIG. 4. The frame motion estimate/compensate mode is intended to perform motion estimation and compensation on a frame formed from two interlaced fields. A brightness signal is predicted for each interlaced block of 16 pixels by 16 lines. FIG. 4 shows an example of forward motion estimation and compensation of an object frame from a reference frame one frame apart from the object frame. This frame motion estimation and compensation is effective for a frame moving at a relatively slow, constant speed with the intra-frame correlation remaining high.
A field motion estimate/compensate mode is shown in FIG. 5. This field motion estimate/compensate mode is intended to make motion compensation of each field. As shown in FIG. 5, the field motion is estimated using a motion vector mv1 for the first field, and using a motion vector mv2 for the second field.
Also, the reference field is set with a motion vertical field select flag in the macro block data, and may be the first field. As shown in FIG. 5, the first field is used as the reference field for both the first and second fields. With this field motion estimate/compensate mode, the motion is estimated for each field in the macro block, and so a brightness signal is predicted in units of a field block of 16 pixels by 8 lines.
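The relation between the 16×16 frame block and the two 16×8 field blocks can be sketched as below. The sketch assumes that the first field occupies the even-numbered lines of the macro block (counting from zero). The name `split_into_fields` is hypothetical.

```python
def split_into_fields(macroblock):
    """Split an interlaced macro block (a list of 16 rows of 16 samples)
    into two field blocks of 8 rows each.
    Assumption: the first field holds lines 0, 2, 4, ... and the second
    field holds lines 1, 3, 5, ..."""
    first_field = macroblock[0::2]
    second_field = macroblock[1::2]
    return first_field, second_field
```

In the field mode each of the two returned blocks is predicted with its own motion vector (mv1 and mv2 in FIG. 5), whereas the frame mode predicts the unsplit 16×16 block with a single vector.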
Note that for a P-picture (predictive-coded picture) or a unidirectionally predicted B-picture (bidirectionally predictive-coded picture), two pieces of motion vector information are required per macro block. Also, for a bidirectionally predicted B-picture, four pieces of motion vector information are required per macro block. Therefore, by estimating the motion of each field, the field motion estimate/compensate mode permits estimation of local motion and accelerating motion with an improved efficiency of estimation, but since it requires twice the amount of motion vector information required in the frame motion estimate/compensate mode, its overall efficiency of encoding may be lower.
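The motion vector counts above can be tabulated with a small sketch. The frame-mode figures follow from the statement that the field mode needs twice as many vectors. The function name and the mode strings are hypothetical.

```python
def motion_vectors_per_macroblock(mode, bidirectional):
    """Number of motion vectors carried per macro block.
    mode: "frame" or "field" (hypothetical labels for the two modes).
    bidirectional: True for a bidirectionally predicted B-picture,
    False for a P-picture or a unidirectionally predicted B-picture."""
    per_direction = {"frame": 1, "field": 2}[mode]
    directions = 2 if bidirectional else 1
    return per_direction * directions
```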
According to H.26L, motion is estimated and compensated on the basis of a variable block size to attain a high efficiency of encoding. The current H.26L takes a sequentially scanned picture as an input. At present, however, there is a movement to extend the current H.26L so that interlaced pictures can be handled. For example, the “Core Experiment on Interlaced Video Coding” (VCEG-N85, ITU-T) defines twenty types of block sizes as shown in FIG. 6 for an interlaced picture.
Further, H.26L defines a motion estimation and compensation with an accuracy as high as ¼ or ⅛ pixel. Currently, however, this standard defines a motion estimation and compensation only for a sequentially scanned picture.
The motion estimation and compensation with the ¼-pixel accuracy defined in H.26L is shown in FIG. 7. To produce a picture estimated with the ¼-pixel accuracy, a pixel value with a ½-pixel accuracy is first produced based on the pixel value stored in the frame memory and using a 6-tap FIR filter for each of the horizontal and vertical directions. It should be noted that the FIR filter coefficients are determined as given by the following equation (1):
{1, −5, 20, 20, −5, 1}/32  (1).
Then, a picture estimated with a ¼-pixel accuracy is produced based on the picture estimated with the ½-pixel accuracy produced as above and by linear interpolation.
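The two-step interpolation can be sketched in one dimension as follows. The taps are those of equation (1). The rounding offset, the clipping to an 8-bit range and the restriction to one dimension are assumptions of this sketch. The function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap FIR filter of equation (1), divisor 32


def clip(value, low=0, high=255):
    """Clamp to the sample range (assumption: 8-bit samples)."""
    return max(low, min(high, value))


def half_pel(row, i):
    """Sample halfway between row[i] and row[i + 1].
    Requires 2 <= i <= len(row) - 4 so that all six taps fall inside the row.
    Adding 16 before the shift rounds to nearest (assumed convention)."""
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    return clip((acc + 16) >> 5)


def quarter_pel(row, i):
    """Sample a quarter of the way from row[i] to row[i + 1], obtained by
    linear interpolation between the integer sample and the half-pel sample."""
    return (row[i] + half_pel(row, i) + 1) >> 1
```

For the ramp `[0, 10, 20, 30, 40, 50, 60, 70]`, `half_pel(row, 3)` gives 35, the midpoint of 30 and 40, and `quarter_pel(row, 3)` gives 33.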
Also, H.26L defines a filter bank given by the following expression (2) for estimation and compensation of a motion with a ⅛-pixel accuracy:
1/8: {−3, 12, −37, 485, 71, −21, 6, −1}/512
2/8: {−3, 12, −37, 229, 71, −21, 6, −1}/256
3/8: {−6, 24, −76, 387, 229, −60, 18, −4}/512
4/8: {−3, 12, −39, 158, 158, −39, 12, −3}/256
5/8: {−4, 18, −60, 229, 387, −76, 24, −6}/512
6/8: {−1, 6, −21, 71, 229, −37, 12, −3}/256
7/8: {−1, 6, −21, 71, 485, −37, 12, −3}/512  (2).
FIG. 8 shows the phase relation between the brightness signal and the color-difference signal when, in MPEG-2-based compressed image information, the macro block is in the frame motion estimate/compensate mode and the motion-vector vertical component has a value of 1.0. As shown in FIG. 8, each pixel of the color-difference signal should exist in a phase indicated by a triangle, but it actually exists in a phase indicated by a square. This problem also takes place when the value of the motion-vector vertical component is −3.0, 5.0, 9.0, . . . , namely, when it is 4n+1.0 (n is an integer).
FIG. 9 shows the relation in phase between the brightness signal and color-difference signal when in MPEG-2-based compressed image information, the macro block is in the field motion estimate/compensate mode and motion-vector vertical component has a value of 2.0. As shown in FIG. 9, the color-difference signal should be such that each pixel exists in a phase defined by a triangle but it actually exists in a phase defined by a square. This problem will also take place when the value of motion-vector vertical component is ±2.0, ±6.0, ±10.0, . . . , namely, when it is 4n+2.0 (n is an integer).
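The vertical motion-vector values for which the phase mismatch is described can be collected in a small predicate. This sketch only encodes the integer-valued cases listed in the text (4n+1.0 in the frame mode, 4n+2.0 in the field mode) and makes no claim about fractional values. The function name and the mode strings are hypothetical.

```python
def chroma_phase_mismatch(mode, mv_vertical):
    """Return True if the text lists this vertical motion-vector value as
    producing a brightness/color-difference phase mismatch.
    mode: "frame" or "field" (hypothetical labels).
    mv_vertical: integer-valued vertical component such as 1.0 or -6.0."""
    if mv_vertical != int(mv_vertical):
        raise ValueError("only integer-valued components are covered here")
    remainder = int(mv_vertical) % 4  # Python's % is non-negative for a positive modulus
    if mode == "frame":
        return remainder == 1  # 4n + 1.0
    if mode == "field":
        return remainder == 2  # 4n + 2.0
    raise ValueError("mode must be 'frame' or 'field'")
```

Because Python's `%` returns a non-negative remainder here, negative values such as −3.0 and −6.0 fall into the same classes as 1.0 and 2.0, matching the lists given in the text.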
When the problem shown in FIG. 9 takes place, the color-difference signal refers to one field while the brightness signal refers to a different field, so the image quality is degraded. The degradation is less serious in the MPEG-2-based picture encoding system, in which motion estimation and compensation is allowed only down to a ½-pixel accuracy. In a picture encoding system based on MPEG-4 or H.26L, however, motion estimation and compensation is allowed down to a ¼-pixel or ⅛-pixel accuracy, respectively, so the problem may become an important cause of image quality degradation.
Such a problem takes place when the macro block is in the frame motion estimate/compensate mode as well as when it is in the field motion estimate/compensate mode, and it also takes place when motion compensation is done with a variable block size as shown in FIG. 6.