With the advancement of multimedia applications in recent years, it has become common to handle information of various media, such as image, audio, and text, in an integrated form. Such integrated handling of media is possible by digitizing all types of media. However, since an enormous amount of data is contained in digitized images, it is essential to apply information compression technology to the images for their storage and transmission. Meanwhile, the standardization of compression technologies is also important for the interoperability of compressed image data. The global standards for image compression include: H.261 and H.263 of ITU-T (International Telecommunication Union-Telecommunication Standardization Sector); MPEG (Moving Picture Experts Group)-1, MPEG-2, MPEG-4, and the like of ISO/IEC (International Organization for Standardization/International Electrotechnical Commission); and H.264 (MPEG-4 AVC), which is under standardization by JVT (Joint Video Team), a joint effort between ITU-T and MPEG.
In general, in coding of a moving picture, the amount of information is compressed by reducing redundancies in temporal and spatial directions. Therefore, in inter-picture predictive coding aiming at reducing temporal redundancies, motion estimation and generation of a predictive image are carried out on a block-by-block basis with reference to forward or backward picture(s), and coding is then performed on the difference value between the obtained predictive image and an image in the current picture to be coded. Here, “picture” is a term denoting one image. In the case of a progressive image, a picture means a frame, whereas it means a frame or fields in the case of an interlaced image. Here, “interlaced image” is an image of a frame composed of two fields which are separated in capture time. In coding and decoding of an interlaced image, it is possible to handle one frame as a frame as it is, as two fields, or in a frame structure or a field structure on a per-block basis within the frame.
A picture to be coded using intra picture prediction without reference to any reference images shall be referred to as an I picture. A picture to be coded using inter-picture prediction with reference to only one reference image shall be referred to as a P picture. A picture to be coded using inter-picture prediction with reference to two reference images at the same time shall be referred to as a B picture. It is possible for a B picture to refer to two pictures which can be arbitrarily combined from forward/backward pictures in display time. Reference images (reference pictures) can be designated for each macroblock serving as a basic unit of coding. Distinction shall be made between such reference pictures by calling a reference picture to be described earlier in a coded bitstream a first reference picture, and by calling a reference picture to be described later in the bitstream a second reference picture. Note that as a condition for coding these types of pictures, pictures used for reference need to be already coded.
P pictures and B pictures are coded using motion-compensated inter-picture predictive coding. Motion-compensated inter-picture predictive coding is a coding scheme that employs motion compensation in inter-picture predictive coding. Unlike a technique to perform prediction simply based on pixel values in a reference picture, motion compensation is a technique capable of improving prediction accuracy as well as reducing the amount of data by estimating the amount of motion (hereinafter referred to as “motion vector”) of each part within a picture and further by performing prediction in consideration of such amount of motion. For example, it is possible to reduce the amount of data by estimating motion vectors of the current picture to be coded and then by coding prediction residuals between the current picture to be coded and prediction values obtained by making a shift by the amount equivalent to the respective motion vectors. In this scheme, motion vectors are also recorded or transmitted in coded form, since motion vector information is required at the time of decoding.
A motion vector is estimated on a macroblock basis. More specifically, a macroblock in the current picture to be coded shall be previously fixed, and a motion vector is estimated by shifting a macroblock within the search area in a reference picture so as to find the position of a reference block which is most similar to such fixed block in the picture to be coded.
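The search described above can be sketched as a full-search block matching routine. This is only an illustrative sketch: the text does not name a similarity measure or a search range, so the sum of absolute differences (SAD), the function names `sad` and `estimate_motion`, and the `search` parameter are assumptions made for the example.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
    return total


def estimate_motion(cur, ref, cx, cy, size=16, search=4):
    """Full search around (cx, cy); returns ((dx, dy), best_sad).

    cur and ref are pictures stored as lists of rows. The block at
    (cx, cy) in the current picture stays fixed while the candidate
    block is shifted within +/- search pixels in the reference picture.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would leave the reference picture
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

An exhaustive search of this kind is the simplest way to find the most similar reference block; practical encoders usually reduce the number of candidates that are examined.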
FIG. 1 is a block diagram showing the structure of a conventional inter-picture predictive coding apparatus.
This inter-picture predictive coding apparatus is comprised of a motion estimating unit 401, a multi frame memory 402, a subtracting unit 403, a subtracting unit 404, a motion compensating unit 405, a coding unit 406, an adding unit 407, a motion vector memory 408, and a motion vector predicting unit 409.
The motion estimating unit 401 compares each of motion estimation reference pixels MEpel outputted from the multi frame memory 402 with an image signal Vin, and outputs a motion vector MV and a reference picture number RefNo. The reference picture number RefNo is an identification signal that identifies a reference image, selected from among plural reference images, to be referred to by the current image. The motion vector MV is outputted to the motion vector predicting unit 409 as a neighboring motion vector PrevMV, after being temporarily stored in the motion vector memory 408. The motion vector predicting unit 409 predicts a predictive motion vector PredMV with reference to such inputted neighboring motion vector PrevMV. The subtracting unit 404 subtracts the predictive motion vector PredMV from the motion vector MV, and outputs the resulting difference as a motion vector prediction difference DifMV.
Meanwhile, the multi frame memory 402 outputs, as motion compensation reference pixels MCpel1, pixels indicated by the reference picture number RefNo and the motion vector MV. The motion compensating unit 405 generates reference pixels with fractional-pixel accuracy, and outputs them as reference image pixels MCpel2. The subtracting unit 403 subtracts the reference image pixels MCpel2 from the image signal Vin, and outputs the resultant as an image prediction error DifPel.
The coding unit 406 performs variable length coding on each image prediction error DifPel, motion vector prediction difference DifMV, and reference picture number RefNo, and outputs a coded signal Str. Note that a decoded image prediction error RecDifPel, which is the result of decoding the image prediction error, is also outputted at the time of coding. The decoded image prediction error RecDifPel is the image prediction error DifPel with a coding error superimposed on it, and matches the inter-picture prediction error that an inter-picture predictive decoding apparatus obtains by decoding the coded signal Str.
The adding unit 407 adds the decoded image prediction error RecDifPel to the reference image pixels MCpel2, and the resultant is stored into the multi frame memory 402 as a decoded image RecPel. However, in order to make an effective use of the capacity of the multi frame memory 402, an area for an image stored in the multi frame memory 402 is released when such area is not necessary, and a decoded image RecPel of an image that is not necessary to be stored in the multi frame memory 402 is not stored into the multi frame memory 402. Note that coding is performed in units called macroblocks, each containing 16×16 pixels. In motion compensation according to H.264, an appropriate block size is selected for use with coding on a macroblock basis from among seven block sizes intended for motion compensation: 4×4, 4×8, 8×4, 8×8, 8×16, 16×8, and 16×16.
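The relationship between DifPel, RecDifPel, and RecPel can be sketched as follows. The text does not describe how the coding unit 406 introduces the coding error, so the uniform quantizer and its step `QSTEP` below are purely hypothetical stand-ins, as are the function names. The point of the sketch is only that the coding side and the decoding side reconstruct the same RecPel from the same reference image pixels MCpel2.

```python
QSTEP = 8  # hypothetical quantization step; not specified in the text


def code_residual(dif_pel):
    """Toy lossy coder (assumption): quantize each prediction error."""
    return [(d + QSTEP // 2) // QSTEP for d in dif_pel]


def decode_residual(levels):
    """Inverse of the toy coder: yields RecDifPel (DifPel plus coding error)."""
    return [level * QSTEP for level in levels]


def encode_block(vin, mcpel2):
    """Coding side: returns the coded levels and the decoded image RecPel."""
    dif_pel = [v - p for v, p in zip(vin, mcpel2)]          # subtracting unit 403
    levels = code_residual(dif_pel)                         # coding unit 406
    rec_dif_pel = decode_residual(levels)                   # RecDifPel
    rec_pel = [p + r for p, r in zip(mcpel2, rec_dif_pel)]  # adding unit 407
    return levels, rec_pel


def decode_block(levels, mcpel2):
    """Decoding side: rebuilds RecPel from the coded levels and MCpel2."""
    rec_dif_pel = decode_residual(levels)
    return [p + r for p, r in zip(mcpel2, rec_dif_pel)]     # adding unit 407
```

Because the coding side stores RecPel rather than Vin in the multi frame memory, later predictions are made from the same pixels that the decoding side will have.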
FIG. 2 is a block diagram showing the structure of a conventional inter-picture predictive decoding apparatus. In this diagram, the same elements as those of the inter-picture predictive coding apparatus shown in FIG. 1 are assigned the same reference numbers, and the descriptions thereof are not given here.
The conventional inter-picture predictive decoding apparatus shown in FIG. 2, which is an apparatus that decodes the coded signal Str coded by the conventional inter-picture predictive coding apparatus shown in FIG. 1, and outputs a decoded image signal Vout, is comprised of a multi frame memory 402, a motion compensating unit 405, an adding unit 407, an adding unit 501, a motion vector memory 408, a motion vector predicting unit 409, and a decoding unit 502.
The decoding unit 502 decodes the coded signal Str, so as to output each decoded image prediction error RecDifPel, motion vector prediction difference DifMV, and reference picture number RefNo. The adding unit 501 adds a predictive motion vector PredMV to the motion vector prediction difference DifMV, so as to decode a motion vector MV.
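The motion vector round trip between the coding side and the decoding side can be sketched as follows. The text does not state how the motion vector predicting unit 409 derives PredMV from the neighboring motion vectors PrevMV, so the component-wise median of three neighbors used here is an assumption for illustration only, and all function names are hypothetical.

```python
def median(a, b, c):
    """Middle value of three numbers."""
    return sorted((a, b, c))[1]


def predict_mv(neighbors):
    """PredMV from three neighboring vectors PrevMV (assumed median rule)."""
    xs, ys = zip(*neighbors)
    return (median(*xs), median(*ys))


def encode_mv(mv, pred_mv):
    """Coding side: DifMV = MV - PredMV (subtracting unit 404)."""
    return (mv[0] - pred_mv[0], mv[1] - pred_mv[1])


def decode_mv(dif_mv, pred_mv):
    """Decoding side: MV = PredMV + DifMV (adding unit 501)."""
    return (dif_mv[0] + pred_mv[0], dif_mv[1] + pred_mv[1])
```

Both sides must derive PredMV from the same already-processed neighbors. When neighboring blocks move similarly, DifMV stays small and is cheap to code.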
The multi frame memory 402 outputs, as motion compensation reference pixels MCpel1, pixels indicated by each reference picture number RefNo and motion vector MV. The motion compensating unit 405 generates reference pixels with fractional-pixel accuracy, and outputs them as reference image pixels MCpel2. The adding unit 407 adds the decoded image prediction error RecDifPel to the reference image pixels MCpel2, and stores the resultant into the multi frame memory 402 as a decoded image RecPel. However, in order to make an effective use of the capacity of the multi frame memory 402, an area for an image stored in the multi frame memory 402 is released when such area is not necessary, and a decoded image RecPel of an image that is not necessary to be stored in the multi frame memory 402 is not stored into the multi frame memory 402. The decoded image signal Vout, that is, decoded images RecPel, is correctly decoded from the coded signal Str.
Meanwhile, the H.264 standard allows motion compensation with up to quarter-pixel accuracy (MPEG-4 Simple Profile allows motion compensation with up to half-pixel accuracy). In this case, the H.264 standard employs 6-tap filtering as a method of linear filter pixel interpolation, specifying that a half pixel is determined from its six neighboring pixels. Referring to FIG. 3, the pixel interpolation method using 6-tap filtering is described.
FIG. 3 is a schematic diagram for illustrating a method according to H.264 of performing pixel interpolation between luminance components. Each of the pixels F00, F01, F02, F03, F04, F05, F10, F11, F12, F13, F14, F15, F20, F21, F22, F23, F24, F25, F30, F31, F32, F33, F34, F35, F40, F41, F42, F43, F44, F45, F50, F51, F52, F53, F54, and F55, is a pixel at an integer-pixel position, and is represented by a diagonally shaded square. Here, each of A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T, and U, indicates a position and its pixel value. Each of the pixels at fractional-pixel positions is represented by a hollow square. Each of aa, bb, cc, dd, ee, ff, gg, and hh indicates a position and the intermediate calculation pixel value at that position, that is, the sum of pixel values multiplied by the coefficients of a 6-tap filter before a bit shift is performed. Each of a, b, c, d, e, f, g, h, i, j, k, m, n, p, q, r, and s, represents a position and the pixel value obtained by performing 6-tap filtering and linear interpolation at that fractional-pixel position. The following describes a method of calculating the pixel value at each of the fractional-pixel positions a, b, c, d, e, f, g, h, i, j, k, n, p, q, and r. Here, each of b1, h1, s1, and m1 indicates the intermediate calculation pixel value, that is, the sum of pixel values multiplied by the 6-tap filter coefficients before a bit shift is performed, used for determining each of the pixel values at b, h, s, and m.
In the case of calculating the pixel value of a half pixel b, 6-tap filtering represented by Equation 1 is performed using six neighboring integer pixels E, F, G, H, I, and J in the horizontal direction, so as to determine an intermediate calculation pixel value b1. Then, the bit shift and round-off represented by Equation 2 are performed, that is, a division rounded half up to integer accuracy followed by a round-off that keeps the pixel level within the valid range, so as to determine the pixel value at the position b reached by making a half-pixel shift in the horizontal direction.
b1 = (E − 5×F + 20×G + 20×H − 5×I + J)  (1)
b = Clip((b1 + 16) >> 5)  (2)
Here, the Clip function, which denotes a round-off, limits the value of an output result to the range from 0 to 255, by correcting a value less than 0 to 0 and correcting a value greater than 255 to 255. The following description simply refers to the process of the Clip function as a round-off.
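As an illustration of Equation 1 and Equation 2, the half pixel b can be computed as in the following sketch. The function names `clip`, `six_tap`, and `half_pixel` are chosen for the example and do not come from the text, and 8-bit pixel values are assumed, as implied by the range 0 to 255.

```python
def clip(value):
    """Round-off: limit a value to the range 0 to 255."""
    return max(0, min(255, value))


def six_tap(p0, p1, p2, p3, p4, p5):
    """6-tap filter with coefficients (1, -5, 20, 20, -5, 1), as in Equation 1."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def half_pixel(p0, p1, p2, p3, p4, p5):
    """Equation 2: add 16, shift right by 5 (divide by 32), then clip."""
    return clip((six_tap(p0, p1, p2, p3, p4, p5) + 16) >> 5)
```

The coefficients sum to 32, which is why the intermediate value is shifted right by 5 bits, and adding 16 before the shift rounds the division half up. For a linear ramp such as E to J = 10, 20, 30, 40, 50, 60, the sketch gives b = 35, the midpoint of G and H.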
In the case of calculating the pixel value of a half pixel h, 6-tap filtering represented by Equation 3 is performed in a similar manner using six neighboring integer pixels A, C, G, M, R, and T in the vertical direction, so as to determine an intermediate calculation pixel value h1. Then, a bit shift and a round-off represented by Equation 4 are performed so as to determine the pixel value at the half-pixel position h reached by making a shift in the vertical direction.
h1 = (A − 5×C + 20×G + 20×M − 5×R + T)  (3)
h = Clip((h1 + 16) >> 5)  (4)
In the case of calculating the pixel value of a half pixel j, an intermediate calculation pixel value j1 is determined either by performing 6-tap filtering represented by Equation 5 using six intermediate calculation pixel values cc, dd, h1, m1, ee, and ff calculated in the same manner as the one in which the intermediate calculation pixel value h1 is calculated, or by performing 6-tap filtering represented by Equation 6 using six intermediate calculation pixel values aa, bb, b1, s1, gg, and hh calculated in the same manner as the one in which the intermediate calculation pixel value b1 is calculated. Then, a bit shift and a round-off represented by Equation 7 are performed so as to determine the pixel value j at the half-pixel position reached by making a shift in each of the horizontal and vertical directions. Here, in order to minimize a round-off error in the pixel value j, the intermediate calculation pixel values b1, h1, m1, s1, aa, bb, cc, dd, ee, ff, gg, and hh, taken before a bit shift is performed, are used for determining the intermediate calculation pixel value j1.
j1 = cc − 5×dd + 20×h1 + 20×m1 − 5×ee + ff  (5)
j1 = aa − 5×bb + 20×b1 + 20×s1 − 5×gg + hh  (6)
j = Clip((j1 + 512) >> 10)  (7)
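The two routes to j1 can be compared with the following sketch. It assumes that the integer pixels needed for j form a 6×6 grid given as a list of rows, with G at row 2, column 2 (counting from 0), H to its right, and M below it; this indexing and the function names are assumptions for the example. Because the filter is linear, Equation 5 and Equation 6 give the same j1.

```python
def clip(value):
    """Round-off: limit a value to the range 0 to 255."""
    return max(0, min(255, value))


def six_tap(p0, p1, p2, p3, p4, p5):
    """6-tap filter with coefficients (1, -5, 20, 20, -5, 1)."""
    return p0 - 5 * p1 + 20 * p2 + 20 * p3 - 5 * p4 + p5


def j1_from_columns(grid):
    """Equation 5: filter each column vertically (cc, dd, h1, m1, ee, ff),
    then filter those six intermediate values horizontally."""
    vertical = [six_tap(*column) for column in zip(*grid)]
    return six_tap(*vertical)


def j1_from_rows(grid):
    """Equation 6: filter each row horizontally (aa, bb, b1, s1, gg, hh),
    then filter those six intermediate values vertically."""
    horizontal = [six_tap(*row) for row in grid]
    return six_tap(*horizontal)


def half_pixel_j(grid):
    """Equation 7: add 512, shift right by 10 (divide by 1024), then clip."""
    return clip((j1_from_rows(grid) + 512) >> 10)
```

The shift by 10 bits reflects two cascaded filters, each with a gain of 32. Keeping the unshifted intermediate values until this single final shift is what limits the round-off error in j.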
The pixel values of the respective half-pixels s and m are determined by performing a bit shift and a round-off represented by Equation 8 and Equation 9, respectively, as in the case of the half pixels b and h.
s = Clip((s1 + 16) >> 5)  (8)
m = Clip((m1 + 16) >> 5)  (9)
Finally, the value of each of the quarter-pixels a, c, d, n, f, i, k, q, e, g, p, and r, is calculated as the average, rounded half up to integer accuracy, of two neighboring values taken from the integer pixels G, H, M, and N and from the half-pixel values given by Equation 2, Equation 4, Equation 7, Equation 8, and Equation 9 (Equation 10 to Equation 21).
a = (G + b + 1) >> 1  (10)
c = (H + b + 1) >> 1  (11)
d = (G + h + 1) >> 1  (12)
n = (M + h + 1) >> 1  (13)
f = (b + j + 1) >> 1  (14)
i = (h + j + 1) >> 1  (15)
k = (j + m + 1) >> 1  (16)
q = (j + s + 1) >> 1  (17)
e = (b + h + 1) >> 1  (18)
g = (b + m + 1) >> 1  (19)
p = (h + s + 1) >> 1  (20)
r = (m + s + 1) >> 1  (21)
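Equation 10 to Equation 21 all share the same rounded average, so they can be sketched with one helper. The function names and the dictionary return value are choices made for this example only. The inputs are the integer pixels G, H, and M and the already rounded half-pixel values b, h, j, m, and s.

```python
def average(p, q):
    """Average of two pixel values, rounded half up to an integer."""
    return (p + q + 1) >> 1


def quarter_pixels(G, H, M, b, h, j, m, s):
    """Quarter-pixel values per Equation 10 to Equation 21, keyed by position."""
    return {
        "a": average(G, b),  # Equation 10
        "c": average(H, b),  # Equation 11
        "d": average(G, h),  # Equation 12
        "n": average(M, h),  # Equation 13
        "f": average(b, j),  # Equation 14
        "i": average(h, j),  # Equation 15
        "k": average(j, m),  # Equation 16
        "q": average(j, s),  # Equation 17
        "e": average(b, h),  # Equation 18
        "g": average(b, m),  # Equation 19
        "p": average(h, s),  # Equation 20
        "r": average(m, s),  # Equation 21
    }
```

Since both inputs of each average already lie in the range 0 to 255, no further round-off is needed after the shift.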
FIG. 4 is a circuit diagram showing the structure of the motion compensating unit 405 that generates motion-compensated pixels in the above-described manner in the case where it is constructed using the conventional technology. The motion compensating unit 405 is the same as the one described for the inter-picture predictive coding apparatus shown in FIG. 1 and the inter-picture predictive decoding apparatus shown in FIG. 2. Motion compensation reference pixels MCpel1 are inputted to the motion compensating unit 405 from the multi frame memory 402. The multi frame memory 402 is also the same as the one described for the inter-picture predictive coding apparatus shown in FIG. 1 and the inter-picture predictive decoding apparatus shown in FIG. 2, and therefore the description thereof is not given here.
The motion compensating unit 405 includes: a delay circuit 501; a high-order tap filtering unit 502; a selector/adder 517 that performs signal selection and addition; and a bit shift 518.
The delay circuit 501 obtains motion compensation reference pixels MCpel1 from the multi frame memory 402, and holds and outputs the respective pieces of pixel data while delaying the timings. The high-order tap filtering unit 502 obtains the respective pieces of pixel data outputted from the delay circuit 501, performs 6-tap filtering, a bit shift, and a round-off on each of the obtained pixel data, and outputs the resultant. The selector/adder 517 selects pixel values from those inputted from the delay circuit 501 and the high-order tap filtering unit 502 according to the position of pixels to be motion compensated, performs an addition where necessary, and outputs the resultant. The bit shift 518 performs a bit shift of the output result from the selector/adder 517 where necessary, according to the position of pixels to be motion compensated, whereas the value is outputted as it is as a reference image pixel MCpel2 when a bit shift is not necessary.
The delay circuit 501 obtains six horizontal pixels at the same time and performs a 6-stage delay. Pixel data at each of the integer-pixel positions A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, Q, R, S, T, and U in the schematic diagram of FIG. 3 showing the pixel interpolation method, is accumulated in each of the buffers that make up the delay circuit 501, that is, in a buffer BA, a buffer BB, a buffer BC, a buffer BD, a buffer BE, a buffer BF, a buffer BG, a buffer BH, a buffer BI, a buffer BJ, a buffer BK, a buffer BL, a buffer BM, a buffer BN, a buffer BP, a buffer BQ, a buffer BR, a buffer BS, a buffer BT, and a buffer BU.
The high-order tap filtering unit 502 includes: plural 6-tap filters 503 to 511; and plural bit shifts 512 to 516, each performing a half-adjust of the value after the decimal point as well as a round-off. Each of the 6-tap filter 503, 6-tap filter 504, 6-tap filter 505, 6-tap filter 506, 6-tap filter 507, 6-tap filter 508, 6-tap filter 509, and 6-tap filter 510, receives an output signal from the delay circuit 501, performs a multiplication of a coefficient and an addition, and outputs each of the intermediate calculation pixel values Saa, Sbb, Sb1, Ss1, Sgg, Shh, Sm1, and Sh1. At this time, the intermediate calculation pixel values Sb1 and Sh1 have values obtained by Equation 1 and Equation 3, respectively. The values of the other intermediate calculation pixel values Saa, Sbb, Ss1, Sgg, Shh, and Sm1 are output values from a 6-tap filter which are the same as the ones represented by Equation 1 and Equation 3, and correspond to aa, bb, s1, gg, hh, and m1 in the schematic diagram of FIG. 3 showing the pixel interpolation method.
The 6-tap filter 511 has, as its inputs, the output values Saa, Sbb, Sb1, Ss1, Sgg, and Shh, which are the results of the 6-tap filters in the horizontal direction, and outputs Sj1 as the result of the 6-tap filter in the vertical direction represented by Equation 6. The bit shift 512, bit shift 513, bit shift 514, bit shift 515, and bit shift 516 have, as their respective inputs, Ss1, Sj1, Sb1, Sm1, and Sh1, perform the bit shift and round-off represented by Equation 8, Equation 7, Equation 2, Equation 9, and Equation 4, respectively, which round the value half up to integer accuracy, and output the respective resultants as Ss, Sj, Sb, Sm, and Sh.
The selector/adder 517 and the bit shift 518 have, as their respective inputs, SN, SM, SH, and SG, which are buffer values from the delay circuit 501, as well as Sb, Ss, Sm, Sh, and Sj, which are output values from the high-order tap filtering unit 502. The selector/adder 517 and the bit shift 518 perform average value calculations with half-adjust represented by Equation 10 to Equation 21 where necessary, according to the position of pixels to be motion compensated, and output motion-compensated pixels MCpel2 with fractional accuracy.
Through the above structure and series of operations, it is possible to generate a motion-compensated image with quarter-pixel accuracy from the motion compensation reference pixels MCpel1 and output it as a motion-compensated image MCpel2 with fractional accuracy, and to obtain a decoded image signal Vout, that is, decoded images RecPel, correctly from the coded signal Str and output it (for example, refer to Non-patent document 1). Non-patent document 1: “Draft ITU-T Recommendation and Final Standard of Joint Video Specification” 8.4.2.2, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-G050r1, 27 May 2003