(1) Field of the Invention
The present invention relates to a motion compensation apparatus which performs inter-picture motion compensation prediction.
(2) Description of the Related Art
In recent years, unified handling of various media information such as image, sound, text, and so on, has become common with the development of multi-media applications. At this time, unified handling of media is made possible through the digitalization of all media. However, as digitalized images carry a massive amount of data, image information compression technology is indispensable for storage and transmission.
At the same time, standardization of compression technology is also important for the interoperation of compressed image data. Examples of standard specifications for image compression technology are the following: H.261 and H.263 of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T); Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-4, and so on, of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC); and H.264 (MPEG-4 AVC), the standardization of which is being promoted by the Joint Video Team (JVT), a joint effort of ITU-T and MPEG.
In general, compression of information volume is carried out in the coding of moving pictures by reducing redundancy in the temporal direction and the spatial direction. Consequently, in inter-picture prediction coding which has the reduction of temporal redundancy as an objective, a picture in the forward or backward direction is referred to, motion estimation and predictive picture creation are carried out on a per block basis, and coding is performed on the difference between the obtained predictive picture and the picture to be coded. Here, “picture” is a term used to indicate a single image plane, and refers to a frame in the case of progressive images, and refers to a frame or a field in the case of interlaced images. Here, interlaced images refer to images in which one frame is made up of two temporally different fields. In the coding and decoding of interlaced images it is possible to process a single frame as a frame or as two fields, or process each block within the frame as a frame structure or field structure.
A picture on which intra-picture prediction coding is performed without a reference picture is called an I-picture. Furthermore, a picture on which inter-picture prediction coding is performed with reference to only one reference picture is called a P-picture. Furthermore, a picture on which inter-picture prediction coding can be performed with simultaneous reference to two reference pictures is called a B-picture. A B-picture can refer to an arbitrary combination of two pictures in the forward direction or backward direction. A reference image (reference picture) can be specified on a per macroblock basis, the macroblock being the basic unit for coding, and is differentiated into a first reference picture, which is the reference picture described earlier within a coded bit stream, and a second reference picture, which is the reference picture described later. However, a condition in the coding of these pictures is that the pictures to be referred to must already be coded.
Motion compensation inter-picture prediction coding is used in the coding of a P-picture or a B-picture. Motion compensation inter-picture prediction coding is a coding method which applies motion compensation in inter-picture prediction coding. Motion compensation is a method which increases prediction precision and reduces data volume by estimating the amount of motion (hereinafter, referred to as “motion vector”) for each part within a picture and performing prediction with consideration given to such amount of motion, and not simply predicting from the pixel value of the reference frame. For example, the motion vector of the picture to be coded is estimated and, by coding the predictive residual between the picture to be coded and the predictive value shifted by the amount of the motion vector, data volume is reduced. In this method, motion vectors are also coded and recorded, or transmitted, as the information of the motion vectors is required during decoding.
The motion vector is estimated on a per macroblock basis. Specifically, the motion vector is estimated by keeping the macroblock of the picture to be coded fixed, moving the macroblock of the reference picture within the search range, and finding the position of the reference block which is most similar to the base block.
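The search described above can be illustrated with a minimal sketch. It assumes a full search over integer pixel positions and the sum of absolute differences (SAD) as the similarity measure; neither is stated in the text, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, size):
    """Sum of absolute differences between the block of `cur` at (bx, by)
    and the block of `ref` at (rx, ry). SAD is an assumed similarity measure."""
    return sum(
        abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
        for y in range(size)
        for x in range(size)
    )


def estimate_motion_vector(cur, ref, bx, by, size, search_range):
    """Keep the block of the picture to be coded fixed, move a candidate
    block over the reference picture within +/- search_range pixels, and
    return the displacement (dx, dy) with the smallest SAD and that SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Pictures are passed as lists of pixel rows. A practical estimator would typically refine the result to sub-pixel precision and restrict the number of candidates, which this sketch does not attempt.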
FIG. 1 is a block diagram showing the structure of a conventional inter-picture prediction coding apparatus.
This inter-picture prediction coding apparatus includes a motion estimation unit 401, a multi-frame memory 402, a subtraction unit 403, a subtraction unit 404, a motion compensation unit 405, a coding unit 406, an addition unit 407, a motion vector memory 408, and a motion vector prediction unit 409.
The motion estimation unit 401 compares motion estimation reference pixels “MEpel”, which are outputted by the multi-frame memory 402, with a picture signal “Vin”, and outputs a motion vector “MV” and a reference picture number “RefNo”. The reference picture number RefNo is an identification signal that identifies the reference picture, selected from among a plurality of reference pictures, to be referred to by the current picture to be coded. The motion vector MV is temporarily stored in the motion vector memory 408, after which it is outputted to the motion vector prediction unit 409 as an adjacent motion vector “PrevMV”. The motion vector prediction unit 409 predicts a predictive motion vector “PredMV” by referring to the received adjacent motion vector PrevMV. The subtraction unit 404 subtracts the predictive motion vector PredMV from the motion vector MV, and outputs the difference as a motion vector prediction difference “DifMV”.
At the same time, the multi-frame memory 402 outputs the pixels indicated by the reference picture number RefNo and the motion vector MV, as motion compensation reference pixels MCpel1. The motion compensation unit 405 generates and outputs sub-pixel precision reference pixels as reference image pixels “MCpel2”. The subtraction unit 403 subtracts reference image pixels MCpel2 from the picture signal Vin, and outputs a prediction error “DifPel”.
The coding unit 406 performs variable-length coding on the prediction error DifPel, the motion vector prediction difference DifMV, and the reference picture number RefNo, and outputs a coded stream “Str”. In addition, a decoded prediction error “RecDifPel”, which is the decoded result of the prediction error, is also outputted simultaneously at the time of coding. The decoded prediction error RecDifPel is the prediction error DifPel superimposed with the coding error, and it matches the inter-picture prediction error obtained through the decoding of the coded stream Str by the inter-picture prediction decoding apparatus.
The addition unit 407 adds the decoded prediction error RecDifPel to the reference image pixels MCpel2, and stores the result in the multi-frame memory 402 as a decoded picture “RecPel”. However, in order to effectively use the capacity of the multi-frame memory 402, the region for a picture stored in the multi-frame memory 402 is freed when not required. Furthermore, a decoded picture RecPel that does not need to be stored in the multi-frame memory 402 is not stored in the multi-frame memory 402.
Moreover, coding is performed in units of 16×16 pixels referred to as macroblocks. In the H.264 specification, the appropriate block for motion compensation is selected, on a per macroblock basis, from among seven motion compensation block (hereinafter, also referred to simply as sub-block) sizes, namely 4×4, 4×8, 8×4, 8×8, 8×16, 16×8, and 16×16, and used for coding. Here, a macroblock can be partitioned in two stages: first, into macroblock partitions (four of the 8×8 size, two of the 8×16 size, two of the 16×8 size, or one of the 16×16 size) and then, for each 8×8 macroblock partition, into sub-macroblock partitions (four of the 4×4 size, two of the 4×8 size, two of the 8×4 size, or one of the 8×8 size).
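The two-stage partitioning can be tabulated as a short sketch. The dictionary and function names are hypothetical; the sizes and counts are those listed above, written as (width, height).

```python
# First stage: ways to split one 16x16 macroblock, as {(width, height): count}.
MACROBLOCK_PARTITIONS = {(16, 16): 1, (16, 8): 2, (8, 16): 2, (8, 8): 4}

# Second stage: ways to split one 8x8 macroblock partition.
SUB_MACROBLOCK_PARTITIONS = {(8, 8): 1, (8, 4): 2, (4, 8): 2, (4, 4): 4}


def covered_pixels(partitions):
    """Number of pixels covered by each partitioning choice."""
    return {size: size[0] * size[1] * count for size, count in partitions.items()}


def sub_block_sizes():
    """All distinct motion compensation block sizes across both stages."""
    return set(MACROBLOCK_PARTITIONS) | set(SUB_MACROBLOCK_PARTITIONS)
```

Every first-stage choice covers 256 pixels, every second-stage choice covers 64 pixels, and the union of both stages yields the seven sub-block sizes, since the 8×8 size appears in both stages.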
FIG. 2 is a block diagram showing the structure of a conventional inter-picture prediction decoding apparatus. Parts that are the same as those in the inter-picture prediction coding apparatus shown in FIG. 1 are assigned the same symbols and their descriptions omitted.
The conventional inter-picture prediction decoding apparatus shown in FIG. 2 is an apparatus which decodes the coded stream Str coded by the inter-picture prediction coding apparatus shown in FIG. 1, and outputs a decoded picture signal “Vout”. It includes a multi-frame memory 402, a motion compensation unit 405, an addition unit 407, an addition unit 501, a motion vector memory 408, a motion vector prediction unit 409, and a decoding unit 502.
The decoding unit 502 decodes the coded stream Str, and outputs the decoded prediction error RecDifPel, the motion vector prediction difference DifMV, and the reference picture number RefNo. The addition unit 501 adds the predictive motion vector PredMV outputted by the motion vector prediction unit 409 and the motion vector prediction difference DifMV, and decodes the motion vector MV.
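The relationship between the subtraction unit 404 on the coding side and the addition unit 501 on the decoding side can be sketched as follows. The function names are hypothetical, and the predictive motion vector is taken as an input because the prediction rule of the motion vector prediction unit 409 is not detailed here.

```python
def motion_vector_difference(mv, pred_mv):
    """Coding side (subtraction unit 404): DifMV = MV - PredMV."""
    return (mv[0] - pred_mv[0], mv[1] - pred_mv[1])


def reconstruct_motion_vector(dif_mv, pred_mv):
    """Decoding side (addition unit 501): MV = PredMV + DifMV."""
    return (pred_mv[0] + dif_mv[0], pred_mv[1] + dif_mv[1])
```

Provided that both apparatuses derive the same PredMV from previously processed motion vectors, the decoding side recovers exactly the MV used on the coding side.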
The multi-frame memory 402 outputs the pixels indicated by the reference picture number RefNo and the motion vector MV, as the motion compensation reference pixels MCpel1. The motion compensation unit 405 generates and outputs sub-pixel precision reference pixels as the reference image pixels MCpel2. The addition unit 407 adds the decoded prediction error RecDifPel to the reference image pixels MCpel2, and stores the result in the multi-frame memory 402 as a decoded picture RecPel. However, in order to effectively use the capacity of the multi-frame memory 402, the region of a picture stored in the multi-frame memory 402 is freed when not required. Furthermore, a decoded picture RecPel that does not need to be stored in the multi-frame memory 402 is not stored in the multi-frame memory 402. In the manner described above, the decoded picture signal Vout, in other words the decoded picture RecPel, can be properly decoded from the coded stream Str.
Incidentally, the H.264 specification permits motion compensation with up to quarter-pixel precision (up to half-pixel precision in MPEG-4 Simple Profile). For this purpose, the H.264 specification applies a 6-tap filter as the linear filtering method for pixel interpolation, so that a half-pixel precision pixel must be obtained from the 6 surrounding pixels. The pixel interpolation using the 6-tap filter shall be explained using FIG. 3.
FIG. 3 is a schematic diagram for describing the method for the interpolation of luminance component pixels in the H.264 specification.
Pixels F00, F01, F02, F03, F04, F05, F10, F11, F12, F13, F14, F15, F20, F21, F22, F23, F24, F25, F30, F31, F32, F33, F34, F35, F40, F41, F42, F43, F44, F45, F50, F51, F52, F53, F54, and F55 are pixels with an integer precision pixel location, and are shown as squares filled with slanted lines. Here, pixels A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, and U indicate the respective locations and pixel values.
Furthermore, pixels with a sub-pixel precision pixel location are shown as white squares. Pixels aa, bb, b, s, gg, and hh indicate intermediate calculated pixel values resulting from 6-tap filtering in the horizontal direction and their locations. Pixels cc, dd, h, m, ee, and ff indicate intermediate calculated pixel values resulting from 6-tap filtering in the vertical direction and their locations.
Pixels a, c, d, e, f, g, i, j, k, n, p, q, and r represent pixel values and locations resulting from the performance of a second 6-tap filtering and linear interpolation in the respective sub-pixel precision pixel locations.
Accordingly, in order to obtain the value of the sub-pixel precision pixel locations surrounded by the pixels G, H, M, and N which are integer precision pixels, an area of 6×6 pixels is required.
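As a minimal sketch of the filtering described above, the following assumes 8-bit samples and the H.264 luminance filter coefficients (1, -5, 20, 20, -5, 1) with rounding and a 5-bit shift. The function names are hypothetical. Only a single half-pixel position (such as b or h) and a quarter-pixel position obtained by averaging (such as a) are covered; positions such as j, which require a second filtering pass on unclipped intermediate values, are omitted.

```python
def clip_pixel(value):
    """Clamp to the 8-bit sample range (8-bit depth is an assumption)."""
    return max(0, min(255, value))


def half_pel(samples):
    """Half-pixel value from six consecutive integer precision pixels,
    for example E, F, G, H, I, J for pixel b between G and H."""
    e, f, g, h, i, j = samples
    return clip_pixel((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(p, q):
    """Quarter-pixel value as the rounded average of two neighbouring
    integer or half-pixel values, for example a from G and b."""
    return (p + q + 1) >> 1
```

Since each half-pixel value depends on six pixels in a row or column, the positions enclosed by G, H, M, and N draw on the full 6×6 neighbourhood once both filtering directions are considered.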
Furthermore, in the block unit in which motion compensation is performed, 6-tap filtering is used for the luminance component in the H.264 specification. Consequently, as shown in FIG. 4, with respect to area 901 on which the pixels of the block for motion compensation are located, the pixels in area 902 are required; area 902 is wider than the current block by 2 pixels above, 3 pixels below, 2 pixels to the right, and 3 pixels to the left, in other words, by 5 pixels in each of the horizontal and vertical directions. Accordingly, when sub-pixel precision motion compensation is carried out in each of the 4×4, 4×8, 8×4, 8×8, 8×16, 16×8, and 16×16 block sizes, a 9×9, 9×13, 13×9, 13×13, 13×21, 21×13, and 21×21 pixel area, respectively, is required.
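The area sizes follow from adding the filter margin to each block dimension, which can be checked with a short sketch. The names are hypothetical, and the only assumption is that an N-tap filter widens the block by N − 1 pixels in each direction, as stated above for the 6-tap case.

```python
# Luminance motion compensation block sizes as (width, height).
LUMA_BLOCK_SIZES = [(4, 4), (4, 8), (8, 4), (8, 8), (8, 16), (16, 8), (16, 16)]


def reference_area(block_w, block_h, taps=6):
    """Reference pixel area needed for sub-pixel motion compensation of a
    block, given the number of interpolation filter taps."""
    margin = taps - 1  # 5 extra pixels per direction for the 6-tap filter
    return (block_w + margin, block_h + margin)


luma_areas = [reference_area(w, h) for w, h in LUMA_BLOCK_SIZES]
```

For the smallest block, 9×9 = 81 reference pixels are read to predict 4×4 = 16 pixels, roughly five times the block itself, whereas the 16×16 block needs 441 pixels for 256.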
On the other hand, the chrominance component is generated by linear interpolation from the 4 integer precision pixels surrounding the sub-pixel precision pixel. The motion compensation block sizes in the case of the chrominance component are 2×2, 2×4, 4×2, 4×4, 4×8, 8×4, and 8×8, and their reference pixel areas are 3×3, 3×5, 5×3, 5×5, 5×9, 9×5, and 9×9, respectively (see “Draft ITU-T Recommendation and Final Standard of Joint Video Specification”, Joint Video Team (JVT) of ISO/IEC MPEG & ITU-T VCEG, JVT-1050, September 2003, pp. 122-125, for example).
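The chrominance case can be sketched in the same way. The interpolation below assumes fractional offsets expressed in eighths of a pixel and the rounding used for chrominance samples in the H.264 specification; the function and variable names are hypothetical.

```python
# Chrominance motion compensation block sizes as (width, height).
CHROMA_BLOCK_SIZES = [(2, 2), (2, 4), (4, 2), (4, 4), (4, 8), (8, 4), (8, 8)]


def chroma_sample(a, b, c, d, x_frac, y_frac):
    """Bilinear interpolation from the 4 surrounding integer precision
    pixels: a (top-left), b (top-right), c (bottom-left), d (bottom-right).
    x_frac and y_frac are offsets in eighths of a pixel (0..7, assumed)."""
    return (
        (8 - x_frac) * (8 - y_frac) * a
        + x_frac * (8 - y_frac) * b
        + (8 - x_frac) * y_frac * c
        + x_frac * y_frac * d
        + 32
    ) >> 6


# A 2-tap interpolation widens the block by 1 pixel in each direction.
chroma_areas = [(w + 1, h + 1) for w, h in CHROMA_BLOCK_SIZES]
```

The four weights always sum to 64, so a flat region is reproduced exactly, and a zero offset returns pixel a unchanged.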
Incidentally, in decoding in the aforementioned manner, the decoded picture RecPel needs to be temporarily held in the multi-frame memory 402 up to the time of display, because the reference pixels need to be read from the multi-frame memory 402 and, in addition, there are instances where the picture sequence of the inputted coded data and the sequence of the reconstructed frames are different. As a result of accesses such as the reading-out of reference pixels, the storage of the decoded picture RecPel, and the display thereof, the percentage of the bus leading to the multi-frame memory 402 that is occupied by such accesses is, in general, extremely high.
As such, even when an attempt is made to use the multi-frame memory also as a memory for another function such as an On Screen Display (OSD), for example, in order to reduce the memory capacity required for decoding, the high percentage of the bus being occupied makes such dual-purpose usage difficult, which presents a problem.
With respect to this issue, a technique for reducing the number of accesses to the multi-frame memory 402 has been proposed in the conventional picture decoding method. For example, Japanese Laid-Open Patent Application No. 10-215457 Publication describes the reduction of the number of accesses to the multi-frame memory 402 by determining the common area between the reference pixel area required by the current block on which motion compensation is to be performed and the reference pixel area required by an immediately preceding block, and updating only the pixels outside the determined common area.
However, in the H.264 specification and the like, having an increasing number of motion compensation prediction methods for improving compression efficiency, there are many cases in which almost no common area exists as motion compensation can be performed in extremely small areas, such as 4×4, 4×8 and 8×4 block units for the luminance component, and 2×2, 2×4, 4×2 block units, and so on, for the chrominance component.
FIGS. 5A and 5B are schematic diagrams showing an example of the pixel areas referenced by the current block to be decoded and the block to be decoded immediately before. FIG. 5A shows the appearance of an 8×8 block and FIG. 5B shows the appearance of a 2×2 block.
For example, in the 8×8 block size shown in FIG. 5A, area 913 represents the overlapping region of area 911 referred to by the current block to be decoded and area 912 referred to by the block to be decoded immediately before. This area 913 becomes a non-update area within the local reference memory of the motion compensation unit 405. Here, the difference of the absolute coordinates of area 911 and area 912 in terms of the reference picture is (4, 3). When the number of filter taps is 2 taps according to a bi-linear-type linear interpolation, and the like, the number of pixels in the non-update area is 30 (=5×6) as the overlap in the horizontal direction is 5 pixels and the overlap in the vertical direction is 6 pixels.
On the other hand, in FIG. 5B, the difference of the absolute coordinates, in terms of the reference picture, of area 914 referred to by the current block to be decoded in the 2×2 block size and area 915 referred to by the block to be decoded immediately before is (4, 3) as in the case in FIG. 5A. When the number of filter taps is 2, the situation arises in which there are no overlapping pixels for the non-update area.
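The two cases of FIGS. 5A and 5B can be reproduced with a short calculation. The sketch below assumes that both reference areas have the same size and are axis-aligned rectangles; the function name is hypothetical.

```python
def common_area_pixels(block_w, block_h, taps, dx, dy):
    """Number of pixels shared by two equally sized reference areas whose
    absolute coordinates in the reference picture differ by (dx, dy)."""
    area_w = block_w + taps - 1
    area_h = block_h + taps - 1
    overlap_w = max(0, area_w - abs(dx))
    overlap_h = max(0, area_h - abs(dy))
    return overlap_w * overlap_h
```

With 2 taps and a difference of (4, 3), `common_area_pixels(8, 8, 2, 4, 3)` gives 5 × 6 = 30 pixels for the 8×8 block, while `common_area_pixels(2, 2, 2, 4, 3)` gives 0 for the 2×2 block, because the 3-pixel-wide reference area is narrower than the 4-pixel horizontal displacement.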
Furthermore, depending on the width of the access bus of the multi-frame memory 402, there is a possibility that transmission of only the non-common area is not possible, so that the common area is also eventually transmitted and the number of accesses cannot be reduced. FIG. 6 is an example showing a part of a decoded picture 921. It shows the case of a memory having a 4-byte bus width in which 1 pixel is made up of 1 byte, in other words, in which 4 pixels in the horizontal direction make up one access unit.
For example, as shown in FIG. 6, it is assumed that area 923, filled in with slanted lines and referred to by the block decoded immediately before the current 2×2 block to be decoded, is located starting from the head of a 4-byte boundary, and that the difference of the absolute coordinates between area 923 and area 922, which is referred to by the current block to be decoded and enclosed in heavy lines, is (1, 0). Out of 9 pixels, 6 pixels are in a common area, and normally only the 3 pixels of the dotted area 924 need to be updated. However, as a memory structure having a 4-byte bus width is being considered, a 4-byte by 3-line memory transmission which includes area 923 is required for the transmission of area 922, which, in effect, means that all the pixels are transmitted.
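The effect of the bus width in the FIG. 6 example can be checked with a small sketch. It assumes 1 byte per pixel, that each line of the picture starts on a bus-width boundary, and that every access transfers one whole bus word; the function name is hypothetical.

```python
def bytes_transferred(x_start, width, lines, bus_bytes=4):
    """Bytes actually moved when reading `width` pixels starting at
    horizontal position x_start on each of `lines` lines, given that
    memory is accessed in aligned words of bus_bytes bytes."""
    first_word = x_start // bus_bytes
    last_word = (x_start + width - 1) // bus_bytes
    return (last_word - first_word + 1) * bus_bytes * lines


whole_area_922 = bytes_transferred(1, 3, 3)  # columns 1 to 3, 3 lines
update_area_924 = bytes_transferred(3, 1, 3)  # column 3 only, 3 lines
```

Both values come to 12 bytes, that is, 4 bytes by 3 lines, so restricting the update to the 3 pixels of area 924 saves no bus traffic in this arrangement.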
In addition, in the 8×8, 8×16, 16×8 and 16×16 block sizes for luminance and the 4×4, 4×8, 8×4 and 8×8 block sizes for chrominance (hereinafter referred to as “macroblock partition type”, for short), the non-existence of a common area is anticipated from the start, even for adjacent sub-blocks inside the same macroblock for example, as motion compensation using a different reference picture is possible. However, in the 4×4, 4×8, 8×4, and 8×8 block sizes for luminance and the 2×2, 2×4, 4×2 and 4×4 block sizes for chrominance (hereinafter referred to as “sub-macroblock partition type”, for short), the same reference picture is used for sub-blocks within the same macroblock partition.
In other words, the number of reference pictures and the types of block shapes that can be selected in motion compensation is being increased in order to improve compression efficiency. Furthermore, in the H.264 specification, and the like, which uses high-level tap filter interpolation, there is a high possibility that the number of accesses to the multi-frame memory 402 cannot be reduced by limiting the update area of the reference memory using the determination of a common area in the conventional technology.