1. Field of the Invention
The present invention relates to a motion picture encoding device and a motion picture decoding device, which have an inter-field prediction mode.
2. Description of the Related Art
Generally, motion picture data is large in size. Therefore, when motion picture data is transmitted from a transmitting device to a receiving device or when it is stored in a storage device, highly efficient encoding is applied to motion picture data. In this case, “highly efficient encoding” is an encoding process of converting a specific data string into another data string, and compressing the amount of data.
There are two types of motion picture data: one is composed of frames and the other is composed of fields. The prior art for compressing a field image is mainly described below.
Frame/field predictive encoding is known as a highly efficient encoding method for motion picture data.
FIG. 1 shows a block diagram of the configuration of the frame/field predictive encoding device.
This encoding method utilizes the fact that a plurality of segments of motion picture data have high correlation with each other in the time direction. The operation shown in FIG. 1 is roughly described below. A subtractor 39 generates a differential image between an inputted original image and a predicted image, and an orthogonal transform unit 31, a quantization unit 32 and a coefficient entropy encoding unit 40 encode the differential image. An inverse quantization unit 33 and an inverse orthogonal transform unit 34 reproduce the differential image from the output of the quantization unit 32. Then, a decoded image generation unit 35 decodes the encoded image using the reproduced differential image and the predicted image used at the time of encoding. A decoded image storage unit 36 stores the reproduced image. Then, a motion vector calculation unit 37 calculates a motion vector between the reproduced image and a subsequent input image, and a predicted image generation unit 38 generates a predicted image using the motion vector. The generated motion vector is encoded by a vector entropy encoding unit 41 and is outputted through a MUX 42 together with the coefficient data encoded by the coefficient entropy encoding unit 40. In other words, the inter-frame/field predictive encoding method utilizes the property that, in motion picture data, there is generally high similarity between frame/field data at a specific time and frame/field data at a subsequent time.
For example, in a data transmission system adopting the inter-frame/field predictive encoding method, a transmitting device generates motion vector data indicating the displacement from a previous frame/field image to a target frame/field image, and differential data between a predicted image in the target frame/field, which is generated from the previous frame/field image using the motion vector data, and a real image in the target frame/field. The transmitting device then transmits the motion vector data and the differential data to a receiving device. The receiving device reproduces the image in the target frame/field from the received motion vector data and differential data.
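The loop of FIG. 1 and the exchange between the transmitting and receiving devices can be sketched in a few lines. The following is a minimal illustration only, not the device of FIG. 1: it works on a single column of pixels, uses an integer vertical motion vector, and omits the orthogonal transform and the entropy encoding. The function names and the quantization step `step` are assumptions introduced for this sketch.

```python
def predict(reference, mv):
    """Predicted column: the reference shifted by an integer motion vector.
    Positions outside the reference are clamped to the nearest edge."""
    last = len(reference) - 1
    return [reference[min(max(i + mv, 0), last)] for i in range(len(reference))]


def encode(original, reference, mv, step):
    """Transmitting side: quantized differential data for one column."""
    predicted = predict(reference, mv)
    return [round((o - p) / step) for o, p in zip(original, predicted)]


def decode(levels, reference, mv, step):
    """Receiving side: rebuild the column from the motion vector and the differential data."""
    predicted = predict(reference, mv)
    return [p + q * step for p, q in zip(predicted, levels)]


reference = [10, 20, 30, 40, 50]
original = [21, 33, 38, 50, 49]  # roughly the reference moved up by one line
levels = encode(original, reference, mv=1, step=4)
rebuilt = decode(levels, reference, mv=1, step=4)
print(levels, rebuilt)
```

Because the predicted column already resembles the original, most quantized differences are zero, which is the property the predictive method relies on.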
So far, the summary of the frame/field predictive encoding has been described with reference to FIG. 1. Next, frame predictive encoding and field predictive encoding are described below.
FIGS. 2 and 3 show a format used to encode a field image that is commonly used in ISO/IEC MPEG-2/MPEG-4 (hereinafter called “MPEG-2” and “MPEG-4”, respectively) and the final committee draft of ITU-T H.264/ISO/IEC MPEG-4 Part 10 (Advanced video coding (AVC)) (“Joint Final Committee Draft (JFCD) of Joint Video Specification (ITU-T REC, H.264|ISO/IEC 14496-10 AVC)”, JVT-D157, or ISO/IEC JTC1/SC29/WG11 MPEG02/N492, July 2002, Klagenfurt, AT) (hereinafter called “AVC FCD”), which ITU-T and ISO/IEC were jointly standardizing as of August 2002. Specifically, each frame is composed of two fields: a top field and a bottom field. FIG. 2 shows the respective positions of luminance pixels and chrominance pixels, and the field to which each pixel belongs. As shown in FIG. 2, odd number-ordered luminance lines, such as a first luminance line (50a), a third luminance line (50b), a fifth luminance line (50c), a seventh luminance line (50d), etc., belong to the top field, and even number-ordered luminance lines, such as a second luminance line (51a), a fourth luminance line (51b), a sixth luminance line (51c), an eighth luminance line (51d), etc., belong to the bottom field. Similarly, odd number-ordered chrominance lines, such as a first chrominance line (52a), a third chrominance line (52b), etc., belong to the top field, and even number-ordered chrominance lines, such as a second chrominance line (53a), a fourth chrominance line, etc., belong to the bottom field.
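The assignment of lines to fields described above can be illustrated as follows. This is a minimal sketch in which a frame is represented as a list of lines; the function name `split_fields` is an assumption introduced here. Note that the first (odd number-ordered) line has index 0 in the list.

```python
def split_fields(frame_lines):
    """Split a frame into its top and bottom fields.

    The 1st, 3rd, 5th, ... lines (list indices 0, 2, 4, ...) form the top field;
    the 2nd, 4th, 6th, ... lines (list indices 1, 3, 5, ...) form the bottom field.
    """
    top_field = frame_lines[0::2]
    bottom_field = frame_lines[1::2]
    return top_field, bottom_field


# Lines are labelled with their 1-based line number within the frame.
top, bottom = split_fields([1, 2, 3, 4, 5, 6, 7, 8])
print(top)     # [1, 3, 5, 7]
print(bottom)  # [2, 4, 6, 8]
```

The same split applies to the luminance lines and to the chrominance lines of a frame.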
Each of the top and bottom fields indicates an image at a different time. Next, the time/spatial disposition of the top and bottom fields is described with reference to FIG. 3.
The technology of the present invention relates to the vertical component of a motion vector. Therefore, in FIG. 3 and the subsequent figures, horizontal pixel components are not shown, and all the horizontal components of the motion vector are assumed to be 0 for convenience. However, in order to show conventional problems and the effects of the present invention, the positional relation between luminance and chrominance in each field is accurately shown.
In FIG. 3, the vertical and horizontal axes represent the pixel position of a vertical component in each field and the elapse of time, respectively. Since there is no positional change in the horizontal component of each image within a field, the horizontal pixel component is neither shown nor described in FIG. 3.
As shown in FIG. 3, the pixel position of a chrominance component deviates from the pixel position of a luminance component in a field by a quarter vertical pixel. This is because the relationship of pixel positions shown in FIG. 2 is achieved when a frame is constructed from both top and bottom fields. If the image is based on the NTSC format, each time interval between adjacent top and bottom fields (64a: 65a, 65a: 64b, etc.) is approximately 1/60 seconds. Each time interval between two consecutive top fields (64a: 64b, etc.) or between two consecutive bottom fields (65a: 65b, etc.) is approximately 1/30 seconds.
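The quarter-pixel deviation can be checked with a short calculation. The following sketch assumes a chrominance sampling in which the k-th chrominance line of a frame lies vertically midway between frame luminance lines 2k and 2k+1 (counting from 0); this siting and the function names are assumptions made for illustration, chosen to be consistent with the deviation stated above.

```python
from fractions import Fraction


def luma_frame_y(parity, line):
    """Vertical frame position (in frame luminance lines) of a field luminance line."""
    return Fraction(2 * line + (0 if parity == "top" else 1))


def chroma_frame_y(parity, line):
    """Vertical frame position of a field chrominance line (assumed siting)."""
    offset = Fraction(1, 2) if parity == "top" else Fraction(5, 2)
    return 4 * line + offset


def chroma_y_in_field(parity, line):
    """Position of a field chrominance line, measured in luminance lines of the same field."""
    first_luma = luma_frame_y(parity, 0)
    return (chroma_frame_y(parity, line) - first_luma) / 2


print(chroma_y_in_field("top", 0))     # 1/4
print(chroma_y_in_field("bottom", 0))  # 3/4
```

Under this assumption, a top field chrominance line lies a quarter line below a luminance line of the same field, and a bottom field chrominance line lies a quarter line above one.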
Next, the frame predictive encoding mode of a field image and its field prediction, which is adopted in MPEG-2 and AVC FCD, are described.
FIG. 4 shows a method for constructing a frame using two consecutive fields (adjacent top and bottom fields) in a frame predictive mode.
As shown in FIG. 4, a frame is reconstructed by two time-consecutive fields (top and bottom fields).
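As an illustration of the construction in FIG. 4, two fields can be interleaved line by line. This is a minimal sketch with each field represented as a list of lines; the function name `weave_fields` is an assumption introduced here.

```python
def weave_fields(top_field, bottom_field):
    """Build a frame by alternating top field and bottom field lines, top line first."""
    if len(top_field) != len(bottom_field):
        raise ValueError("both fields must have the same number of lines")
    frame_lines = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame_lines.append(top_line)
        frame_lines.append(bottom_line)
    return frame_lines


print(weave_fields(["t0", "t1"], ["b0", "b1"]))  # ['t0', 'b0', 't1', 'b1']
```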
FIG. 5 shows a frame predictive mode.
In FIG. 5 it is assumed that each frame, such as 84a, 84b, 84c, etc., is already reconstructed by two consecutive fields (top and bottom fields), as shown in FIG. 4. In this frame predictive mode, a frame to be encoded which is composed of top and bottom fields is encoded. As a reference image, one reference frame is constructed by two consecutive fields (top and bottom fields) stored for reference use, and is used to predict the target frame to be encoded. Then, these two frame images are encoded according to the process flow shown in FIG. 1. In the expression method of a motion vector of this frame predictive encoding mode, a zero vector, that is, (0,0) indicates a pixel located in the same spatial position. Specifically, the motion vector (0,0) of a luminance pixel 82 that belongs to frame#2 (84b) indicates the pixel position 81 of frame#1 (84a).
Next, a field predictive encoding mode is described.
FIG. 6 shows a predictive method in an inter-field predictive mode.
In a field predictive mode, an encoding target is one top field (94a, 94b, etc.) or bottom field (95a, 95b, etc.) that is inputted as an original image. As a reference image, a top field or bottom field that is stored before can be used. In this case, it is generally defined that the fact that an original image field parity and a reference field parity are the same means that the original image field and the reference field both are top fields or bottom fields. For example, in a prediction 90 between fields with the same parity shown in FIG. 6, an original image field (94b) and a reference field (94a) both are top fields. Similarly, it is generally defined that the fact that an original image field parity and a reference field parity are different means that one of original image and reference fields is a top field and the other is a bottom field. For example, in a prediction 91 between different parity fields shown in FIG. 6, the original image field is a bottom field (95a) and the reference field is a top field (94a). Then, these original image and reference fields are encoded according to the process flow shown in FIG. 1.
In the prior art, in both frame and field modes, a motion vector is calculated based on a pixel position in each frame/field. Here, a conventional motion vector calculation method and a conventional pixel corresponding method used when a motion vector is given are described.
FIG. 7 defines the coordinates of a frame/field image widely used in MPEG-2 coding, MPEG-1 coding, AVC FCD coding, etc. White circles in FIG. 7 are pixel definition positions in target frames/fields. In the coordinates of this frame/field image, the upper left corner is designated as the origin (0,0), and values 1, 2, 3, etc., are sequentially assigned to both horizontal and vertical pixel definition positions. Specifically, the coordinates of a pixel that is located at the n-th horizontal position and the m-th vertical position are (n,m). Similarly, the coordinates of a position interpolated among the pixels are also defined. Specifically, since a position 180 marked with a black circle in FIG. 7 is located at 1.5 pixels in the horizontal direction from the pixel located in the upper left corner and at 2 pixels in the vertical direction, the coordinates of the position 180 are expressed as (1.5, 2). In a field image, there are only half as many pixels in the vertical direction as in a frame image. However, even in this case, the coordinates of a pixel are defined in the same way as in FIG. 7, based on pixel positions located in each field.
Next, the definition of a motion vector between fields is described using the coordinate system shown in FIG. 7.
FIG. 8 shows a conventional calculation method of a motion vector between corresponding pixels between fields. The definition of a motion vector requires a position in a coding field and a position in a reference field. A motion vector is defined between these two points. Thus, a motion vector between coding field coordinates 201 (Xs,Ys) and reference field coordinates 202 (Xd,Yd) is calculated. In the conventional calculation method of a motion vector between corresponding pixels between fields, a motion vector is calculated by the same method described below, regardless of whether the coding field or reference field is a top field or a bottom field. Specifically, coding field coordinates 201 (Xs,Ys) and reference field coordinates 202 (Xd,Yd) are inputted to a motion vector calculation unit 200, and (Xd−Xs,Yd−Ys) is given as a motion vector 203 between these two points.
FIG. 9 shows a conventional method for calculating a pixel that is pointed to by a motion vector defined between fields. In this case, it is assumed that a motion vector is calculated by the method shown in FIG. 8. The calculation of reference frame/field coordinates requires a coding frame/field position and a motion vector. In the case shown in FIG. 9, it is assumed that a motion vector 211 (X,Y) is given for coding field coordinates 212 (Xs,Ys), and reference field coordinates can be calculated using both the motion vector 211 (X,Y) and the coding field coordinates 212 (Xs,Ys). In the conventional method for calculating corresponding pixels between fields, a reference field position is calculated by the same method described below, regardless of whether the coding field or reference field is a top field or a bottom field. Specifically, a motion vector 211 (X,Y) and coding field coordinates 212 (Xs,Ys) are inputted to a pixel corresponding unit 210, and coordinates (Xs+X,Ys+Y) are given as reference field coordinates 213.
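The two conventional calculations of FIGS. 8 and 9 are inverse operations of each other, as the following sketch shows. The function names are assumptions introduced for illustration; the arithmetic is the one stated above, and the field parity plays no part in it.

```python
def motion_vector(coding_xy, reference_xy):
    """FIG. 8: motion vector from coding field coordinates to reference field coordinates."""
    xs, ys = coding_xy
    xd, yd = reference_xy
    return (xd - xs, yd - ys)


def reference_position(coding_xy, mv):
    """FIG. 9: reference field coordinates pointed to by a motion vector."""
    xs, ys = coding_xy
    x, y = mv
    return (xs + x, ys + y)


mv = motion_vector((3, 4), (4.5, 6))
print(mv)                              # (1.5, 2)
print(reference_position((3, 4), mv))  # (4.5, 6)
```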
The definition of the relation between a vector and a pixel position applies to both a luminance component and chrominance component. In MPEG-1/MPEG-2/AVC FCD, which all are general motion picture encoding methods, only the vector of a luminance component is encoded, and the vector of a chrominance component is calculated by scaling down the luminance component. Particularly, in AVC FCD, since the number of vertical pixels and that of horizontal pixels of a chrominance component are a half of those of a luminance component, respectively, it is specified that a motion vector used to calculate the predictive pixel of a chrominance component should be obtained by accurately scaling down the motion vector of the luminance component to a half.
FIG. 10 shows a conventional method for calculating a chrominance motion vector using a luminance motion vector.
Specifically, if a luminance motion vector 221 and a chrominance motion vector 222 are (MV_x, MV_y) and (MVC_x, MVC_y), respectively, a chrominance motion vector generation unit 220 can calculate the chrominance motion vector 222 according to the following equation.
(MVC_x, MVC_y) = (MV_x/2, MV_y/2)  (1)
This conventional calculation method can be used regardless of whether a motion vector is used for prediction between fields with the same parity or between fields with different parity.
In AVC FCD, ¼ pixel accuracy can be applied to the motion vector of a luminance component. Therefore, as a result of equation (1), the motion vector of a chrominance component can have ⅛ pixel accuracy, that is, fractional pixel accuracy.
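Equation (1) and the resulting accuracy can be illustrated with exact fractions. This is a minimal sketch; the function name `chroma_motion_vector` is an assumption introduced here, and the vector components are expressed in pixels.

```python
from fractions import Fraction


def chroma_motion_vector(luma_mv):
    """Equation (1): halve both components of the luminance motion vector."""
    mv_x, mv_y = luma_mv
    return (Fraction(mv_x) / 2, Fraction(mv_y) / 2)


# A luminance vector with 1/4 pixel accuracy yields a chrominance vector with 1/8 pixel accuracy.
print(chroma_motion_vector((Fraction(1, 4), Fraction(3, 4))))
# (Fraction(1, 8), Fraction(3, 8))
```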
FIG. 11 shows the calculation method of the interpolated pixel of a chrominance component that is defined in AVC FCD.
In FIG. 11, a black circle and a white circle represent an integer pixel and an interpolated pixel, respectively. In this case, the horizontal coordinate of an interpolated pixel G(256) is obtained by internally dividing the horizontal coordinates between points A(250) and C(252) at a ratio α:1−α, and the vertical coordinate is obtained by internally dividing the vertical coordinates between points A(250) and B(251) at a ratio β:1−β. In this case, α and β are values between 0 and 1. An interpolated pixel G(256) defined by such a position can be roughly calculated as follows, using the integer pixels A(250), B(251), C(252) and D(253), which are located around the interpolated pixel G(256), and using α and β.
G = (1−α)·(1−β)·A + (1−α)·β·B + α·(1−β)·C + α·β·D  (2)
The interpolated pixel calculation method of a chrominance component shown in FIG. 11 is just one example, and another calculation method can be used without any problem.
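Equation (2) is a bilinear weighting of the four surrounding integer pixels and can be written directly as code. This is a minimal sketch using floating-point values; the function name `interpolate_chroma` is an assumption introduced here, and the rounding to integer pixel values used in an actual codec is omitted.

```python
def interpolate_chroma(a, b, c, d, alpha, beta):
    """Equation (2): interpolated pixel G from integer pixels A, B, C and D.

    alpha is the horizontal ratio between A and C;
    beta is the vertical ratio between A and B.
    """
    if not (0 <= alpha <= 1 and 0 <= beta <= 1):
        raise ValueError("alpha and beta must be between 0 and 1")
    return ((1 - alpha) * (1 - beta) * a
            + (1 - alpha) * beta * b
            + alpha * (1 - beta) * c
            + alpha * beta * d)


print(interpolate_chroma(10, 20, 30, 40, 0, 0))      # 10, the value of A
print(interpolate_chroma(10, 20, 30, 40, 0.5, 0.5))  # 25.0, the mean of A, B, C and D
```

When α and β are both 0, the result is A itself, and as α or β approaches 1 the result approaches C or B, respectively.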
In this field encoding mode, in a prediction in which the parity of an original image field and that of a reference field are different, that is, between fields with different parity, the respective zero vectors of the motion vector of a luminance component and that of a chrominance component are not parallel in the definition of AVC FCD. Specifically, if a prediction is made using the motion vector of a chrominance component calculated from the motion vector of a luminance component according to the conventional definition, a chrominance pixel located in a position spatially deviated from that of the luminance component is referenced. This fact is described below with reference to FIG. 12. In FIG. 12, it is assumed that a top field 130, a bottom field 131 and a top field 132 are consecutive in time. In this case, bottom field 131 is to be encoded using top field 130. In this inter-field encoding, the vertical motion vector between the same lines of each field is defined to be zero. Therefore, if a zero vector (0,0) is assigned to a luminance pixel 133a that belongs to the second line of bottom field 131, this pixel is predicted from a pixel 135a in top field 130. Similarly, when a zero vector (0,0) is assigned to a chrominance pixel 134a that belongs to the first line of bottom field 131, this pixel is predicted from the pixel 137a, which is in the first chrominance line of top field 130. Similarly, a luminance pixel 133b in the third line and a chrominance pixel 134b, which belong to top field 132, are predicted from pixels 135b in the third luminance line and 137b in the second chrominance line of bottom field 131, respectively. Since it is essentially preferable that a chrominance motion vector and a luminance motion vector be parallel, chrominance pixels 134a and 134b should be predicted from the positions 136a and 136b, respectively, if the luminance motion vector is left as it is.
As described above, in a prediction between fields with different parity, the respective zero vectors of luminance and chrominance are not parallel. In the case of AVC FCD, this fact causes the following problems for all vectors in a prediction between fields with different parity. FIGS. 13 and 14 show such problems. In the explanation below, the horizontal component of a motion vector is set to zero in all cases for brevity.
FIG. 13 shows a conventional problem caused if a chrominance motion vector is conventionally calculated using a luminance motion vector when a reference field and a coding field are a bottom field and a top field, respectively. In AVC FCD, since it is specified that the numbers of vertical and horizontal pixels of a chrominance component are half of those of a luminance component, a motion vector used to calculate the predictive pixel of a chrominance component is scaled down to half of the motion vector of the luminance component, as is clear from equation (1). This applies regardless of whether a motion vector is used for prediction between frames, between fields with the same parity or between fields with different parity.
It is shown below that this definition causes a problem when a chrominance motion vector is calculated using a luminance motion vector defined between fields with different parity. In FIG. 13, a top coding field luminance pixel 140 in the first line has (0,1) as a predictive vector, and as a result, it points to a bottom reference field luminance pixel position 141 in the second line as a predictive value.
In this case, a chrominance motion vector that belongs to the same block is calculated to be (0,½), according to equation (1). If a prediction is made using motion vector (0,½) for a top coding field chrominance pixel 142 in the first line, a pixel position 143, which is shifted downward by half a pixel from the pixel in the first line of the bottom reference field chrominance component, is used as the predicted value.
In this case, a luminance motion vector (0,1) and a chrominance vector (0,½) are not parallel. It is preferable to use a bottom reference field chrominance predictive pixel position 145 to which a chrominance motion vector parallel to a luminance motion vector is applied.
FIG. 14 shows a conventional problem caused if a chrominance motion vector is calculated using a luminance motion vector when a reference field and a coding field are a top field and a bottom field, respectively. As in the case of FIG. 13, in FIG. 14, a bottom coding field luminance pixel 150 in the first line has (0,1) as a predictive vector, and as a result, it points to a top reference field luminance pixel position 151 in the second line as a predictive value.
In this case, a chrominance motion vector that belongs to the same block is calculated to be (0,½), according to equation (1). If a prediction is made using motion vector (0,½) for a bottom coding field chrominance pixel 152, a pixel position 153, which is shifted by half a pixel from the pixel in the first line of the top reference field chrominance component, is used as the predicted value.
In this case, a luminance motion vector (0,1) and a chrominance vector (0,½) are not parallel. It is preferable to use a top reference field chrominance predictive pixel position 155 to which a chrominance motion vector parallel to a luminance motion vector is applied.
As described above, if a reference field parity and a coding field parity are different, according to the conventional predictive method, pixels are referenced at positions where the luminance component and the chrominance component are spatially deviated from each other, and a predictive image in which the luminance component and the chrominance component are spatially deviated from each other is generated, not only for a zero vector but for all vectors. Note that, in the above explanation, vectors are said to be parallel or not parallel by considering the case where the time direction of a luminance motion vector and a chrominance motion vector, that is, the time direction from the coding field to the reference field, is included in the motion vector. The same is true below.
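The deviation described with reference to FIGS. 12 to 14 can be checked numerically by converting field positions into frame positions. The following sketch assumes a chrominance sampling in which the k-th chrominance line of a frame lies midway between frame luminance lines 2k and 2k+1, counts lines from 0, and takes the coding pixel in the first line of its field; these conventions and all function names are assumptions made for illustration. It compares the vertical displacement, in frame luminance lines, implied by a luminance vector with the displacement implied by the chrominance vector of equation (1).

```python
from fractions import Fraction


def luma_frame_y(parity, y):
    """Frame position of a (possibly fractional) field luminance line."""
    return 2 * Fraction(y) + (0 if parity == "top" else 1)


def chroma_frame_y(parity, y):
    """Frame position of a (possibly fractional) field chrominance line (assumed siting)."""
    return 4 * Fraction(y) + (Fraction(1, 2) if parity == "top" else Fraction(5, 2))


def displacements(coding_parity, ref_parity, mv_y):
    """Return (luminance, chrominance) vertical displacements in frame luminance lines."""
    mv_y = Fraction(mv_y)
    mvc_y = mv_y / 2  # conventional chrominance vector, equation (1)
    luma = luma_frame_y(ref_parity, mv_y) - luma_frame_y(coding_parity, 0)
    chroma = chroma_frame_y(ref_parity, mvc_y) - chroma_frame_y(coding_parity, 0)
    return luma, chroma


for coding, ref, mv_y in [("top", "bottom", 1), ("bottom", "top", 1),
                          ("bottom", "top", 0), ("top", "top", 1)]:
    luma, chroma = displacements(coding, ref, mv_y)
    print(coding, ref, mv_y, luma, chroma, "parallel" if luma == chroma else "not parallel")
```

Under these assumptions, the two displacements agree for fields with the same parity and differ by one frame line for fields with different parity, including for the zero vector.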