Generally, in coding of pictures constituting a moving picture, each picture is divided into plural blocks, and compressive coding (hereinafter, also referred to simply as “coding”) of image information possessed by each picture is carried out for every block, utilizing redundancies in the space direction and time direction of the moving picture. As a coding process utilizing redundancy in the space direction, there is intra-picture coding utilizing correlation of pixel values in a picture. As a coding process utilizing redundancy in the time direction, there is inter-picture predictive coding utilizing correlation of pixel values between pictures. The inter-picture predictive coding is a process of coding a target picture to be coded, with reference to a picture that is positioned timewise forward the target picture (forward picture), or a picture that is positioned timewise backward the target picture (backward picture).
The forward picture is a picture whose display time is earlier than that of the target picture, and it is positioned forward the target picture on a time axis indicating the display times of the respective pictures (hereinafter, referred to as “display time axis”). The backward picture is a picture whose display time is later than that of the target picture, and it is positioned backward the target picture on the display time axis. Further, in the following description, a picture to be referred to in coding the target picture is called a reference picture.
In the inter-picture predictive coding, specifically, a motion vector of the target picture with respect to the reference picture is detected, and prediction data for image data of the target picture is obtained by motion compensation based on the motion vector. Then, redundancy of difference data between the prediction data and the image data of the target picture in the space direction of the picture is removed, thereby to perform compressive coding for the amount of data of the target picture.
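The relationship between the motion vector, the prediction data, and the difference data can be illustrated with a minimal sketch. The function names, the 4x4 reference picture, and the pixel values below are hypothetical and are chosen only for illustration; motion detection itself and the subsequent removal of spatial redundancy from the difference data are not modeled.

```python
def predict_block(reference, top, left, mv, size):
    """Motion compensation: fetch the block of `reference` that the motion
    vector mv = (dy, dx) points to, starting from the target block position."""
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in reference[top + dy:top + dy + size]]


def difference_data(target_block, prediction):
    """Pixel-wise difference between the target block and its prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target_block, prediction)]


# Hypothetical 4x4 reference picture (pixel values are made up).
reference = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]

# Hypothetical 2x2 target block located at (top=0, left=0) in the target picture.
target_block = [[21, 22], [31, 33]]

prediction = predict_block(reference, top=0, left=0, mv=(1, 1), size=2)
diff = difference_data(target_block, prediction)
```

In this example the difference data contains mostly zeros, which is what makes the subsequent compressive coding effective when the motion vector is well chosen.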
On the other hand, as a process for decoding a coded picture, there are intra-picture decoding corresponding to the intra-picture coding, and inter-picture decoding corresponding to the inter-picture coding. In the inter-picture decoding, the same picture as a picture that is referred to in the inter-picture coding is referred to. That is, a picture Xtg that is coded with reference to pictures Xra and Xrb is decoded with reference to the pictures Xra and Xrb.
FIGS. 43(a)-43(c) are diagrams illustrating plural pictures constituting a moving picture.
In FIG. 43(a), part of plural pictures constituting one moving picture Mpt, i.e., pictures F(k)˜F(k+2n−1) [k,n: integers], are shown. Display times t(k)˜t(k+2n−1) are set on the respective pictures F(k)˜F(k+2n−1). As shown in FIG. 43(a), the respective pictures are successively arranged from one having earlier display time on a display time axis X indicating display times Tdis of the respective pictures, and these pictures are grouped for every predetermined number (n) of pictures. Each of these picture groups is called a GOP (Group of Pictures), and this is a minimum unit of random access to coded data of a moving picture. In the following description, a picture group is sometimes abbreviated as a GOP.
For example, an (i)th picture group Gp(i) is constituted by pictures F(k)˜F(k+n−1). An (i+1)th picture group Gp(i+1) is constituted by pictures F(k+n)˜F(k+2n−1).
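The grouping of pictures into GOPs of n pictures each can be sketched as follows. The function name and the concrete values of k and n are assumptions used only for illustration.

```python
def split_into_gops(first_index, picture_count, gop_size):
    """Group consecutive picture indices into GOPs of gop_size pictures each."""
    indices = list(range(first_index, first_index + picture_count))
    return [indices[i:i + gop_size] for i in range(0, len(indices), gop_size)]


k, n = 5, 4  # hypothetical values
gops = split_into_gops(k, 2 * n, n)

# gops[0] plays the role of Gp(i):   F(k)   .. F(k+n-1)
# gops[1] plays the role of Gp(i+1): F(k+n) .. F(k+2n-1)
```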
Each picture is divided into plural slices each comprising plural macroblocks. For example, a macroblock is a rectangle area having 16 pixels in the vertical direction and 16 pixels in the horizontal direction. Further, as shown in FIG. 43(b), a picture F(k+1) is divided into plural slices SL1˜SLm [m: natural number]. A slice SL2 is constituted by plural macroblocks MB1˜MBr [r: natural number] as shown in FIG. 43(c).
FIG. 44 is a diagram for explaining coded data of a moving picture, illustrating a structure of a stream obtained by coding the respective pictures constituting the moving picture.
A stream Smp is coded data corresponding to one image sequence (e.g., one moving picture). The stream Smp is composed of an area (common information area) Cstr wherein bit streams corresponding to common information such as a header are arranged, and an area (GOP area) Dgop wherein bit streams corresponding to the respective GOPs are arranged. The common information area Cstr includes sync data Sstr and a header Hstr corresponding to the stream. The GOP area Dgop includes bit streams Bg(1)˜Bg(i−1), Bg(i), Bg(i+1)˜Bg(I) corresponding to picture groups (GOP) Gp(1)˜Gp(i−1), Gp(i), Gp(i+1)˜Gp(I) [i,I: integers].
Each bit stream corresponding to each GOP is composed of an area (common information area) Cgop wherein bit streams corresponding to common information such as a header are arranged, and an area (picture area) Dpct wherein bit streams corresponding to the respective pictures are arranged. The common information area Cgop includes sync data Sgop and a header Hgop corresponding to the GOP. A picture area Dpct of the bit stream Bg(i) corresponding to the picture group Gp(i) includes bit streams Bf(k′), Bf(k′+1), Bf(k′+2), Bf(k′+3), . . . , Bf(k′+s) corresponding to pictures F(k′), F(k′+1), F(k′+2), F(k′+3), . . . , F(k′+s) [k′,s: integers]. The pictures F(k′), F(k′+1), F(k′+2), F(k′+3), . . . , F(k′+s) are obtained by rearranging, in coding order, the pictures F(k)˜F(k+n−1) arranged in order of display times.
Each bit stream corresponding to each picture is composed of an area (common information area) Cpct wherein bit streams corresponding to common information such as a header are arranged, and an area (slice area) Dslc wherein bit streams corresponding to the respective slices are arranged. The common information area Cpct includes sync data Spct and a header Hpct corresponding to the picture. For example, when the picture F(k′+1) in the arrangement in order of coding times (coding order arrangement) is the picture F(k+1) in the arrangement in order of display times (display order arrangement), the slice area Dslc in the bit stream Bf(k′+1) corresponding to the picture F(k′+1) includes bit streams Bs1˜Bsm corresponding to the respective slices SL1˜SLm.
Each bit stream corresponding to each slice is composed of an area (common information area) Cslc wherein bit streams corresponding to common information such as a header are arranged, and an area (macroblock area) Dmb wherein bit streams corresponding to the respective macroblocks are arranged. The common information area Cslc includes sync data Sslc and a header Hslc corresponding to the slice. For example, when the picture F(k′+1) in the coding order arrangement is the picture F(k+1) in the display order arrangement, the macroblock area Dmb in the bit stream Bs2 corresponding to the slice SL2 includes bit streams Bm1˜Bmr corresponding to the respective macroblocks MB1˜MBr.
As described above, coded data corresponding to one moving picture (i.e., one image sequence) has a hierarchical structure comprising a stream layer corresponding to a stream Smp as the coded data, GOP layers corresponding to GOPs constituting the stream, picture layers corresponding to pictures constituting each of the GOPs, and slice layers corresponding to slices constituting each of the pictures.
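The nesting of the layers can be sketched with a simple data structure. The sketch below is only a model of the containment relationship; the helper names are hypothetical, the sync data and headers are represented by their labels, and the number of units in each layer is made up.

```python
def make_unit(sync, header, children):
    """One layer unit: a common information area (sync data and header)
    followed by an area holding the units of the next lower layer."""
    return {"sync": sync, "header": header, "children": children}


def count_macroblocks(unit):
    """Count the macroblock bit streams contained in a unit of any layer."""
    if isinstance(unit, str):  # a macroblock bit stream, e.g. "Bm1"
        return 1
    return sum(count_macroblocks(child) for child in unit["children"])


# Hypothetical sizes: 1 GOP, 1 picture, 2 slices, 3 macroblocks per slice.
slices = [make_unit("Sslc", "Hslc", ["Bm1", "Bm2", "Bm3"]) for _ in range(2)]
pictures = [make_unit("Spct", "Hpct", slices)]
gop_units = [make_unit("Sgop", "Hgop", pictures)]
stream = make_unit("Sstr", "Hstr", gop_units)

total_macroblocks = count_macroblocks(stream)
```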
By the way, in moving picture coding methods such as MPEG (Moving Picture Experts Group)-1, MPEG-2, MPEG-4, ITU-T recommendation H.263, H.26L, and the like, a picture to be subjected to intra-picture coding is called an I picture, and a picture to be subjected to inter-picture predictive coding is called a P picture or a B picture.
Hereinafter, definitions of an I picture, a P picture, and a B picture will be described.
An I picture is a picture to be coded without referring to another picture. A P picture or B picture is a picture to be coded with reference to another picture. To be exact, a P picture is a picture for which either I mode coding or P mode coding can be selected when coding each block in the picture. A B picture is a picture for which one of I mode coding, P mode coding, and B mode coding can be selected when coding each block in the picture.
The I mode coding is a process of performing intra-picture coding for a target block in a target picture without referring to another picture. The P mode coding is a process of performing inter-picture predictive coding for a target block in a target picture with reference to an already-coded picture. The B mode coding is a process of performing inter-picture predictive coding for a target block in a target picture with reference to two already-coded pictures.
A picture to be referred to during the P mode coding or B mode coding is an I picture or a P picture other than the target picture, and it may be either a forward picture positioned forward the target picture or a backward picture positioned backward the target picture.
However, there are three ways of combining two pictures to be referred to during the B mode coding. That is, there are three cases of B mode coding as follows: a case where two forward pictures are referred to, a case where two backward pictures are referred to, and a case where one forward picture and one backward picture are referred to.
FIG. 45 is a diagram for explaining a moving picture coding method such as MPEG described above. FIG. 45 illustrates relationships between target pictures and the corresponding reference pictures (pictures to be referred to when coding the respective target pictures).
Coding of the respective pictures F(k)˜F(k+7), . . . , F(k+17)˜F(k+21) constituting the moving picture is carried out with reference to other pictures as shown by arrows Z. To be specific, a picture at the end of one arrow Z is coded by inter-picture predictive coding with reference to a picture at the beginning of the same arrow Z. In FIG. 45, the pictures F(k)˜F(k+7), . . . , F(k+17)˜F(k+21) are identical to the pictures F(k)˜F(k+4), . . . , F(k+n−2)˜F(k+n+4), . . . , F(k+2n−2), F(k+2n−1) shown in FIG. 43(a). These pictures are successively arranged from one having earlier display time on the display time axis X. The display times of the pictures F(k)˜F(k+7), . . . , F(k+17)˜F(k+21) are times t(k)˜t(k+7), . . . , t(k+17)˜t(k+21). The picture types of the pictures F(k)˜F(k+7) are I, B, B, P, B, B, P, B, and the picture types of the pictures F(k+17)˜F(k+21) are B, P, B, B, P.
For example, when performing B mode coding for the second B picture F(k+1) shown in FIG. 45, the first I picture F(k) and the fourth P picture F(k+3) are referred to. Further, when performing P mode coding for the fourth P picture F(k+3) shown in FIG. 45, the first I picture F(k) is referred to.
Although a forward picture is referred to in P mode coding of a P picture in FIG. 45, a backward picture may be referred to. Further, although a forward picture and a backward picture are referred to in B mode coding of a B picture in FIG. 45, two forward pictures or two backward pictures may be referred to.
Furthermore, in a moving picture coding method such as MPEG-4 or H.26L, a coding mode called “direct mode” may be selected when coding a B picture.
FIGS. 46(a) and 46(b) are diagrams for explaining inter-picture predictive coding to be performed with the direct mode. FIG. 46(a) shows motion vectors to be used in the direct mode.
In FIG. 46(a), pictures P1, B2, B3, and P4 correspond to the pictures F(k+3)˜F(k+6) [k=−2] shown in FIG. 45, and times t(1), t(2), t(3), and t(4) (t(1)<t(2)<t(3)<t(4)) are display times of the pictures P1, B2, B3, and P4, respectively. Further, X is a display time axis indicating display times Tdis.
Hereinafter, a case where a block BL3 in the picture B3 is coded in the direct mode will be specifically described.
In this case, a target picture to be coded is the picture B3, and a target block to be coded is a block BL3.
In predictive coding of the block BL3 in the picture B3, a motion vector MV4 of a block BL4 in the picture P4, which block has been most-recently coded and is positioned backward the picture B3, is used. The relative position of the block BL4 to the picture P4 is equal to the relative position of the block BL3 to the picture B3. That is, as shown in FIG. 46(b), coordinates (x4,y4) of an origin Ob4 of the block BL4 with respect to an origin O4 of the picture P4 are equal to coordinates (x3,y3) of an origin Ob3 of the block BL3 with respect to an origin O3 of the picture B3. Further, the motion vector MV4 of the block BL4 is the motion vector that is used in predictive coding of the block BL4. The motion vector MV4 of the block BL4 is obtained by motion detection of the block BL4 with reference to the forward picture P1, and it shows a region R4f corresponding to the block BL4, of the forward picture P1.
Then, the block BL3 in the picture B3 is subjected to bidirectional predictive coding with reference to the forward picture P1 and the backward picture P4, by using motion vectors MV3f and MV3b which are parallel to the motion vector MV4. The motion vector MV3f indicates a region R3f corresponding to the block BL3, of the forward picture P1 to be referred to when coding the block BL3. The motion vector MV3b indicates a region R3b corresponding to the block BL3, of the backward picture P4 to be referred to when coding the block BL3.
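One way to obtain motion vectors MV3f and MV3b that are parallel to MV4 is to scale MV4 by ratios of display-time differences. The description above states only that the vectors are parallel; the scaling rule in the following sketch is an assumption (it follows the temporal scaling commonly used for direct mode), and the function name and the value of MV4 are hypothetical.

```python
from fractions import Fraction


def direct_mode_vectors(mv4, t_forward, t_target, t_backward):
    """Derive forward and backward vectors parallel to mv4.

    Assumed rule: scale mv4 by the ratio of display-time differences.
    mv4 is the vector of the co-located block in the backward picture and
    points to the forward picture.
    """
    span = t_backward - t_forward
    scale_f = Fraction(t_target - t_forward, span)   # towards the forward picture
    scale_b = Fraction(t_target - t_backward, span)  # negative: opposite direction
    mv_f = tuple(component * scale_f for component in mv4)
    mv_b = tuple(component * scale_b for component in mv4)
    return mv_f, mv_b


# Pictures P1, B3, P4 with display times t(1)=1, t(3)=3, t(4)=4.
mv4 = (6, -3)  # hypothetical motion vector of block BL4
mv3f, mv3b = direct_mode_vectors(mv4, t_forward=1, t_target=3, t_backward=4)
```

Under this assumption no motion vector needs to be coded for the block BL3, since both vectors are derived from MV4.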
By the way, the ITU-T recommendation (H.263++Annex U) describes a framework in a case where plural pictures are used as candidates for a reference picture. In this description, a reference picture memory for holding image data of pictures to be candidates for a reference picture (candidate pictures) is sorted into a short-term picture memory and a long-term picture memory. The short-term picture memory is a memory area for holding data of candidate pictures which are timewise close to a target picture (neighboring candidate pictures). The long-term picture memory is a memory area for holding candidate pictures which are timewise far from the target picture (distant candidate pictures). To be specific, a distant candidate picture is apart from the target picture by such a distance that the number of candidate pictures from the target picture to the distant candidate picture exceeds the number of candidate pictures which can be stored in the short-term picture memory.
Further, the ITU-T recommendation (H.263++Annex U) describes a method of utilizing the short-term picture memory and the long-term picture memory, and further, it also describes a method of designating reference picture indices (hereinafter, also referred to simply as reference indices) to pictures.
Initially, the method of designating reference indices to pictures will be briefly described.
FIGS. 47(a) and 47(b) are diagrams for explaining the method of designating reference indices to plural pictures constituting a moving picture. FIG. 47(a) shows candidates (candidate pictures) for a picture to be referred to when coding a picture P16. FIG. 47(b) shows candidates (candidate pictures) for a picture to be referred to when coding a picture B15.
In FIG. 47(a), pictures P4, B2, B3, P7, B5, B6, P10, B8, B9, P13, B11, B12, P16, B14, B15, P19, B17, and B18 are obtained by rearranging the pictures F(k+1)˜F(k+17) [k=1] shown in FIG. 45 in coding order. The arrangement of plural pictures shown in FIG. 47(a) is an arrangement of pictures on a time axis (coding time axis) Y indicating times (coding times) Tenc for coding the respective pictures.
A description will be given of a case where, as shown in FIG. 47(a), a block in the P picture P16 is subjected to P mode coding.
In this case, among four forward P pictures (pictures P4, P7, P10, and P13), a picture suited for coding is referred to. That is, the forward P pictures P4, P7, P10, and P13 are candidate pictures which can be designated as a reference picture in performing P mode coding of the picture P16. These candidate pictures P4, P7, P10, and P13 are assigned reference indices, respectively.
When assigning reference indices to these candidate pictures, a reference index having a smaller value is assigned to a candidate picture closer to the target picture P16 to be coded. To be specific, as shown in FIG. 47(a), reference indices [0], [1], [2], and [3] are assigned to the pictures P13, P10, P7, and P4, respectively. Further, information indicating the reference indices assigned to the respective candidate pictures is described as a parameter of motion compensation in a bit stream corresponding to a target block in the picture P16.
Next, a description will be given of a case where, as shown in FIG. 47(b), a block in the B picture B15 is subjected to B mode coding.
In this case, among four forward pictures (pictures P4, P7, P10, and P13) and one backward picture (picture P16), two pictures suited for coding are referred to. That is, the forward pictures P4, P7, P10, and P13 and the backward picture P16 are candidate pictures which can be designated as reference pictures in B mode coding for the B picture B15. When four forward pictures and one backward picture are candidate pictures, the forward pictures P4, P7, P10, and P13 are assigned reference indices, and the backward picture P16 is assigned a code [b] indicating that this picture is a candidate picture to be referred to backward.
In assigning reference indices to the candidate pictures, as for forward pictures as candidate pictures, a smaller reference index is assigned to a forward picture (candidate picture) closer to the target picture B15 to be coded on the coding time axis Y. To be specific, as shown in FIG. 47(b), reference indices [0], [1], [2], and [3] are assigned to the pictures P13, P10, P7, and P4, respectively. Further, information indicating the reference index assigned to each candidate picture is described, as a parameter of motion compensation, in a bit stream corresponding to a target block in the picture B15.
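The assignments of FIGS. 47(a) and 47(b) can be reproduced with the following minimal sketch. The function name and the use of a dictionary are assumptions made for illustration; the forward candidates are assumed to be given in coding order, oldest first.

```python
def assign_reference_indices(forward_candidates, backward_candidate=None):
    """Assign smaller reference indices to forward candidates closer to the
    target picture on the coding time axis; label the backward candidate 'b'."""
    labels = {}
    for index, picture in enumerate(reversed(forward_candidates)):
        labels[picture] = index
    if backward_candidate is not None:
        labels[backward_candidate] = "b"
    return labels


forward = ["P4", "P7", "P10", "P13"]  # coding order, oldest first

# P mode coding of a block in the picture P16 (FIG. 47(a)).
indices_for_p16 = assign_reference_indices(forward)

# B mode coding of a block in the picture B15 (FIG. 47(b)).
indices_for_b15 = assign_reference_indices(forward, backward_candidate="P16")
```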
Next, the method of assigning reference indices, which is described in the ITU-T recommendation (H.263++Annex U), will be described in association with the method of utilizing the short-term picture memory and the long-term picture memory.
In the short-term picture memory, candidate pictures which can be designated as a reference picture for a target picture are successively stored, and the stored candidate pictures are assigned reference indices in order of storage into the memory (i.e., in decoding order, or in order of bit streams). Further, when decoding a B picture, a picture that has most-recently been stored in the memory is treated as a backward reference picture while the other pictures are assigned reference indices in order of storage into the memory.
Hereinafter, a description will be given of a case where four forward pictures can be used as candidates for a reference picture for a target picture.
FIGS. 48(a) and 48(b) are diagrams illustrating part of plural pictures constituting a moving picture, wherein pictures are arranged in display order (48(a)), and pictures are arranged in coding order (48(b)). Pictures P1, B2, B3, P4, B5, B6, P7, B8, B9, P10, B11, B12, P13, B14, B15, P16, B17, B18, and P19 shown in FIG. 48(a) correspond to the pictures F(k+3)˜F(k+21) [k=−2] shown in FIG. 45.
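The rearrangement from display order into coding order can be sketched as follows. The rule used here is an assumption inferred from the picture arrangements given in this description: each B picture is coded after the I or P picture that follows it in display order. The function name is hypothetical, and trailing B pictures with no later I or P picture are simply appended, which is also an assumption.

```python
def display_to_coding_order(display_order):
    """Rearrange pictures so that each I/P picture precedes the B pictures
    that come before it in display order."""
    coding_order, pending_b = [], []
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)       # hold until the next I/P picture
        else:
            coding_order.append(picture)    # I/P picture is coded first
            coding_order.extend(pending_b)  # then the held B pictures
            pending_b = []
    coding_order.extend(pending_b)
    return coding_order


display = ["P1", "B2", "B3", "P4", "B5", "B6", "P7"]
coding = display_to_coding_order(display)
```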
FIG. 49 is a diagram for explaining management of a memory for reference pictures for the pictures arranged as described above.
In FIG. 49, already-coded pictures which are stored in the reference picture memory when coding target pictures are shown in association with logical memory numbers corresponding to memory areas where the already-coded pictures are stored, and reference indices assigned to the already-coded pictures.
In FIG. 49, pictures P16, B14, and B15 are target pictures. Logical memory numbers (0)˜(4) indicate logical positions (memory areas) in the reference picture memory. The later the time of coding (or decoding) an already-processed picture stored in a memory area is, the smaller the logical memory number corresponding to the memory area is.
Hereinafter, management of the reference picture memory will be described more specifically.
When coding (decoding) the picture P16, the pictures P13, P10, P7, and P4 are stored in the memory areas indicated by the logical memory numbers (0), (1), (2), and (3) in the reference picture memory, respectively. The pictures P13, P10, P7, and P4 are assigned reference indices [0], [1], [2], and [3], respectively.
When coding (decoding) the pictures B14 and B15, the pictures P16, P13, P10, P7, and P4 are stored in the memory areas indicated by the logical memory numbers (0), (1), (2), (3), and (4) in the reference picture memory, respectively. At this time, the picture P16 is assigned a code [b] indicating that this picture is a candidate picture to be backward referred to, and the remaining candidate pictures P13, P10, P7, and P4 to be forward referred to are assigned reference indices [0], [1], [2], and [3], respectively.
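The memory management of FIG. 49 can be modeled by the following sketch, which lists, for each target picture, the logical memory number, the stored picture, and the reference index or the code [b]. The function name and the tuple layout are assumptions for illustration only.

```python
def reference_table(stored_in_coding_order, target_is_b):
    """Build (logical memory number, picture, label) rows.

    The most recently coded picture gets logical memory number 0. For a B
    target picture, that picture is labeled 'b' (backward reference) and the
    remaining pictures receive reference indices 0, 1, 2, ...
    """
    newest_first = list(reversed(stored_in_coding_order))
    table, next_index = [], 0
    for logical_number, picture in enumerate(newest_first):
        if target_is_b and logical_number == 0:
            label = "b"
        else:
            label = next_index
            next_index += 1
        table.append((logical_number, picture, label))
    return table


# Target picture P16: P4, P7, P10, P13 have already been coded.
table_p16 = reference_table(["P4", "P7", "P10", "P13"], target_is_b=False)

# Target pictures B14 and B15: P16 has additionally been coded.
table_b = reference_table(["P4", "P7", "P10", "P13", "P16"], target_is_b=True)
```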
Information indicating the reference indices assigned to the respective candidate pictures is a parameter of motion compensation and, when coding a block in a target picture, it is described in a bit stream corresponding to the block as information indicating which one of the plural candidate pictures should be used as a reference picture. At this time, a shorter code is assigned to a smaller reference index.
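The effect of assigning a shorter code to a smaller reference index can be illustrated with a variable-length code. The description above does not name a particular code; the unsigned Exp-Golomb code used in the following sketch is an assumption chosen only as a representative example.

```python
def exp_golomb(value):
    """Unsigned Exp-Golomb code word for a non-negative integer (assumed code).

    The code word is the binary form of value + 1, preceded by as many zeros
    as there are bits after its leading 1.
    """
    binary = bin(value + 1)[2:]
    return "0" * (len(binary) - 1) + binary


code_words = {index: exp_golomb(index) for index in range(4)}
code_lengths = [len(code_words[index]) for index in range(4)]
```

With such a code, a block that refers to the candidate picture having the reference index [0] spends fewer bits on the reference index than a block that refers to the candidate picture having the reference index [3].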
In the conventional coding method described above, however, since an I picture or a P picture is designated as a reference picture when performing predictive coding for a block in a B picture, a distance (hereinafter, also referred to as a time-basis distance) between the target picture and the reference picture on the display time axis might be increased.
For example, in predictive coding on a block in the B picture B15 shown in FIG. 48(b), when the forward picture P13 and the backward picture P16 are designated as reference pictures, the time-basis distance Ltd (=t(15)−t(13)) between the B picture B15 (target picture) and the forward picture P13 (reference picture) becomes a two-picture interval (2Pitv) as shown in FIG. 50(a).
Furthermore, in predictive coding for a block in the B picture B15 shown in FIG. 48(b), when the forward pictures P13 and P10 are designated as reference pictures, the time-basis distance Ltd (=t(15)−t(10)) between the B picture B15 (target picture) and the forward picture P10 (reference picture) becomes a five-picture interval (5Pitv) as shown in FIG. 50(b).
Especially when the number of B pictures inserted between an I picture and a P picture or between adjacent two P pictures is increased, the time-basis distance Ltd between the target picture and the reference picture is increased, resulting in a considerable reduction in coding efficiency.
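The time-basis distances mentioned above can be checked with a short sketch. It assumes that the number in a picture name (e.g., 15 in B15) is its position on the display time axis and that adjacent pictures are one picture interval apart; the function names are hypothetical.

```python
def display_position(picture):
    """Position on the display time axis, taken from the picture name."""
    return int(picture[1:])


def time_basis_distance(target, reference):
    """Distance Ltd between two pictures, in picture intervals."""
    return abs(display_position(target) - display_position(reference))


ltd_p13 = time_basis_distance("B15", "P13")  # forward reference P13
ltd_p10 = time_basis_distance("B15", "P10")  # forward reference P10


def max_forward_distance(b_count):
    """Largest distance from a B picture to the preceding I/P picture when
    b_count B pictures are inserted between two adjacent I/P pictures."""
    return max(position for position in range(1, b_count + 1))


growth = [max_forward_distance(b_count) for b_count in (2, 3, 4)]
```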
Further, in the conventional coding method, when performing B mode coding in which plural backward pictures can be referred to, there are cases where a neighboring picture which is timewise close to a target picture is assigned a reference index larger than a reference index assigned to a distant picture which is timewise far from the target picture.
In this case, in motion detection for a block in the target picture, a candidate picture that is timewise closer to the target picture is likely to be designated as a reference picture. Since this candidate picture is assigned the larger reference index, and hence the longer code, coding efficiency is degraded.
Hereinafter, a description will be given of a case where two backward pictures P16 and P19 are referred to in B mode coding for a block in a B picture B15 shown in FIG. 51(a).
In this case, pictures B2, B3, P4, B5, B6, P7, B8, B9, P10, B11, B12, P13, B14, B15, P16, B17, B18, and P19 which are arranged in display order as shown in FIG. 51(a) are rearranged in coding order, resulting in P7, B2, B3, P10, B5, B6, P13, B8, B9, P16, B11, B12, P19, B14, and B15 as shown in FIG. 51(b).
Further, in this case, among three forward pictures (pictures P7, P10, and P13) and two backward pictures (pictures P16 and P19), two pictures suited to coding are referred to. To be specific, the forward pictures P7, P10, and P13 and the backward pictures P16 and P19 are candidate pictures which can be designated as a reference picture when coding a block in the picture B15. When three forward pictures and two backward pictures are candidate pictures as described above, reference indices are assigned to the forward pictures P7, P10, and P13 and the backward pictures P16 and P19.
In assigning reference indices to the candidate pictures, a smaller reference index is assigned to a candidate picture that is closer to the target picture B15 to be coded on the coding time axis Y. To be specific, as shown in FIG. 51(b), reference indices [0], [1], [2], [3], and [4] are assigned to the pictures P19, P16, P13, P10, and P7, respectively.
In this case, however, the reference index [1] assigned to the P picture P16 that is closer to the target picture (B picture B15) on the display time axis X becomes larger than the reference index [0] assigned to the P picture P19 that is far from the B picture B15, resulting in degradation of coding efficiency.
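The mismatch between the assigned reference indices and the distances on the display time axis can be confirmed with the following sketch. It assumes, for illustration only, that the number in a picture name is its position on the display time axis; the variable names are hypothetical.

```python
# Candidate pictures for the target picture B15, in coding order (oldest first).
candidates = ["P7", "P10", "P13", "P16", "P19"]
target_position = 15

# Conventional assignment: smaller index for a picture coded more recently.
index_by_coding_time = {
    picture: index for index, picture in enumerate(reversed(candidates))
}

# Distance of each candidate from B15 on the display time axis.
display_distance = {
    picture: abs(int(picture[1:]) - target_position) for picture in candidates
}

closest_on_display_axis = min(candidates, key=display_distance.get)
```

Here the candidate picture closest to B15 on the display time axis is P16, yet it receives the reference index [1] rather than [0].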
The present invention is made to solve the above-described problems and has for its object to provide a moving picture coding method which can prevent a reduction in coding efficiency due to an increase in a time-basis distance between a target picture and a reference picture, and a moving picture decoding method corresponding to the moving picture coding method which can prevent a reduction in coding efficiency.
Further, it is another object of the present invention to provide a moving picture coding method which can assign reference indices to candidate pictures that can be referred to in predictive coding, without degrading coding efficiency, and a moving picture decoding method corresponding to the moving picture coding method which can avoid degradation in coding efficiency.