(1) Field of the Invention
The present invention relates to a moving picture coding method for coding a moving picture signal on a picture-by-picture basis, a moving picture decoding method for decoding the coded moving picture signal, and a program for executing these methods as software.
(2) Description of the Related Art
Recently, with the arrival of the age of multimedia, which handles audio, video and pixel values integrally, existing information media, i.e., newspapers, journals, TV, radio, telephone and other means through which information is conveyed to people, have come under the scope of multimedia.
In general, multimedia refers to a representation in which not only characters but also graphic symbols, audio and especially pictures and the like are related to each other. However, in order to include the aforementioned existing information media in the scope of multimedia, it appears as a prerequisite to represent such information in digital form.
However, when estimating the amount of information contained in each of the aforementioned information media in digital form, a character requires 1˜2 bytes, whereas audio requires more than 64 Kbits per second (telephone quality) and a moving picture requires more than 100 Mbits per second (present television reception quality). Therefore, it is not realistic to handle such vast information directly in digital form via the information media mentioned above. For example, a videophone has already been put into practical use via Integrated Services Digital Network (ISDN) with a transmission rate of 64 Kbits/s˜1.5 Mbits/s; however, it is impossible to transmit a picture captured on the TV screen or shot by a TV camera directly via ISDN.
This therefore requires information compression techniques, and for instance, in the case of a videophone, video compression techniques compliant with H.261 and H.263 Standards recommended by International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) are employed. According to the information compression techniques compliant with MPEG-1 standard, picture information as well as audio information can be stored in an ordinary music CD (Compact Disc).
Here, Moving Picture Experts Group (MPEG) is an international standard for compression of moving picture signals, and MPEG-1 is a standard that compresses video signals down to 1.5 Mbits/s, namely, compresses the information included in TV signals approximately down to a hundredth. The quality targeted in the MPEG-1 standard was a medium one so as to realize a transmission rate primarily of about 1.5 Mbits/s; therefore, MPEG-2, standardized with a view to meeting the requirements of even higher picture quality, realizes TV broadcast quality for transmitting moving picture signals at a transmission rate of 2˜15 Mbits/s. In the present circumstances, a working group (ISO/IEC JTC1/SC29/WG11) previously in charge of the standardization of MPEG-1/MPEG-2 has further standardized MPEG-4, which achieves a compression rate superior to the one achieved by MPEG-1/MPEG-2, allows coding/decoding operations on a per-object basis and realizes new functions required by the age of multimedia. At first, in the process of the standardization of MPEG-4, the aim was to standardize low bit rate coding; however, the aim has since been extended to more versatile coding including high bit rate coding for interlaced pictures and others. Moreover, standardization of MPEG-4 AVC and ITU H.264, as a next generation coding method with a higher compression rate, is in process, worked on jointly by the ITU-T and the ISO/IEC. The next generation coding method is published under the name of Committee Draft (CD) as of August 2002 (see, for example, ISO/IEC 14496-10 Editor's Proposed Changes Relative to JVT-E146d37ncm, revision 4, 2002-12).
In general, in coding of a moving picture, compression of information volume is performed by eliminating redundancy both in spatial and temporal directions. Therefore, an inter-picture prediction coding, which aims at reducing the temporal redundancy, estimates a motion and generates a predictive picture on a block-by-block basis with reference to forward and backward pictures, and then codes a differential value between the obtained predictive picture and a current picture to be coded. Here, “picture” is a term to represent a single screen and it represents a frame when used for a progressive picture whereas it represents a frame or a field when used for an interlaced picture. The interlaced picture here is a picture in which a single frame consists of two fields respectively having different time. For coding and decoding an interlaced picture, three ways are possible: processing a single frame either as a frame, as two fields or as a frame/field structure depending on a block in the frame.
A picture to which an intra-picture prediction coding is performed without reference pictures is called an “I-picture”. A picture to which the inter-picture prediction coding is performed with reference to a single picture is called a “P-picture”. A picture to which the inter-picture prediction coding is performed by referring simultaneously to two pictures is called a “B-picture”. The B-picture can refer to two pictures, arbitrarily selected from the pictures whose display time is either forward or backward to that of a current picture to be coded, as an arbitrary combination. The reference pictures can be specified for each block which is a basic unit for coding and decoding, but they can be classified as follows: the first reference picture for a reference picture that is described first in the bit stream on which coding is performed; and the second reference picture for a picture that is described later. However, the reference pictures need to be already coded or decoded as a condition to code or decode these I, P and B pictures.
A motion compensation inter-picture prediction coding is employed for coding P-pictures or B-pictures. The motion compensation inter-picture prediction coding is a coding method in which motion compensation is applied to inter-picture prediction coding. The motion compensation is not a method to simply predict motions using pixels in the reference picture but to estimate a motion (to be referred to as a “motion vector” hereinafter) at each part within a picture and improve accuracy in prediction by performing prediction that takes a motion vector into consideration as well as to reduce the data amount. For example, the amount of data is reduced by estimating a motion vector for a current picture to be coded and coding prediction error between a predictive value, which is obtained after being shifted for the amount equivalent to the motion vector, and the current picture. In the case of using this method, information on motion vectors is required at the time of decoding, therefore, the motion vectors are coded and then recorded or transmitted.
The motion vector is estimated on a macroblock-by-macroblock basis. To be precise, the motion vector is estimated by fixing a macroblock in the current picture, shifting a macroblock in the reference picture within a search range and then finding out a location of the reference block that resembles a basic block the most.
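The search described above can be illustrated with a minimal sketch. It assumes that pictures are plain 2-D lists of luminance values, that resemblance is measured by the sum of absolute differences (SAD), and that the search is exhaustive; the function names `sad` and `estimate_motion_vector` and all parameters are hypothetical and are not taken from any standard.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current picture and the block displaced by (dx, dy) in the reference."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def estimate_motion_vector(cur, ref, bx, by, n, search):
    """Full search over displacements in [-search, search] (assumed range).

    The block in the current picture stays fixed while the candidate block
    in the reference picture is shifted; the best match gives (dx, dy).
    """
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            inside_x = 0 <= bx + dx and bx + dx + n <= width
            inside_y = 0 <= by + dy and by + dy + n <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

A practical encoder would typically restrict or accelerate this search; the exhaustive loop is used here only because it is the simplest correct form of the idea.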
FIG. 1 is a block diagram showing the structure of the conventional moving picture coding apparatus.
The moving picture coding apparatus is composed of a motion estimation unit 103, a subtracter 104, a coding unit 105, a motion compensation unit 106, a variable length coding unit 107, a decoding unit 108, an adder 109 and memories 110, 111.
The moving picture signal Vin is inputted to the subtracter 104 and the motion estimation unit 103.
The motion estimation unit 103 uses decoded picture data which is already coded and read out from the memory 110 as a reference picture, estimates a motion vector MV indicating a location predicted as optimal in a search range within the reference picture, and outputs it to the motion compensation unit 106.
The motion compensation unit 106 generates a motion compensated image signal MCRef using the motion vector MV estimated by the motion estimation unit 103 and outputs it to the subtracter 104 and the adder 109.
The subtracter 104 calculates a differential between the inputted moving picture signal Vin and the motion compensated image signal MCRef inputted from the motion compensation unit 106 and outputs a differential signal Dif to the coding unit 105.
The coding unit 105 performs coding processing such as frequency transformation, quantization, and others on the inputted differential signal Dif, generates a coded signal and outputs it to the variable length coding unit 107 and the decoding unit 108. The variable length coding unit 107 performs variable length coding or the like on the inputted coded signal, further generates a coded stream Str by adding the motion vector MV inputted from the motion compensation unit 106, or the like, and outputs it outside the moving picture coding apparatus.
The decoding unit 108 performs decoding processing such as inverse quantization, inverse frequency transformation and others on the inputted coded signal and outputs the decoded differential signal RecDif to the adder 109.
The adder 109 adds the differential signal RecDif inputted from the decoding unit 108 and the motion compensated image signal MCRef inputted from the motion compensation unit 106, and generates a decoded local picture LocalRecon. The generated decoded local picture is outputted to the memory 111.
The decoded local picture LocalRecon is a picture which corresponds to the result of decoding operated by a moving picture decoding apparatus and is used as a reference picture when the moving picture signal Vin to be coded next in coding order is coded. Therefore, either the decoded local picture LocalRecon written in the memory 111 is copied in the memory 110 by the time when the following moving picture signal Vin is inputted or the contents are exchanged between the memories 110 and 111.
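The signal flow through the subtracter 104, the coding unit 105, the decoding unit 108 and the adder 109 can be sketched as follows. This is a simplified illustration only: frequency transformation and variable length coding are omitted, the coding unit is reduced to uniform scalar quantization, and the names `encode_block`, `decode_block` and `qstep` are hypothetical.

```python
def encode_block(vin, mcref, qstep):
    """Code one block and return (coded signal, decoded local picture)."""
    dif = [v - p for v, p in zip(vin, mcref)]        # subtracter 104: Dif
    coded = [round(d / qstep) for d in dif]          # coding unit 105 (quantization only)
    rec_dif = [c * qstep for c in coded]             # decoding unit 108: RecDif
    local_recon = [p + r for p, r in zip(mcref, rec_dif)]  # adder 109: LocalRecon
    return coded, local_recon


def decode_block(coded, mcref, qstep):
    """What a decoding apparatus would reconstruct from the same coded signal."""
    return [p + c * qstep for p, c in zip(mcref, coded)]


coded, local_recon = encode_block([101, 104, 96, 90], [98, 100, 100, 90], qstep=4)
```

The point of the local decoding loop is visible here: `local_recon` equals what `decode_block` produces, so the coder and the decoder predict from identical reference pictures even though quantization is lossy.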
FIG. 2 shows the concepts of display order information (Picture Order Count: POC) and a frame number (Frame Number: FN) defined in JVT. The display order information POC indicates the order in which the pictures are displayed. However, it does not indicate an actual display time. For example, the display order information POC of a picture IDR19 in the diagram indicates “0” while the POC of the following picture B20 indicates “1”. This shows that the picture B20 is to be displayed following the picture IDR19 but does not show the period of time that passes until the picture is displayed. The actual display time can be obtained from data associated with each of the pictures and is managed by an apparatus which functions independently from a video decoder (a moving picture decoding apparatus). The display order information POC is always reset to “0” at an IDR picture, a special intra picture, and is assigned to each of the pictures so that the value increases picture by picture in display order. The POC is reset again to “0” when the value reaches a predetermined maximum value. The example in the diagram shows that, when the maximum value of the display order information POC is set to “4”, the POC returns to the value “0” at the IDR pictures IDR19 and IDR29, and also at the picture B24 after having completed the cycle.
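The numbering rule can be sketched as below. The sketch assumes that "reaching the maximum value" means the picture after the one carrying the maximum value restarts at 0, which is consistent with the picture B24 returning to “0” when the maximum is “4”; the function name `assign_poc` and its arguments are hypothetical.

```python
def assign_poc(is_idr_flags, max_poc):
    """Return the POC of each picture, given the pictures in display order.

    is_idr_flags[i] is True when the i-th picture is an IDR picture.
    """
    pocs = []
    poc = 0
    for is_idr in is_idr_flags:
        # Reset at an IDR picture, or once the previous value hit the maximum.
        if is_idr or poc > max_poc:
            poc = 0
        pocs.append(poc)
        poc += 1
    return pocs


# Pictures 19 to 29 in display order; 19 and 29 are IDR pictures.
flags = [number in (19, 29) for number in range(19, 30)]
pocs = assign_poc(flags, max_poc=4)
```

With these inputs the POC is 0 at the pictures 19, 24 and 29, matching the resets described for IDR19, B24 and IDR29.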
The frame number FN is a number assigned to pictures which may be referred to later on. (A) in the diagram shows a state of the memory before the picture B21 is decoded, where three reference pictures are stored. (B) in the diagram shows the state after the picture B21 is decoded and stored in the memory. Here, the FN of the picture B21 has the same value as that of a picture P25 which is to be decoded next. When plural consecutive pictures in decoding order have the same FN, this means that only the last of these pictures in decoding order is a reference picture. According to this example, the picture B21 is not a reference picture; therefore, when being stored in the memory, the picture B21 is marked as “unused as a reference picture” (the state of being marked is presented as “unused”). When a reference picture is stored in the memory, it is marked as “used as a reference picture” (the state of being marked is presented as “used”). It should be noted that only the “unused” mark is indicated in the diagram. Whether or not a picture is a reference picture is indicated in a field called “nal_ref_idc” included in the coded stream; however, this is not explained here since it does not relate directly to the description of the present invention. The frame number FN is also reset to “0” at IDR pictures as well as when the value reaches the predetermined maximum value, as is the case with the display order information POC. The example shows that the FN is reset to “0” at the pictures IDR19 and IDR29 as well as at the picture B24.
Next, the operation of removing pictures in order to allocate a free space in the memory is explained with reference to FIGS. 3 and 4. FIG. 3 shows the removing operation in the case where the memory has a picture marked as “unused”. The decoded pictures IDR19, P22, B20 and B21 are stored in the memory immediately before the picture P23 is decoded, and the picture B20 is already marked as “unused” since it is not a reference picture (see (A) in the diagram). Then, the pictures are marked as “unused”, if necessary, using a method such as a memory management control operation (MMCO) or a sliding window, which determines the picture stored in the memory at the earliest time to be unnecessary. These operations are called unused marking processing in the present specification. Here, the picture P22 is marked as “unused” (see (B) in the diagram). Then, a picture is removed in order to allocate a free space. When plural unused pictures are stored in the memory as shown in (B), the one located in the earliest position in display order (POC) is removed. In this case, the display order information of the picture P22 indicates “3” while the display order information of the picture B20 indicates “1”; therefore, the picture B20 is removed (see (C) in the diagram). The picture P23 is stored in the area released as a result of the removing (see (D) in the diagram).
It should be noted that a picture includes a frame and a field. Although the term “picture” is employed in the present specification, a picture may be stored in the memory on a frame-by-frame basis (an odd field and an even field of the same time). Similarly, a picture may be removed on a frame-by-frame basis for allocating a free space in the memory.
It should be also noted that the number presented with the term “stage” in the diagram indicates a stage of transition of the memory. The stage 1 is a stage before the unused marking processing is operated in processing the picture (a picture P23 in this case) and the stage 2 is a stage after the unused marking processing is operated while the stage 3 is a stage after a free space is allocated and the stage 4 is a stage after the picture (P23) is stored.
FIG. 4 shows the removing operation when the memory does not have any picture marked as “unused”. As shown in the diagram, the pictures are decoded in order as follows: IDR19, P22, B20, B21 and P23. As shown in (A) in the diagram, the pictures IDR19, P22, B20 and B21 are stored in the memory at the stage before the picture P23 is decoded and none of these pictures is marked as “unused”. Then, as shown in (B) in the diagram, it is assumed that none of the pictures is marked as “unused” in the unused marking processing. In such a case, when the memory has no pictures marked as “unused”, the picture decoded first out of all the pictures stored in the memory is removed in order to allocate a free space. As shown in (C) in the diagram, the picture IDR19 is removed here since it is the picture decoded first among the pictures stored in the memory. Lastly, as shown in (D) in the diagram, the decoded picture P23 is stored in the released area.
FIG. 5 is a block diagram showing the structure of the conventional moving picture decoding apparatus.
The moving picture decoding apparatus is composed of a variable length decoding unit 402, a picture decoding unit 202, an MMCO decoding unit 204, a memory 206 and a memory management unit 401.
The variable length decoding unit 402 performs variable length decoding on the inputted coded moving picture signal Str while the picture decoding unit 202 decodes the coded picture data comp_pic and stores a decoded picture signal Recon in the memory 206. The picture decoding unit 202 outputs motion information MV to the memory 206, generates a motion compensated reference picture MCPic and performs motion compensation in decoding a picture that is inter-picture predictive coded. The memory management unit 401 outputs instructions mctrl for managing the memory such as determination of an area for storing a picture, retention of a free space, and others. The display order information POC is outputted from the variable length decoding unit 402 to the memory management unit 401 and is kept there. An MMCO command MMCO, which is one of the unused marking processing as mentioned above, is inputted from the variable length decoding unit 402 to the MMCO decoding unit 204, and decoded while the instruction to mark “unused” is inputted to the memory management unit 401. A decoded picture signal Vout to be displayed is outputted from the memory 206.
FIG. 6 is a flowchart showing the memory-related operation performed by the conventional moving picture decoding apparatus.
The present flow shows the operation on a picture-by-picture basis in Steps S1 through S2. The moving picture decoding apparatus performs unused marking processing and marks “unused”, if necessary, for each of the pictures stored in the memory (Step S13). The moving picture decoding apparatus then performs processing of allocating a free space, allocates a free space in the memory (Step S14) and stores the decoded picture signal Vout in the free space (Step S15).
FIG. 7 is a flowchart showing the operation for allocating a free space performed by the conventional moving picture decoding apparatus and explaining in detail Step S14 in FIG. 6. In the processing of allocating a free space (Step S14), the moving picture decoding apparatus examines whether or not the memory 206 has the picture marked as “unused” (Step S141). When the memory 206 has such picture, a picture, which is displayed at the earliest time among the pictures marked as “unused” stored in the memory 206, is removed (Step S143). When the memory 206 does not have any such pictures, the picture that is firstly decoded among the pictures stored in the memory 206 is removed (Step S142).
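The selection rule of Steps S141 through S143 can be sketched as follows. The data layout (`StoredPicture`, `decode_index`) and the function name `allocate_free_space` are hypothetical; the POC values used for the pictures IDR19, B20, B21 and P22 follow the example of FIGS. 2 and 3.

```python
from dataclasses import dataclass


@dataclass
class StoredPicture:
    name: str
    poc: int            # display order information
    decode_index: int   # position in decoding order
    unused: bool = False


def allocate_free_space(memory):
    """Remove one picture from memory and return it (Step S14)."""
    unused = [p for p in memory if p.unused]             # Step S141
    if unused:
        victim = min(unused, key=lambda p: p.poc)        # Step S143: earliest in display order
    else:
        victim = min(memory, key=lambda p: p.decode_index)  # Step S142: decoded first
    memory.remove(victim)
    return victim


# State (B) of FIG. 3: P22 and B20 are marked "unused".
fig3 = [
    StoredPicture("IDR19", poc=0, decode_index=0),
    StoredPicture("P22", poc=3, decode_index=1, unused=True),
    StoredPicture("B20", poc=1, decode_index=2, unused=True),
    StoredPicture("B21", poc=2, decode_index=3),
]
removed = allocate_free_space(fig3)
```

In this example `removed` is the picture B20, as in (C) of FIG. 3. With no picture marked “unused”, the same call would remove IDR19, as in FIG. 4.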
FIG. 9 is a conceptual diagram describing the operation of invalid picture processing. The JVT defines a memory management operation in which, when a part of the sequence of pictures inputted to the moving picture decoding apparatus is lost, invalid pictures are inserted for the number corresponding to the lost pictures. This operation is performed by the moving picture decoding apparatus when “required_frame_num_update_behaviour_flag” included in the sequence parameter set indicates “1”. The invalid picture is a specially marked picture without an actual decoded picture signal, and cannot be used for reference. It is assumed that the status of the memory after having decoded the pictures I19, P20, P22 and P23 is as shown in (A) of the diagram. When coding the following picture B24, a reference index used for specifying a reference picture is assigned in such a manner that a small value of the reference index ref_idx is assigned to the picture that is decoded at the latest time in decoding order and not marked as “unused”. The assignment of the reference index as described above is only an example; the way of assigning differs depending on the picture type or the like, but in every case the assignment depends on which pictures are stored in the memory. In the example shown in the diagram, “ref_idx=0” is assigned to the picture P22 which is decoded lastly and is not marked “unused” while “ref_idx=1” is assigned to the picture P21 which is decoded immediately before the picture P22 and is not marked as “unused”.
Here, when the pictures P21 and P23 are lost during the transmission, or in other cases, and are not inputted in the decoder, the reference indices ref_idx are assigned to the reference pictures, as shown in (B) in the diagram, for decoding the picture B24, unless the invalid pictures are inserted. Basically, “ref_idx=0” and “ref_idx=2” are respectively assigned to the pictures P22 and P20 which are referred to by the picture B24. Since “ref_idx=0” is assigned to the picture P22 and “ref_idx=2” is assigned to the picture I19, it is a problem that the picture I19 instead of the picture P20 might be referred to by mistake. In order to avoid this, the invalid pictures are inserted.
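The misreference can be reproduced with a minimal sketch of the example assignment rule (most recently decoded picture first). It assumes that every stored picture is marked “used”; the function name `assign_ref_idx` is hypothetical.

```python
def assign_ref_idx(decoded_names):
    """Map picture name -> ref_idx, given reference pictures in decoding order.

    The picture decoded last receives ref_idx 0, the one before it 1, and so on.
    """
    return {name: idx for idx, name in enumerate(reversed(decoded_names))}


# What the coder assumed when it chose ref_idx values for the picture B24.
intended = assign_ref_idx(["I19", "P20", "P21", "P22"])

# What the decoder holds when the picture P21 never arrived
# and no invalid picture is inserted.
after_loss = assign_ref_idx(["I19", "P20", "P22"])
```

Here `intended["P20"]` is 2, but in `after_loss` the index 2 belongs to the picture I19, so a reference written as “ref_idx=2” silently resolves to the wrong picture.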
(C) in the diagram shows a state of the memory before the picture B24 is decoded in the case in which the invalid pictures are inserted. When the non-sequentiality in frame numbers FN is detected, the invalid pictures are inserted for the number corresponding to the number of the lost pictures. In the example, when the picture P22 with the FN indicating “3” is decoded, the FN of the picture P20, which is decoded immediately before the picture P22, indicates “1”. The number thus increases by 2 while it increases normally by 1, which shows that one picture is lost. Therefore, one invalid picture is inserted before the picture P22 is decoded. The invalid picture as described above being a special picture is marked as “used” although it does not have an actual decoded picture signal and is processed as a reference picture at the time of assigning the reference indices to the pictures. The invalid picture, however, is further marked as “non-existent (non-exist)” because it shall not be used actually for reference.
FIG. 10 is a block diagram showing the structure of the conventional moving picture decoding apparatus. This moving picture decoding apparatus differs from the one described in FIG. 5 in that it includes an FN gap detection unit 211 and in that the memory management unit 412 operates differently. The FN gap detection unit 211 obtains a frame number FN from the variable length decoding unit 411 and, when a gap is detected, instructs the memory management unit 412 to insert the required number of invalid pictures. The memory management unit 412 stores, in the memory 206, the number of invalid pictures instructed by the FN gap detection unit 211.
FIG. 11 is a flowchart showing the invalid picture processing operated by the conventional moving picture decoding apparatus. The difference in the memory-related operation between the present moving picture decoding apparatus and the one described in FIG. 6 is that the former examines a gap between the frame numbers FN (Step S11) before the unused marking processing takes place (Step S13). When the gap is detected, the moving picture decoding apparatus proceeds to the unused marking processing (Step S13) after having stored the invalid pictures for the number corresponding to the number of the missing pictures. When the gap is not detected, the moving picture decoding apparatus proceeds directly to the unused marking processing (Step S13). In Step S12, the moving picture decoding apparatus stores the invalid pictures for the number of the missing pictures in the same manner as the normal procedure used in storing a picture shown in FIG. 6.
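Steps S11 and S12 can be sketched as follows. The sketch assumes that frame numbers wrap around with a modulus `fn_modulus` and that two consecutive pictures with the same FN indicate a non-reference picture rather than a gap; the unused marking and free space allocation of Steps S13 and S14 are left out, and all names are hypothetical.

```python
def count_missing_frames(prev_fn, cur_fn, fn_modulus):
    """Step S11: number of frame numbers skipped between two decoded pictures."""
    if cur_fn == prev_fn:
        return 0  # same FN: the earlier picture was not a reference picture
    return (cur_fn - prev_fn - 1) % fn_modulus


def store_with_gap_check(memory, prev_fn, name, fn, fn_modulus):
    """Insert invalid pictures for a detected gap, then store the picture."""
    missing = count_missing_frames(prev_fn, fn, fn_modulus)
    for i in range(missing):                              # Step S12
        memory.append({
            "name": "invalid",
            "fn": (prev_fn + 1 + i) % fn_modulus,
            "used": True,          # counted when reference indices are assigned
            "non_existent": True,  # but never actually referred to
        })
    memory.append({"name": name, "fn": fn, "used": True, "non_existent": False})
    return missing


memory = [{"name": "P20", "fn": 1, "used": True, "non_existent": False}]
inserted = store_with_gap_check(memory, prev_fn=1, name="P22", fn=3, fn_modulus=16)
```

For the picture P22 (FN “3”) following the picture P20 (FN “1”), one invalid picture with FN “2” is placed between them, which keeps the reference indices of the remaining pictures where the coder expected them.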
FIG. 13 is a conceptual diagram showing the conventional structure of the stream according to MPEG-2. As shown in the diagram, the stream according to MPEG-2 has a layered structure. The stream is made up of a plurality of Groups Of Pictures (GOPs). It is possible to edit a moving picture and to perform random access on it by using the GOP as a basic unit in coding processing. A Group Of Pictures consists of a plurality of pictures, each being an I-picture, a P-picture or a B-picture. The stream, the GOP and the picture each include a synchronous signal (sync) indicating a boundary between the respective units and a header, which is data commonly included in the respective units. In MPEG-2, a P-picture can be predictively coded with reference to one picture, either an I-picture or a P-picture, whose display time immediately precedes that of the P-picture. A B-picture can be predictively coded with reference to one picture whose display time immediately precedes that of the B-picture or one picture whose display time immediately follows that of the B-picture, both of which can be either an I-picture or a P-picture. The B-picture is placed in the stream immediately after either an I-picture or a P-picture. Therefore, at the time of performing random access, all the pictures which are located after an I-picture can be decoded and displayed when decoding starts from that I-picture. Also, the degree of allowance for the reference structure has been limited since the memory can store, at maximum, two reference pictures.
FIG. 14 is a conceptual diagram showing the conventional moving picture coding method defined in the JVT. According to the JVT, it is possible to refer to an arbitrary distant picture as long as it does not go across the special IDR picture. Therefore, it is possible, for example, to code many pictures by rearranging the coding order with the view to enhance the coding efficiency. In the diagram, the correlativity among the pictures 19, 20, 21, 25, 26 and 27 is very strong as well as among the pictures 22, 23, 24, 28, 29 and 30. In this case, the coding efficiency can be improved by inter-picture coding firstly the pictures 19, 20, 21, 25, 26 and 27 (GOP1) and then the pictures 22, 23, 24, 28, 29 and 30 (GOP2).
FIG. 15 is a flowchart showing the operation performed in the conventional moving picture coding method defined in the JVT. According to the moving picture coding method defined in the JVT, all the uncoded pictures can be candidate pictures for coding (Step S55). Then, a picture is selected from the candidate pictures for coding based on certain criteria (Step S56). For example, when the number of uncoded pictures is ten, all ten pictures may be determined as candidate pictures for coding and the tenth picture in display order may be selected for coding. After the coding, when an uncoded picture is still found, the procedure returns to Step S55. In Step S56, the input of another uncoded picture may be awaited instead of performing coding processing.
However, the conventional moving picture decoding apparatuses, including those described above, have not allowed a coded stream to be edited except at IDR pictures, that is, special intra pictures. The following describes this problem.
FIG. 8 is a conceptual diagram for explaining the problem that non-sequentiality in the sequence generates non-sequentiality in the display order information POC and thereby causes a picture which is not displayed yet to be removed. The diagram shows the case in which a sequence is decoded after having combined the two parts, Clip1 and Clip2. A place where non-sequentiality in the sequence is generated by such editing or for other reasons is called an editing point. In this example, the maximum value of the display order information POC is set so that the circulation of the values indicated by the display order information POC does not need to be considered. (A) in the diagram shows a state of the memory after the Clip1 is decoded, in which the memory stores the pictures I19, P22, B20 and B21. The respective display order information POC indicate “4”, “7”, “5” and “6” while it is assumed that the pictures I19, B20 and B21 are marked as “unused”. (B) in the diagram shows a state of the memory after the first picture in the Clip2, I85, is decoded, but before the second picture P86 is decoded. Here, it is assumed that the picture I85 is stored in the position where the picture B20 has been stored. Subsequently, it is assumed that the picture I85 in the Clip2 is marked as “unused” in the unused marking processing (see (B) in the diagram). Then, in the following processing of allocating a free space, the picture I85 is removed since the picture located in the earliest position in display order is to be removed out of the pictures marked “unused”. Here, assuming that the average number of pictures whose display is delayed after decoding is three, the pictures B21, P22 and I85 are not displayed yet. Nevertheless, the picture I85 is removed even though it is not displayed yet.
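The failure can be traced with a short sketch of the removal rule applied to state (B). The POC values of the pictures I19, P22 and B21 are those given above; the POC of the picture I85 is not stated, so the value 0 used here is purely an assumption standing for "smaller than the POCs of Clip1". The function name `earliest_unused` is hypothetical.

```python
def earliest_unused(memory):
    """Name of the picture the free-space rule would remove:
    the earliest in display order (smallest POC) among those marked unused."""
    unused = [p for p in memory if p["unused"]]
    return min(unused, key=lambda p: p["poc"])["name"]


memory_after_i85 = [
    {"name": "I19", "poc": 4, "unused": True},
    {"name": "P22", "poc": 7, "unused": False},
    {"name": "I85", "poc": 0, "unused": True},  # assumed POC; stored where B20 was
    {"name": "B21", "poc": 6, "unused": True},
]
victim = earliest_unused(memory_after_i85)
```

Because POC values are only comparable within one continuous sequence, the rule picks the picture I85 even though the pictures I19 and B21 from Clip1 precede it in actual display order.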
FIG. 12 is a conceptual diagram for explaining a problem that the non-sequentiality in the sequence generates a non-sequentiality in the frame number FN and that an invalid picture removes the picture which is not displayed yet. The example shows how a sequence is decoded after having combined non-sequential parts Clip1 and Clip2. (A) in the diagram shows a state of the memory after the picture P25 is decoded and five pictures of the picture P21 through the picture P25 are stored. (B) in the diagram shows a state of the memory after the invalid picture is inserted when the first picture in the Clip2, I60, is decoded. The picture I60 has an FN indicating “12” while the picture P25, which is decoded immediately before the picture I60, has an FN indicating “5”. It is therefore determined that six pictures are lost and six invalid pictures are inserted. In this case, all the pictures stored in the memory are removed, causing a problem that, for instance, in the state as shown in (A) of the diagram, the pictures P23, P24 and P25 are removed although they are not displayed yet.
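The arithmetic and its effect on the memory can be checked with a small sketch. It assumes a memory holding at most five pictures that discards the oldest entry when a new one is stored, modeled here with `collections.deque`; this first-in-first-out behavior is a simplification of the removal rules, and the function name `insert_invalid_pictures` is hypothetical.

```python
from collections import deque


def insert_invalid_pictures(memory, prev_fn, cur_fn):
    """Append one invalid picture per skipped frame number; return the count."""
    missing = cur_fn - prev_fn - 1
    for i in range(missing):
        memory.append(f"invalid(FN={prev_fn + 1 + i})")
    return missing


# State (A): five pictures of Clip1, with the picture P25 carrying FN 5.
memory = deque(["P21", "P22", "P23", "P24", "P25"], maxlen=5)

# The picture I60 of Clip2 arrives with FN 12.
missing = insert_invalid_pictures(memory, prev_fn=5, cur_fn=12)
```

The gap is 12 - 5 - 1 = 6, so six invalid pictures are stored into a five-picture memory and none of the pictures P21 through P25 survives, including those not yet displayed.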
FIG. 16 is a conceptual diagram for explaining a problem caused by the degree of allowance in the coding defined by the JVT at the time of editing or performing random access processing. (B) in the diagram is an original stream that is the same as the stream shown in FIG. 14. (A) in the diagram shows how the GOP2 is decoded without the GOP1. In this case, the pictures 25 through 27 cannot be replayed after replaying the pictures 22 through 24 since the pictures 25 through 27 have not been obtained, which causes non-sequentiality in replay. This problem occurs when the GOP1 is removed as a result of editing or when performing random access starting from the GOP2, or in other cases. (C) in the diagram shows how the GOP1 is decoded without the GOP2. In this case, non-sequentiality in replay is generated because the pictures 22 through 24 have not been obtained. This problem occurs when the GOP2 is removed as a result of editing.