Conventional coding and decoding methods for digital moving image signals include ITU-T Recommendation H.261, recommended in March 1993, MPEG by ISO, and other international standards. This specification mainly describes ITU-T Recommendation H.261. H.261 is a coding standard consisting of the following three technical elements.
The first is the motion compensated prediction system, in which one picture (image) of the input image signal (hereinafter called the input picture) is compared with an already coded picture (image) (hereinafter called the coded picture), the motion amount between the two pictures is measured (motion detection), and the input picture is predicted from the motion amount and the coded picture. The difference between the predicted image and the input picture (the prediction error signal) is calculated, and the prediction error signal and the motion amount are transmitted to the receiving side. Image compression coding is thus realized, because the image information is transmitted in a small data quantity.
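The motion detection and prediction error steps described above can be sketched as follows. This is only an illustrative full search using the sum of absolute differences as the matching cost; the function names, the cost measure, the block size `n`, and the search range `search` are assumptions made for this sketch and are not taken from the Recommendation.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, n, search):
    """Return the displacement (dx, dy) within +/- `search` pixels that
    minimizes the matching cost (assumed full search)."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the picture.
            if bx + dx < 0 or bx + dx + n > width:
                continue
            if by + dy < 0 or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


def prediction_error(cur, ref, bx, by, mv, n):
    """Difference between the input block and its motion compensated
    prediction taken from the coded (reference) picture."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)] for y in range(n)]
```

When the picture content has simply shifted between the coded picture and the input picture, the search returns the shift and the prediction error block is all zeros, which is why only a small data quantity needs to be transmitted.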
The second is the discrete cosine transform (DCT) system, in which the prediction error signal is transformed into the frequency domain. When transformed into the frequency domain, the prediction error signal is characterized by a concentration of power in a specific frequency region (the low frequency region). By making use of this feature in combination with the third system explained below, the image information can be transmitted in a still smaller data quantity.
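The concentration of power in the low frequency region can be illustrated with a small sketch. It uses a one-dimensional, 8-point orthonormal DCT on a smooth row of sample values chosen for illustration; the two-dimensional block transform of an actual coder is not reproduced here, and the helper names are assumptions.

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT (type II) of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, s in enumerate(samples))
        coeffs.append(scale * acc)
    return coeffs


def low_frequency_energy_share(coeffs, keep):
    """Fraction of the total power carried by the first `keep` coefficients."""
    total = sum(c * c for c in coeffs)
    if total == 0:
        return 1.0
    return sum(c * c for c in coeffs[:keep]) / total


# Illustrative smooth input row (assumed values).
row = [10, 11, 12, 13, 14, 15, 16, 17]
share = low_frequency_energy_share(dct_1d(row), 2)
```

For this smooth row, nearly all of the power falls in the two lowest frequency coefficients, so the remaining coefficients are close to zero and cost little to transmit.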
The third is the variable length coding (VLC) system. Making use of the concentration of power in a specific frequency region, coefficients that appear frequently are expressed by short code words, while coefficients that appear rarely are expressed by long code words, so that the average code length is shortened. By employing this system, the image information can be transmitted in a small data quantity.
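The shortening of the average code length can be illustrated as follows. The symbol probabilities and the prefix code below are hypothetical values chosen for this sketch; they are not the code tables of the Recommendation.

```python
# Hypothetical symbol probabilities and prefix code (assumed for illustration).
PROBABILITY = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
CODE = {0: "0", 1: "10", 2: "110", 3: "111"}


def encode(symbols):
    """Concatenate the variable length code words of the symbols."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Recover the symbols; works because no code word is a prefix of another."""
    inverse = {word: symbol for symbol, word in CODE.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            symbols.append(inverse[word])
            word = ""
    if word:
        raise ValueError("bit string ends inside a code word")
    return symbols


def average_code_length():
    """Expected number of bits per symbol under the assumed probabilities."""
    return sum(PROBABILITY[s] * len(CODE[s]) for s in PROBABILITY)
```

Under these assumed probabilities the average code length is 1.75 bits per symbol, compared with 2 bits for a fixed length code over the same four symbols.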
These three technical elements are not applied to the entire picture at once, but to the individual macro blocks formed by dividing the picture into blocks of 16 × 16 pixels each.
An outline of the structure of the output bit stream (bit sequence) coded according to ITU-T Recommendation H.261 is described below with reference to the explanatory diagram in FIG. 4.
At the beginning of the bit stream representing a picture, a specific bit pattern indicating the start of a picture, called the picture start code, is placed. It is followed by picture control information (frame control information) indicating how to decode the picture. If this information is corrupted by a transmission error, a picture different from the transmitted one is reproduced at the receiving side, and hence this information is very important. Next, control information is placed for each group of blocks, a unit gathering plural macro blocks, followed by the information for each macro block. One macro block contains control information, such as the motion amount, and DCT coefficient information. This macro block information is repeated for the number of macro blocks contained in the input picture.
The picture control information contains time information for the picture called the Temporal Reference (hereinafter called TR). The TR consists of the lower five bits of the picture number of the input picture. Since H.261 is an image coding method for real time communication at low transmission speed, the delay until the image data from the coding side reaches the decoding side must be minimized. For this purpose, a technique called decimation is applied to decrease the number of pictures to be coded. Hence, the TR values are not always consecutive, but are intermittent.
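As a small illustration of the TR described above, the lower five bits of the picture number can be taken with a bit mask. The function name and the list of coded picture numbers are assumptions made for this sketch.

```python
def temporal_reference(picture_number):
    """Lower five bits of the input picture number (values 0 to 31)."""
    return picture_number & 0x1F


# Picture numbers actually coded after decimation (assumed example).
coded_pictures = [0, 2, 5, 31, 33]
tr_values = [temporal_reference(n) for n in coded_pictures]
```

Because only five bits are kept, the TR wraps around after picture 31, and because some pictures are skipped, successive TR values are not consecutive.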
The decimating technique is further described below. First, assume the following.
(1) The coding process of an input picture and the decoding process of received data are completed instantly (because their time is regarded as very short compared with the transmission time).
(2) The input picture processing system is NTSC, and hence a picture is present every 1/30 second. Picture numbers increase sequentially from the first picture, one per picture, every 1/30 second.
(3) The coded output bit stream is immediately stored in the transmission buffer, and is extracted from the transmission buffer at a bit rate conforming to the transmission speed.
(4) In decimating, when the buffer residue in the transmission buffer becomes smaller than a certain threshold, the next picture is taken in and coded.
For ease of understanding, a case without decimating (FIG. 5) and a case with decimating (FIG. 6) are compared below.
In FIG. 5, the abscissa denotes the time and picture number, and the ordinate represents the amount of the output bit stream remaining in the transmission buffer. First, at time zero (picture number 0), the coded bit stream is accumulated in the transmission buffer ([1] in the diagram). The bits are then transmitted from the transmission buffer at a rate determined by the transmission speed. Accordingly, the residue of the transmission buffer decreases as indicated by the broken line. The time when all data of picture number 0 has reached the decoding side is position [2] in the diagram, at the intersection of the broken line and the abscissa. Hence, the delay time is t0.
Subsequently, 1/30 second later, picture number 1 is entered and its output bits are added to the transmission buffer. As above, the time when picture 1 has reached the decoding side is the intersection of the abscissa and the broken line linking [3] and [4] in the diagram. Hence, the delay time is t1. As is clear from the diagram, bits of picture 0 still remain in the buffer at time 1/30 second, so when picture 1 is coded, its bit stream stays in the transmission buffer until all of the bit stream of picture 0 has been transmitted, and the delay time is extended. Similarly, as pictures 2, 3, . . . , n follow, the delay time tn continues to increase until real time communication becomes impossible.
Next, processing with decimating is explained. In FIG. 6, the ordinate and abscissa are defined as in FIG. 5. As before, all of the bit stream of picture 0 has been transmitted at time [2] in the diagram. The delay time can be shortened by taking in the next picture to be coded when the residue of the transmission buffer is small, and therefore picture 2 is coded at the time (2/30 second) when the buffer residue is smaller than the threshold. The bit stream produced by coding picture 2 has been completely transmitted at time [3], and the delay time is t2. The delay time is therefore shorter than in the above case without decimating.
Thus, by employing the decimating technique, the delay time can be shortened. It is also understood easily that the values of TR are not continuous but intermittent.
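The comparison between the cases with and without decimating can be reproduced with a short simulation under assumptions (1) to (4) above. The numerical values (3000 bits per coded picture, 2000 bits transmitted per 1/30 second, and a threshold of 500 bits) are hypothetical and are chosen only to make the behavior visible; the function name is likewise an assumption.

```python
def simulate(num_pictures, bits_per_picture, bits_per_tick, threshold=None):
    """Return (picture number, delay in units of 1/30 second) for each coded
    picture. One tick is one picture period. With `threshold` set, a picture
    is coded only when the buffer residue is smaller than the threshold."""
    buffer_bits = 0
    coded = []
    for n in range(num_pictures):
        if n > 0:
            # Bits leave the buffer at the transmission speed during one tick.
            buffer_bits = max(0, buffer_bits - bits_per_tick)
        if threshold is not None and buffer_bits >= threshold:
            continue  # decimate: this input picture is not coded
        buffer_bits += bits_per_picture
        # The last bit of this picture leaves once the whole residue is sent.
        coded.append((n, buffer_bits / bits_per_tick))
    return coded


without_decimating = simulate(6, 3000, 2000)
with_decimating = simulate(6, 3000, 2000, threshold=500)
```

With these assumed values, the delay without decimating grows by half a picture period with every picture, whereas with decimating only pictures 0, 2, and 4 are coded and the delay stays at 1.5 picture periods. The picture numbers of the coded pictures, and hence the TR values, are intermittent.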
The importance of TR is described below.
In H.261, in real time communication at low transmission speed, the coded pictures are discontinuous as a result of decimating, and therefore the TR is required at the decoding side in order to display the reproduced image at the correct time.
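One way the decoding side could use the TR for display timing is sketched below. It assumes that the number of skipped pictures between two coded pictures is less than 32, so that the difference of the five-bit TR values taken modulo 32 gives the elapsed picture count, and it uses the picture period of 1/30 second assumed earlier. The function names are hypothetical.

```python
from fractions import Fraction

TR_MODULUS = 32  # the TR carries five bits


def pictures_elapsed(previous_tr, current_tr):
    """Picture periods between two coded pictures, allowing for TR wraparound."""
    return (current_tr - previous_tr) % TR_MODULUS


def display_interval(previous_tr, current_tr, pictures_per_second=30):
    """Time in seconds between the display of the two pictures."""
    return Fraction(pictures_elapsed(previous_tr, current_tr), pictures_per_second)
```

For example, a TR of 30 followed by a TR of 1 corresponds to three picture periods, that is, 1/10 second, even though the numerical TR value has decreased.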
Incidentally, ITU-T Recommendation H.263, recommended in 1995, is a more efficient coding method based on ITU-T Recommendation H.261. It employs a method called the PB frame. In this method, as shown in FIG. 7, picture N, the N-th picture to be coded, is predicted and coded from coded picture N-2, obtained by coding the (N-2)-th picture, and coded picture N-1 is predicted from coded picture N-2 and the reproduced coded picture N. The motion amount (MV) between coded pictures N-2 and N is proportionally distributed according to the time (t) from coded picture N-2 to N and the time (tb) from coded picture N-1 to N, and coded picture N-1 is predicted accordingly. This requires the time information, and hence the TR is needed. Thus, the TR is very important picture control information.
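The proportional distribution of the motion amount can be sketched as follows, assuming uniform motion between coded pictures N-2 and N. The sign convention, the function name, and the use of exact fractions are assumptions made for this sketch; the rounding rules and any correction vector of the actual Recommendation are omitted.

```python
from fractions import Fraction


def split_motion_vector(mv, t, tb):
    """Distribute the motion amount `mv` (between coded pictures N-2 and N)
    over the intermediate picture N-1.

    t  : time from coded picture N-2 to N
    tb : time from coded picture N-1 to N
    Returns (forward, backward): the vector of picture N-1 relative to
    picture N-2, and the vector of picture N-1 relative to picture N.
    """
    if not 0 < tb < t:
        raise ValueError("picture N-1 must lie between pictures N-2 and N")
    mvx, mvy = mv
    forward_scale = Fraction(t - tb, t)   # share of the motion before N-1
    backward_scale = Fraction(-tb, t)     # remaining share, opposite direction
    forward = (mvx * forward_scale, mvy * forward_scale)
    backward = (mvx * backward_scale, mvy * backward_scale)
    return forward, backward
```

With t = 2 and tb = 1, a motion amount of (6, -4) is split into a forward vector of (3, -2) and a backward vector of (-3, 2). If t or tb is derived from a corrupted TR, both vectors are scaled incorrectly.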
However, if this very important TR time information is corrupted by a transmission error, the picture may be displayed at the wrong time at the decoding side, or, in the PB frame method, the intermediate picture cannot be decoded correctly.