1. Field of the Invention
This invention relates to an encoding system and an encoding method for a mobile telephone or a TV telephone system, for example, to encode video signals in real time.
2. Description of the Related Art
FIG. 1 is a block diagram of an encoding system of the background art, as disclosed on pp. 39 to 40 of “All of MPEG-4” (Association of Industrial Search), for example; FIG. 2 is an explanatory view showing an input signal of this encoding system of the background art; FIGS. 3a to 3d are explanatory diagrams showing constructions of bitstreams; and FIG. 4 is an explanatory diagram showing positions (arrangements) on a screen (in a displayed state) of video packets.
In FIG. 1, reference numeral 1 designates a subtracter for receiving an external input signal (e.g., a luminance signal and two chrominance signals in the shown example) as its first input. The output of the subtracter 1 is inputted through DCT (Discrete Cosine Transformer) 2 and quantizer 3 to a DC/AC predictor 4 for predicting the quantized values of a DC component and an AC component, and dequantizer 6. The output of the DC/AC predictor 4 is fed to the first input of variable length coder 5, which outputs a bitstream.
On the other hand, the output of the dequantizer 6, to which the output of the quantizer 3 is inputted, is fed through an IDCT 7 (IDCT: Inverse DCT) to the first input of an adder 8. The output of this adder 8 is fed to a memory 9, the output of which is fed to the first input of a predicted image former 10 and the first input of a motion detector 11.
An external input signal is fed to the second input of the motion detector 11, the output of which is fed to the second input of the predicted image former 10 and a motion vector predictor 12.
The output of the motion vector predictor 12 is fed to the second input of variable length coder 5. On the other hand, the output of the predicted image former 10 is fed to the second input of the subtracter 1 and the second input of the adder 8.
The operations will now be described. First of all, the video signals are divided into macroblocks, which are the basic processing units, as shown in FIG. 2, and are inputted as external input signals (where the external input signals are basically inputted as the macroblocks, which may be directly inputted or may be converted thereinto by a preprocessing unit for generating the macroblocks).
Where the video signals inputted are 4:2:0, 16 pixels×16 lines of a luminance signal (Y) are as large in the screen as 8 pixels×8 lines of two chrominance signals (Cb, Cr). Therefore, six blocks (i.e., four blocks for the luminance signal and two blocks for the chrominance signals) of 8 pixels×8 lines construct one macroblock.
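The macroblock layout described above can be sketched as follows. This is a minimal illustration only: the function name split_macroblock and the raster ordering of the four luminance blocks are assumptions made for the example and are not taken from the text.

```python
def split_macroblock(y, cb, cr):
    """Split one 4:2:0 macroblock into six 8x8 blocks.

    y  : 16x16 luminance samples (list of 16 rows of 16 values)
    cb : 8x8 chrominance samples
    cr : 8x8 chrominance samples
    Returns [Y0, Y1, Y2, Y3, Cb, Cr]; the raster order of Y0..Y3
    (top-left, top-right, bottom-left, bottom-right) is an assumption.
    """
    luma = [
        [row[c:c + 8] for row in y[r:r + 8]]
        for r in (0, 8)
        for c in (0, 8)
    ]
    return luma + [cb, cr]
```

Four luminance blocks plus two chrominance blocks give the six 8 pixels×8 lines blocks that construct one macroblock.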
Here, it is premised that the video object plane (VOP: a unit image) to be inputted as an external input has a rectangular shape identical to the frame.
Each block is quantized in the quantizer 3 after being subjected to the discrete cosine transform (DCT). The quantized DCT coefficients are converted, together with additional information such as a quantization parameter, into variable length codes after the coefficients of the individual DC and AC components have been predicted in the DC/AC predictor 4.
This is the intra coding (also called the “in-frame encoding”). A VOP in which all the macroblocks are coded by the intra coding is called the “I-VOP (Intra-VOP)”.
On the other hand, the quantized DCT coefficients are dequantized in the dequantizer 6 and are decoded by the IDCT 7 so that the decoded image is stored in the memory 9. The decoded signal stored in this memory 9 is used in the inter coding (which may be called the “inter-frame encoding”).
In the inter coding case, the motion detector 11 detects the motion vectors indicating the motions of the macroblocks which are inputted as the external input signals. Each motion vector indicates the position in the decoded image stored in the memory 9 that gives the minimum difference from the inputted macroblock.
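As an illustration of this search, the following sketch performs an exhaustive block-matching search. The text does not say how the "difference" is measured; the sum of absolute differences (SAD), the full-search strategy and all function names are assumptions chosen for the example.

```python
def sad(cur, ref, x, y, size):
    """Sum of absolute differences between the current block and the
    reference block whose top-left corner is at (x, y)."""
    return sum(
        abs(cur[r][c] - ref[y + r][x + c])
        for r in range(size)
        for c in range(size)
    )


def full_search(cur, ref, x0, y0, search_range, size=16):
    """Return (dx, dy, cost) of the candidate with the minimum SAD.

    cur          : size x size block being encoded
    ref          : decoded reference image (list of rows)
    (x0, y0)     : position of the block in the current image
    search_range : maximum displacement tested in each direction
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            # Skip candidates that fall outside the reference image.
            if x < 0 or y < 0 or x + size > width or y + size > height:
                continue
            cost = sad(cur, ref, x, y, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

The number of candidates tested grows with the square of search_range, so the time spent per macroblock depends directly on the search range selected.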
The predicted image former 10 forms a predicted image on the basis of the motion vector which is detected in the motion detector 11.
Subsequently, a differential signal is determined between the macroblock inputted and the predicted image formed in the predicted image former 10 and is subjected to the DCT in the DCT 2 so that it is quantized in the quantizer 3.
The quantized DCT coefficients are converted, together with the additional information such as the predicted motion vector and the quantization parameter, into the variable length codes. On the other hand, the quantized DCT coefficients are dequantized in the dequantizer 6 and subjected to the IDCT in the IDCT 7. The output of the IDCT 7 is added to the predicted image by the adder 8 so that the sum is stored in the memory 9.
For the inter coding, there are two types of prediction. One type is a forward prediction, which is made in the display order of the images only from the VOP preceding in time, and the other type is a bidirectional prediction, which is made from both the preceding VOP and the succeeding VOP. The VOP to be encoded by the forward prediction is called the “P-VOP (Predictive VOP)”, and the VOP to be encoded by the bidirectional prediction is called the “B-VOP (Bidirectionally Predictive VOP)”.
With reference to FIG. 3, here will be described the construction of the bitstream to be outputted from the variable length coder 5. The bitstream of 1 VOP is constructed of one or more video packets, as shown in FIG. 3a. 
Here, one video packet is composed of encoding data of one or more macroblocks, and the first video packet of the VOP is assigned the VOP header to its head and stuffing bits for byte alignment to its tail (as shown in FIG. 3b).
Each of the second and subsequent video packets is assigned, at its head, a Resync Marker for detecting the head of the video packet and a video packet header, and, at its tail, the stuffing bits (as shown in FIG. 3c).
Here, the stuffing bits are added in units of 1 to 8 bits to the terminal end (cut) of the video packet for adjusting the byte alignment, and are distinguished in meaning from the stuffing data, as will be described in the following.
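The byte alignment can be sketched as below. The text only states that 1 to 8 bits are added; the concrete bit pattern used here (a single 0 followed by 1s, as in the MPEG-4 Visual syntax) and the function name stuffing_bits are assumptions made for the illustration.

```python
def stuffing_bits(bit_count):
    """Return the stuffing bits, as a string of '0'/'1', that align a
    video packet of bit_count bits to the next byte boundary.

    Between 1 and 8 bits are always produced: a packet that is already
    byte-aligned still receives a full byte of stuffing.
    """
    n = 8 - (bit_count % 8)       # number of bits to add, 1..8
    return "0" + "1" * (n - 1)    # assumed pattern: one 0, then 1s
```

For example, a packet of 13 bits receives the three bits "011" so that it ends on a 16-bit boundary.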
On the other hand, the stuffing data can be introduced in an arbitrary number into the video packet, as shown in FIG. 3d. In the case of MPEG-4 Video, for example, the stuffing data is called the “stuffing macroblock”, which can be introduced like a macroblock into an arbitrary video packet. This stuffing data is discarded (not substantially used) on the side of the decoding system.
The stuffing data, as defined herein, is used as words of 9 bits or 10 bits for increasing the number of bits independently of the byte alignment (which adjusts the terminal end of the video packet, for example) and is used between the macroblocks, so that its meaning is distinguished from that of the aforementioned stuffing bits.
The number of macroblocks to be inserted into one video packet is arbitrary, but, in consideration of error propagation, the video packets are generally so constructed that each video packet has a substantially constant number of bits. Where the number of bits in each video packet is thus substantially constant, the area occupied by each video packet in one VOP is not constant, as shown in FIG. 4.
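The effect can be illustrated with a small sketch that closes a video packet as soon as a target number of bits is reached. The closing rule, the function name packetize and the bit counts in the usage note are hypothetical; the text only states that the number of bits per packet is kept substantially constant.

```python
def packetize(mb_bits, target_bits):
    """Group macroblocks into video packets of roughly target_bits each.

    mb_bits     : list with the coded size, in bits, of each macroblock
    target_bits : packet is closed once it holds at least this many bits
    Returns a list of packets, each a list of macroblock indexes.
    """
    packets, current, used = [], [], 0
    for index, bits in enumerate(mb_bits):
        current.append(index)
        used += bits
        if used >= target_bits:
            packets.append(current)
            current, used = [], 0
    if current:                      # last, possibly short, packet
        packets.append(current)
    return packets
```

With hypothetical sizes packetize([100, 300, 50, 50, 400, 10], 400), the packets hold two, three and one macroblocks respectively, which is why the area covered by each packet on the screen varies.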
With reference to FIG. 5, here will be detailed the operations of the DC/AC predictor 4 (i.e., on the luminance signal Y-component of the macroblock).
As described above, the DC/AC predictor 4 predicts the coefficients of the DC component and the AC components of the quantized DCT coefficients which are outputted from the quantizer 3 in the intra coding case. In the inter coding case, the DC component and the AC components are not predicted, but the quantized DCT coefficients, as outputted from the quantizer 3, are outputted as they are to the variable length coder 5. In this case, the luminance signal Y and the chrominance signals Cb, Cr are separately subjected to the DC/AC prediction.
Here will be described the predictions of the DC component and the AC components of the intra coding case.
If the quantized DCT coefficients of the block being coded are designated by Fx (i, j) (0≦i≦7 and 0≦j≦7), if the quantized DCT coefficients of the lefthand adjacent block are designated by Fa (i, j) (0≦i≦7 and 0≦j≦7), if the quantized DCT coefficients of the upper adjacent block are designated by Fc (i, j) (0≦i≦7 and 0≦j≦7), and if the quantized DCT coefficients of the lefthand upper block are designated by Fb (i, j) (0≦i≦7 and 0≦j≦7), the prediction direction is determined at first from the quantized DC component Fb (0, 0) of the lefthand upper block, the quantized DC component Fa (0, 0) of the lefthand adjacent block and the DC component Fc (0, 0) of the upper adjacent block.
If the quantization step size of the DC component of the lefthand adjacent block is designated by Qda, if the quantization step size of the DC component of the lefthand upper block is designated by Qdb and if the quantization step size of the DC component of the upper adjacent block is designated by Qdc, for example, the dequantized DC components fa (0, 0), fb (0, 0) and fc (0, 0) are determined by the following relations:
fa(0, 0)=Fa(0, 0)×Qda;
fb(0, 0)=Fb(0, 0)×Qdb; and
fc(0, 0)=Fc(0, 0)×Qdc.
If the following relation holds, it is conceived that the correlations are intense in the vertical direction, so that the predictions are made from the dequantized DC component fc (0, 0) of the upper adjacent block:
|fa(0, 0)−fb(0, 0)|<|fb(0, 0)−fc(0, 0)|.
If the aforementioned relation does not hold, it is conceived that the correlations are intense in the horizontal direction, so that the predictions are made from the dequantized DC component fa (0, 0) of the lefthand adjacent block.
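The direction decision described above can be written as a short sketch. The function name and the "upper"/"left" labels are assumptions introduced for the example; the arithmetic follows the relations given in the text.

```python
def prediction_direction(Fa00, Fb00, Fc00, Qda, Qdb, Qdc):
    """Choose the DC prediction direction for the current block.

    Fa00, Fb00, Fc00 : quantized DC components of the lefthand adjacent,
                       lefthand upper and upper adjacent blocks
    Qda, Qdb, Qdc    : their DC quantization step sizes
    Returns (direction, dequantized DC used as the predictor).
    """
    fa = Fa00 * Qda
    fb = Fb00 * Qdb
    fc = Fc00 * Qdc
    if abs(fa - fb) < abs(fb - fc):
        return "upper", fc   # strong vertical correlation
    return "left", fa        # otherwise predict horizontally
```

When the two differences are equal, the relation does not hold, so the prediction falls back to the lefthand adjacent block.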
In the case of predicting the DC component from the upper adjacent block, the predicted DC component Px (0, 0) is determined by the following relation:
Px(0, 0)=Fx(0, 0)−fc(0, 0)/Qdx.
In the case of predicting the DC component from the lefthand adjacent block, the predicted DC component Px (0, 0) is determined by the following relation:
Px(0, 0)=Fx(0, 0)−fa(0, 0)/Qdx.
Here, Qdx is the quantization step size of the DC component of the current block, and the aforementioned divisions are calculated by the rounding method, for example.
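A minimal sketch of this DC prediction follows. The text only says that the division is "calculated by the rounding method"; rounding to the nearest integer with halves rounded away from zero is an assumption, as are the function names.

```python
def div_round(num, den):
    """Integer division rounded to the nearest integer, with halves
    rounded away from zero (assumed rounding rule; den must be > 0)."""
    q = (abs(num) + den // 2) // den
    return q if num >= 0 else -q


def predict_dc(Fx00, f_pred, Qdx):
    """Px(0, 0) = Fx(0, 0) - f_pred / Qdx.

    Fx00   : quantized DC component of the current block
    f_pred : dequantized DC component of the chosen neighbor,
             fc(0, 0) or fa(0, 0)
    Qdx    : DC quantization step size of the current block
    """
    return Fx00 - div_round(f_pred, Qdx)
```

For instance, with Fx(0, 0)=15, a neighbor value of 100 and Qdx=8, the quotient 12.5 rounds to 13 and the predicted DC component is 2.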
Subsequently, the AC components are predicted by using the prediction direction of the DC component. In the following, Qpa denotes the quantization parameter of the lefthand adjacent block, Qpc denotes the quantization parameter of the upper adjacent block and Qpx denotes the quantization parameter of the current block. If the DC component is predicted from the upper adjacent block, the first row of the quantized AC components is predicted as follows:
Px(i, 0)=Fx(i, 0)−(Fc(i, 0)×Qpc)/Qpx (i=1 to 7).
On the other hand, if the DC component is predicted from the lefthand adjacent block, the first column of the quantized AC components is predicted as follows:
Px(0, j)=Fx(0, j)−(Fa(0, j)×Qpa)/Qpx (j=1 to 7).
Thus, the predicted AC components Px (i, 0) or Px (0, j) are determined. The aforementioned divisions are calculated by the rounding method, for example.
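The two AC prediction formulas can be sketched together as follows. Blocks are represented as 8×8 lists indexed as F[i][j] to mirror the notation F(i, j); this representation, the rounding rule (nearest integer, halves away from zero) and the function names are assumptions made for the illustration.

```python
def div_round(num, den):
    """Integer division rounded to nearest, halves away from zero
    (assumed rounding rule; den must be > 0)."""
    q = (abs(num) + den // 2) // den
    return q if num >= 0 else -q


def predict_ac(Fx, Fn, Qpn, Qpx, direction):
    """Predict the seven AC components of the first row or column.

    Fx        : quantized coefficients of the current block, Fx[i][j]
    Fn        : quantized coefficients of the chosen neighbor
                (Fc for "upper", Fa for "left")
    Qpn, Qpx  : quantization parameters of the neighbor and current block
    direction : "upper" predicts Px(i, 0), "left" predicts Px(0, j)
    Returns a dict mapping (i, j) to the predicted value Px(i, j).
    """
    if direction == "upper":
        positions = [(i, 0) for i in range(1, 8)]
    else:
        positions = [(0, j) for j in range(1, 8)]
    return {
        (i, j): Fx[i][j] - div_round(Fn[i][j] * Qpn, Qpx)
        for i, j in positions
    }
```

Scaling the neighbor's coefficient by Qpn/Qpx compensates for the two blocks having been quantized with different quantization parameters.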
After the aforementioned predictions of the AC components have been made independently for the six blocks composing one macroblock, it is determined as follows on a macroblock basis whether or not the AC components are to be predicted.
Here, an AC prediction decision index SB of the block is determined in the following manner as the index for deciding whether the original video signals are left as they are (without the prediction of the AC component) or predicted. Where the prediction is made from the upper adjacent block, the AC prediction decision index SB is determined from the following formula:
SB=Σ|Fx(i, 0)|−Σ|Px(i, 0)| (i=1 to 7). [Formula 1]
Where the prediction is made from the lefthand adjacent block, the AC prediction decision index SB is determined from the following formula:
SB=Σ|Fx(0, j)|−Σ|Px(0, j)| (j=1 to 7). [Formula 2]
After the AC prediction decision indexes have been calculated for all blocks in the current macroblock, the sum of these indexes is calculated, that is, SBS=ΣSB. If the sum SBS satisfies the following relation, the AC components are predicted, but otherwise they are not predicted:
SBS≧0.
Here in the case of predicting the AC components, ac_pred_flag=1, but otherwise, ac_pred_flag=0. With this additional information ac_pred_flag, each macroblock is encoded by the variable length coder 5.
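The macroblock-level decision can be sketched as below. Each block is passed as a pair of seven-element sequences holding the original and the predicted coefficients of the row or column selected by the prediction direction; this data layout and the function names are assumptions made for the example.

```python
def decision_index(F_line, P_line):
    """SB = sum|Fx| - sum|Px| over the seven AC components of the
    predicted row (i=1 to 7) or column (j=1 to 7)."""
    return sum(abs(f) for f in F_line) - sum(abs(p) for p in P_line)


def ac_pred_flag(blocks):
    """Return 1 if the AC components of the macroblock are to be
    predicted, that is, if SBS = sum of SB over all blocks >= 0.

    blocks : iterable of (F_line, P_line) pairs, one per block
    """
    sbs = sum(decision_index(F, P) for F, P in blocks)
    return 1 if sbs >= 0 else 0
```

A non-negative SBS means that prediction does not increase the total magnitude of the coefficients to be coded, so the prediction is adopted for the whole macroblock.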
In the case of the macroblock of ac_pred_flag=1, for each block predicted from the upper adjacent block, the value Ox (i, j) is determined from the following relations:
\[
Ox(i, j) =
\begin{cases}
Px(i, 0) & (i = 0 \text{ to } 7,\ j = 0) \\
Fx(i, j) & (i = 0 \text{ to } 7,\ j = 1 \text{ to } 7)
\end{cases}
\qquad \text{[Formula 3]}
\]
For each block predicted from the lefthand adjacent block, the value Ox (i, j) is determined from the following relations:
\[
Ox(i, j) =
\begin{cases}
Px(0, j) & (i = 0,\ j = 0 \text{ to } 7) \\
Fx(i, j) & (i = 1 \text{ to } 7,\ j = 0 \text{ to } 7)
\end{cases}
\qquad \text{[Formula 4]}
\]
For the block belonging to the macroblock of ac_pred_flag=0, the value Ox (i, j) is determined from the following relations:
\[
Ox(i, j) =
\begin{cases}
Px(0, 0) & ((i, j) = (0, 0)) \\
Fx(i, j) & (i = 0 \text{ to } 7,\ j = 0 \text{ to } 7,\ (i, j) \neq (0, 0))
\end{cases}
\qquad \text{[Formula 5]}
\]
This value Ox (i, j) is outputted as the output of the DC/AC predictor 4 to the variable length coder 5.
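The three cases for the output Ox (i, j) can be combined in one sketch. Blocks are again 8×8 lists indexed as F[i][j]; Px is assumed to hold the predicted values at the positions that were predicted. The function name and the "upper"/"left" labels are hypothetical.

```python
def dc_ac_output(Fx, Px, ac_pred_flag, direction):
    """Assemble the output Ox(i, j) of the DC/AC predictor for one block.

    Fx           : quantized coefficients of the current block
    Px           : predicted coefficients (valid at predicted positions)
    ac_pred_flag : 1 if AC prediction is used for the macroblock
    direction    : "upper" or "left" prediction direction of the block
    """
    Ox = [row[:] for row in Fx]      # start from the unpredicted values
    Ox[0][0] = Px[0][0]              # the DC component is always predicted
    if ac_pred_flag:
        if direction == "upper":
            for i in range(1, 8):    # first row, j = 0
                Ox[i][0] = Px[i][0]
        else:
            for j in range(1, 8):    # first column, i = 0
                Ox[0][j] = Px[0][j]
    return Ox
```

With ac_pred_flag=0 only the DC position differs from Fx, which corresponds to Formula 5; with ac_pred_flag=1 the predicted row or column also replaces the original values.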
In the predictions thus far described, where the current block is on the left end of the VOP, there is neither the lefthand adjacent block nor the lefthand upper block for the current block. Therefore, a predetermined constant β is used as the values of the dequantized DC components fa (0, 0) and fb (0, 0) to be used in the aforementioned predictions. In this case, the AC components Fa (i, j) and Fb (i, j) ((i, j)≠(0, 0)) to be used in the aforementioned predictions are set to 0.
The constant β is an intermediate value of the range of the value of the DC component of the DCT coefficients to be outputted from the DCT 2. Specifically, β=1,024 where the DC component to be outputted from the DCT 2 is 11 bits and takes a value from 0 to 2,047.
Similarly, where the current block is on the upper end of the VOP, there is neither the upper adjacent block nor the lefthand upper block for the current block. Therefore, the aforementioned constant β is used as the values of the dequantized DC components fc (0, 0) and fb (0, 0) to be used in the aforementioned predictions, and the AC components Fc (i, j) and Fb (i, j) ((i, j)≠(0, 0)) are set to 0.
In the aforementioned predictions, moreover, where the block lefthand adjacent to the current block belongs to a video packet different from that of the current block, the dequantized DC component fa (0, 0) to be used in the aforementioned predictions is assumed to take a value of the aforementioned constant β, and the AC components Fa (i, j) ((i, j)≠(0, 0)) are 0.
In the aforementioned predictions, likewise, where the block upper adjacent to the current block belongs to a video packet different from that of the current block, the dequantized DC component fc (0, 0) to be used in the aforementioned predictions is assumed to take a value of the aforementioned constant β, and the AC components Fc (i, j) ((i, j)≠(0, 0)) are 0.
In the aforementioned predictions, on the other hand, where the block lefthand upper to the current block belongs to a video packet different from that of the current block, the dequantized DC component fb (0, 0) to be used in the aforementioned predictions is assumed to take a value of the aforementioned constant β, and the AC components Fb (i, j) ((i, j)≠(0, 0)) are 0.
Thus, the DC/AC predictor 4 is so constructed as not to refer to the DC component and the AC components of blocks belonging to different video packets, so that the propagation of a transmission error in the DC/AC predictions is confined within the video packet.
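The boundary rules above reduce to a single substitution rule, sketched below. The dictionary layout used for a neighboring block (keys "packet", "dc" and "ac") and the function names are assumptions introduced for the illustration.

```python
def mid_value(dc_bits):
    """Intermediate value of an unsigned DC range of dc_bits bits."""
    return 1 << (dc_bits - 1)


def reference_values(neighbor, current_packet, beta=mid_value(11)):
    """Return (dequantized DC, quantized AC components) to be used for
    a neighboring block in the DC/AC prediction.

    neighbor       : None if the block lies outside the VOP, else a dict
                     with keys "packet", "dc" and "ac", where "ac" maps
                     (i, j) to a quantized AC component
    current_packet : identifier of the video packet of the current block
    A missing block, or a block in another video packet, is replaced by
    the constant beta with all AC components equal to 0.
    """
    if neighbor is None or neighbor["packet"] != current_packet:
        return beta, {}
    return neighbor["dc"], neighbor["ac"]
```

An AC component can then be read as ac.get((i, j), 0), which yields 0 for every position of a substituted block. For an 11-bit DC component, mid_value(11) gives the constant 1,024 stated in the text.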
The encoding system of the background art thus far described has not sufficiently considered the processing for avoiding the overflow of the transmission buffer or the underflow of the VBV buffer, that is, a virtual buffer on the receiver side.
Moreover, the number of bits for a macroblock is usually increased/decreased by adjusting the quantization parameters to be used in the quantizer 3, but there is no explicit way of processing the case in which overflow of the transmission buffer occurs even with the maximum quantization parameter (that is, while suppressing the number of bits with the coarsest quantization).
Processing time for encoding a VOP is also an issue. Where the rate of the VOP to be inputted is F (1/sec.), it is required that all the macroblocks composing one VOP be encoded within a time period of 1/F (sec.) or shorter.
Where the motion detector 11 is so constructed as to change the search range of the motion vector adaptively in response to the motion of the object in the VOP, however, the time period necessary for the motion detector 11 to detect the motion vector of each macroblock changes for the individual macroblocks so that the time period for processing one VOP is not constant. Therefore, new control is necessary for encoding all the macroblocks in a VOP within a predetermined time period.
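The timing constraint can be made concrete with a small sketch. The function names and the numbers in the usage note are hypothetical; only the bound of 1/F seconds per VOP comes from the text.

```python
def vop_time_budget(vop_rate_hz, macroblocks_per_vop):
    """Return (time allowed per VOP, average time allowed per
    macroblock), both in seconds, for a VOP rate of F = vop_rate_hz."""
    vop_budget = 1.0 / vop_rate_hz
    return vop_budget, vop_budget / macroblocks_per_vop


def meets_deadline(mb_times, vop_rate_hz):
    """True if the macroblock processing times (seconds) of one VOP
    add up to no more than 1/F."""
    return sum(mb_times) <= 1.0 / vop_rate_hz
```

For example, at a hypothetical rate of 10 VOPs per second with 99 macroblocks per VOP, the whole VOP must be encoded within 0.1 second, so each macroblock may take about 1 millisecond on average; a few macroblocks with a wide search range can exhaust that budget even when the average is acceptable.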