1. Field of the Invention
The present invention relates to a coding device and a coding method for coding a video signal by using video packets to which a length limit is set, as applied to a portable telephone, a TV telephone system and the like, for example.
2. Description of the Background Art
FIG. 6 is a block diagram showing a conventional coding device described in “Everything about MPEG-4” (Institute of Industrial Research), pp. 39 to 40, for example. FIG. 7 is a diagram illustrating an input signal of the conventional coding device, FIGS. 8A to 8D are diagrams illustrating a structure of a bit stream, and FIG. 9 is a diagram illustrating a position (arrangement) of video packets over a screen (display state).
In FIG. 6, the reference numeral 1 denotes a subtracter for receiving an external input signal (a luminance signal, a color difference signal or the like) as a first input. An output of the subtracter 1 is input, through DCT (Discrete Cosine Transform) means 2 and a quantizer 3, to a DC/AC predictor 4 for predicting a quantized value of each of the direct current (DC) and alternating current (AC) components and to a reverse quantizer 6. Moreover, an output of the DC/AC predictor 4 is sent to a first input of variable-length coding means 5, and the variable-length coding means 5 outputs a bit stream.
On the other hand, an output of the reverse quantizer 6 to which an output of the quantizer 3 is input is sent to a first input of an adder 8 through reverse DCT means 7. An output of the adder 8 is sent to a memory 9, and an output of the memory 9 is sent to a first input of predicted image forming means 10 and a first input of motion detecting means 11.
An external input signal is sent to a second input of the motion detecting means 11, and an output of the motion detecting means 11 is sent to a second input of the predicted image forming means 10 and a motion vector predictor 12.
An output of the motion vector predictor 12 is sent to a second input of the variable-length coding means 5. Moreover, an output of the predicted image forming means 10 is sent to a second input of the subtracter 1 and a second input of the adder 8.
Next, an operation will be described. First of all, a video signal is divided into macroblocks, which are the basic processing units, as shown in FIG. 7 and is input as an external input signal (the external input signal is basically input in units of macroblocks; alternatively, means for generating macroblocks may be provided in a preceding stage so that the signal is converted into macroblocks when it is not input in that form).
More specifically, in the case in which a video signal to be input is 4:2:0 (which indicates that the number of pixels of luminance information Y is twice the number of pixels of color difference information Cb and Cr in each of the horizontal and vertical directions), an area of 16 pixels×16 lines of the luminance signal (Y) covers the same area of the screen as 8 pixels×8 lines of each of the two color difference signals (Cb, Cr).
Accordingly, six blocks of 8 pixels×8 lines (including four blocks for the luminance signal and two blocks for the color difference signal) constitute one macroblock.
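The block structure described above can be sketched as follows. This is a minimal illustration only; the function name `split_macroblock`, the raster ordering of the four luminance blocks and the sample values are assumptions introduced here and do not appear in the cited reference.

```python
def split_macroblock(y, cb, cr):
    """Split one 4:2:0 macroblock into six blocks of 8 pixels x 8 lines.

    y  : 16x16 list of lists (luminance signal Y)
    cb : 8x8 list of lists (color difference signal Cb)
    cr : 8x8 list of lists (color difference signal Cr)
    """
    assert len(y) == 16 and all(len(row) == 16 for row in y)
    assert len(cb) == 8 and len(cr) == 8
    blocks = []
    # Four luminance blocks, assumed here to be taken in raster order:
    # top-left, top-right, bottom-left, bottom-right.
    for by in (0, 8):
        for bx in (0, 8):
            blocks.append([row[bx:bx + 8] for row in y[by:by + 8]])
    # Two color difference blocks, one for Cb and one for Cr.
    blocks.append([row[:] for row in cb])
    blocks.append([row[:] for row in cr])
    return blocks


# Hypothetical sample data: each luminance pixel holds its raster index.
y = [[r * 16 + c for c in range(16)] for r in range(16)]
cb = [[1] * 8 for _ in range(8)]
cr = [[2] * 8 for _ in range(8)]
blocks = split_macroblock(y, cb, cr)
```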
It is premised that a Video Object Plane (VOP which is a unit image) to be input as an external input has a rectangular shape and is identical to a frame.
Each block is subjected to the discrete cosine transform (DCT) in the DCT means 2 and is quantized in the quantizer 3. After the coefficient of each of the DC and AC components is predicted in the DC/AC predictor 4, the quantized DCT coefficients are variable-length coded together with additional information such as a quantization parameter.
The foregoing describes intracoding (which is also referred to as in-frame coding). A VOP in which the intracoding is applied to all the macroblocks is referred to as an I-VOP (Intra-VOP).
On the other hand, the quantized DCT coefficient is reversely quantized in the reverse quantizer 6 and is decoded by the reverse DCT in the reverse DCT means 7, and a decoded image is stored in the memory 9 through the adder 8. The decoded image stored in the memory 9 is used when intercoding (which is also referred to as interframe coding) is to be carried out.
In the case of the intercoding, a motion vector indicative of a motion of a macroblock which is input as an external input signal is detected in the motion detecting means 11. The motion vector indicates the position, in the decoded image stored in the memory 9, at which an error with respect to the input macroblock is minimized.
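The transform and quantization steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the quantizer is modeled as a uniform division by a single step size (the actual MPEG-4 quantization rules are more detailed), the DC/AC prediction and the variable-length coding are omitted, and the names `dct_2d` and `quantize` are hypothetical.

```python
import math

N = 8  # block size: 8 pixels x 8 lines


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization (an assumption made for illustration only)."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


# A flat block has energy only in the DC component.
flat_block = [[100] * N for _ in range(N)]
levels = quantize(dct_2d(flat_block), step=16)
```

For a flat block, only the DC coefficient is non-zero after quantization, which is why a static or smooth image generates few codes.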
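The motion detection described above can be sketched as an exhaustive block-matching search. This is one common way of finding the position at which the error is minimized and is not necessarily the method used by the motion detecting means 11; the sum of absolute differences as the error measure, the search range and the names `sad` and `find_motion_vector` are assumptions.

```python
def sad(cur_block, ref_frame, rx, ry):
    """Sum of absolute differences between cur_block and the reference area at (rx, ry)."""
    size = len(cur_block)
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur_block[j][i] - ref_frame[ry + j][rx + i])
    return total


def find_motion_vector(cur_block, ref_frame, x0, y0, search_range):
    """Return (dx, dy, error) minimizing the error within +/- search_range pixels."""
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 + dx, y0 + dy
            # Skip candidates that fall outside the decoded image.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur_block, ref_frame, rx, ry)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Hypothetical data: a 12x12 decoded image and a 4x4 block (a real macroblock is 16x16).
ref_frame = [[r * 12 + c for c in range(12)] for r in range(12)]
# The current block at (x0, y0) = (4, 4) matches the reference area at (5, 3).
cur_block = [row[5:9] for row in ref_frame[3:7]]
dx, dy, error = find_motion_vector(cur_block, ref_frame, x0=4, y0=4, search_range=2)
```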
The predicted image forming means 10 forms a predicted image based on the motion vector detected by the motion detecting means 11.
Subsequently, a differential signal between the input macroblock and the predicted image formed by the predicted image forming means 10 is obtained, is subjected to the DCT in the DCT means 2 and is quantized in the quantizer 3.
A transformation coefficient thus quantized is variable-length coded (intercoded) together with the motion vector thus predicted and coded and additional information such as a quantization parameter. Moreover, the quantized DCT coefficient is reversely quantized in the reverse quantizer 6 and is subjected to the reverse DCT in the reverse DCT means 7, and is then added to the predicted image by the adder 8 and is stored in the memory 9.
The intercoding includes one-way prediction in which prediction is carried out based on only a former VOP on a time basis in order of display of the image and bidirectional prediction in which prediction is carried out based on former and latter VOPs on a time basis. The VOP coded through the one-way prediction will be referred to as a P-VOP (Predictive VOP) and the VOP coded through the bidirectional prediction will be referred to as a B-VOP (Bidirectionally Predictive VOP).
Next, a structure of a bit stream output from the variable-length coding means 5 will be described with reference to FIGS. 8A to 8D. As shown in FIG. 8A, a bit stream of one VOP is constituted by (a bit stream of) one or more video packets.
One video packet is formed by coded data of one or more macroblocks. For a first video packet of the VOP, a VOP header is attached to a head and a stuff bit for a byte alignment is attached to an end (FIG. 8B).
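The local decoding loop described above (quantize the difference, reversely quantize it, add it to the predicted image and store the result) can be sketched as follows. To keep the sketch short, the DCT and the reverse DCT are omitted and a uniform quantizer is assumed, so this shows only the data flow around the adder 8 and the memory 9; the function names are hypothetical.

```python
def quantize(values, step):
    return [int(round(v / step)) for v in values]


def dequantize(levels, step):
    return [level * step for level in levels]


def encode_inter(cur, pred, step):
    """Return the quantized difference and the locally decoded samples."""
    residual = [c - p for c, p in zip(cur, pred)]        # subtracter 1
    levels = quantize(residual, step)                    # quantizer 3
    decoded_residual = dequantize(levels, step)          # reverse quantizer 6
    recon = [p + r for p, r in zip(pred, decoded_residual)]  # adder 8
    return levels, recon


def decode_inter(levels, pred, step):
    """What a decoder would reconstruct from the same levels and prediction."""
    return [p + r for p, r in zip(pred, dequantize(levels, step))]


# Hypothetical sample values for four pixels.
cur = [101, 104, 97, 121]
pred = [98, 100, 100, 110]
levels, recon = encode_inter(cur, pred, step=4)
decoder_view = decode_inter(levels, pred, step=4)
```

Because the coding device stores the locally decoded samples rather than the input samples, its reference image stays identical to the one held by the decoder.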
In the case of second and succeeding video packets, Resync Marker for detecting a head of the video packet and a video packet header are attached to a head, and a stuff bit is attached to an end (FIG. 8C).
The stuff bit, which is 1 to 8 bits in length, is attached to the end of the video packet so that the termination (break) of the video packet falls on a byte boundary. Its meaning is distinguished from that of a stuffing which will be described below.
As shown in FIG. 8D, moreover, an arbitrary number of stuffings can also be put in the video packet. For example, in the case of MPEG-4 Video, the stuffing is referred to as a stuffing macroblock and can be put in any video packet in the same manner as the macroblock. The stuffing is discarded (is not substantially utilized) on the decoder side.
The stuffing is a word of 9 bits or 10 bits that is inserted between the macroblocks irrespective of the byte alignment (for example, the adjustment of the termination of the video packet), and its meaning is distinguished from the meaning of the stuff bit.
An arbitrary number of macroblocks can be put in one video packet. In the case in which error propagation is taken into consideration, it is generally preferable that a code volume of each video packet should be almost constant. In the case in which the code volume of the video packet is thus set to be almost constant, the proportion (area) of one VOP occupied by each video packet is not constant as shown in FIG. 9.
In the conventional coding device described above, no consideration has been given to the control of the code volume that is required when a length of the video packet is limited.
For example, in the case in which a reversible variable-length code is to be used in the variable-length coding means 5, even if an error occurs while the decoder is decoding the variable-length code in a forward direction from a head of the video packet, the decoder can decode the variable-length code in a reverse direction from an end of the video packet. Thus, the variable-length code can still be decoded.
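The length of the stuff bit can be sketched as follows. The bit pattern used here (a single 0 followed by 1s) is assumed from the MPEG-4 Visual convention and is not stated in the description above; the function name `stuff_bits` is hypothetical.

```python
def stuff_bits(bit_length):
    """Return the stuff bit string (1 to 8 bits) that byte-aligns a packet.

    bit_length is the number of bits in the video packet before alignment.
    A packet that is already byte-aligned still receives a full 8 bits.
    """
    count = 8 - (bit_length % 8)       # always in the range 1..8
    return "0" + "1" * (count - 1)     # assumed pattern: 0 followed by 1s


aligned_13 = stuff_bits(13)
aligned_16 = stuff_bits(16)
```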
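The relation between a nearly constant code volume and a non-constant area can be sketched with a simple greedy grouping. The grouping rule, the target value and the per-macroblock code volumes below are assumptions chosen for illustration; headers, Resync Markers and stuff bits are ignored.

```python
def packetize(mb_bits, target_bits):
    """Group macroblock indices into video packets of roughly target_bits each.

    A packet is closed when adding the next macroblock would exceed
    target_bits. A single macroblock larger than target_bits still forms
    its own packet.
    """
    packets = []
    current, used = [], 0
    for index, bits in enumerate(mb_bits):
        if current and used + bits > target_bits:
            packets.append(current)
            current, used = [], 0
        current.append(index)
        used += bits
    if current:
        packets.append(current)
    return packets


# Hypothetical code volumes (in bits) of nine macroblocks of one VOP.
mb_bits = [300, 280, 40, 30, 20, 10, 350, 15, 15]
packets = packetize(mb_bits, target_bits=400)
```

In this example the first packet covers one macroblock while the second covers five, although their code volumes are of the same order.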
In this case, it is necessary to retain one video packet in a receiving buffer on the decoder side. Therefore, a limit is sometimes set to a length of the video packet in order to define a size of the receiving buffer.
In such a case, a coding device should control a code volume such that the length of each video packet is set to be a predetermined length or less.
Moreover, the coding device should manage a volume of generated codes such that a transmitting buffer (not shown) which is provided in a latter stage of the variable-length coding means 5 causes neither an overflow nor an underflow.
The quantization parameter to be used in the quantizer 3 is usually adjusted to increase or decrease the code volume. If the code volume is extremely small as in a static image, it is necessary to insert the stuffing, thereby increasing the code volume such that the transmitting buffer does not cause the underflow.
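The amount of stuffing needed to avoid the underflow can be sketched as follows. The buffer model (a fixed number of bits drained per VOP period) and the name `stuffing_words_needed` are assumptions; the word size of 9 or 10 bits is taken from the description of the stuffing above.

```python
def stuffing_words_needed(buffer_bits, generated_bits, drained_bits, word_bits):
    """Return the minimum number of stuffing words that prevents an underflow.

    buffer_bits    : occupancy of the transmitting buffer before this VOP
    generated_bits : code volume generated for this VOP
    drained_bits   : bits sent out of the buffer during this VOP period
    word_bits      : size of one stuffing word (9 or 10 bits)
    """
    level = buffer_bits + generated_bits - drained_bits
    if level >= 0:
        return 0
    deficit = -level
    return -(-deficit // word_bits)    # ceiling division


no_stuffing = stuffing_words_needed(1000, 500, 400, word_bits=9)
static_image = stuffing_words_needed(100, 50, 400, word_bits=10)
```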
The stuffing does not have information which is substantially related to the decoding. Therefore, it is desirable that the stuffing should not be inserted if possible. For this reason, generally, a minimum amount of stuffing is inserted if the code volume is small after one VOP is completely coded.
In the case in which the limit is set to the length of the video packet, the stuffing sometimes cannot fit entirely within one video packet when the stuffing is inserted after one VOP is completely coded.
For example, in the case of a static image formed by computer graphics, few codes are generated if the coding is carried out with the P-VOP. In a structure for coding such a static image, a signal indicative of the underflow is output from the transmitting buffer and an operation is carried out to insert the stuffing based on the signal.
When the stuffing is inserted into a last video packet of the VOP according to the operation, the stuffing sometimes causes the video packet to exceed the limit of its length. Consequently, on condition that a limit is set to a capacity per video packet and a video packet having only the stuffing is prohibited, there has conventionally been a problem in that either the length limit of the video packet cannot be maintained or a video packet having only the stuffing is generated.
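The problem described above can be sketched numerically. The packet sizes, the stuffing amount and the limit below are hypothetical values, and the function name is an assumption; the sketch only models the conventional operation of adding all of the stuffing to the last video packet after the VOP has been coded.

```python
def append_stuffing_to_last_packet(packet_bits, stuffing_bits, max_packet_bits):
    """Add all stuffing to the last video packet and report limit violations.

    Returns the resulting packet sizes and the indices of the packets
    whose length exceeds max_packet_bits.
    """
    result = list(packet_bits)
    result[-1] += stuffing_bits
    violations = [i for i, bits in enumerate(result) if bits > max_packet_bits]
    return result, violations


# Hypothetical P-VOP of a static image: three small packets, limit of 400 bits.
sizes, violations = append_stuffing_to_last_packet(
    packet_bits=[380, 60, 45], stuffing_bits=500, max_packet_bits=400)
```

Here the last packet grows to 545 bits and exceeds the limit. Moving the excess into an additional packet would keep the limit but would create a video packet having only the stuffing, which is the other half of the problem.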