(1) Field of the Invention
The present invention relates to a device and a method for controlling coding of a picture for a teleconference, a visual telephone, and the like.
(2) Description of the Related Art
Conventionally, CCITT (Comité Consultatif International Télégraphique et Téléphonique) Recommendation H.261 is employed as an image coding (compressing) method for communicating moving pictures at a low rate of 64 Kbps. FIG. 1 is a block diagram depicting the configuration of a coding device which implements the coding method H.261. The coding device in FIG. 1 comprises a preparatory processing unit 1, a coding unit 2, a smoothing buffer 3, and a transmission controller 5.
The preparatory processing unit 1 comprises an A/D converter 11, an NTSC/CIF converter 12, and a preparatory processing filter 13. The A/D converter 11 separates Y and C signals from an NTSC signal, and A/D converts the Y and C signals. The NTSC/CIF converter 12 converts the NTSC signal into CIF (Common Intermediate Format), which is an intermediate format. The preparatory processing filter 13 eliminates noise. The intermediate format CIF is a common format which overcomes regional differences in television systems so that all codecs can communicate with each other. The coding unit 2 comprises a coding sub unit 22 and a coding controller 210 for controlling the coding sub unit 22. The coding sub unit 22 comprises a movement compensation frame predicting unit 221, a DCT (Discrete Cosine Transform) 222, a quantizing unit 223, a first variable length coding sub unit 224, a second variable length coding sub unit 225, and a multiplexer 226.
The movement compensation frame predicting unit 221 compensates movement in a range of 16×16 pixels, and calculates a prediction error between a frame which was coded the last time and a frame to be coded next. The DCT 222 applies a DCT to the prediction error signal so that spatial coordinate data is converted into frequency coordinate data. Note that a block unit of the prediction error signal to be converted by the DCT 222 includes 8×8 pixels. The quantizing unit 223 quantizes a conversion coefficient from the DCT 222. The first variable length coding sub unit 224 codes the quantized conversion coefficient into Huffman code. The second variable length coding sub unit 225 converts a movement vector which was employed in movement compensation into Huffman code. The multiplexer 226 multiplexes main information, which is the coded result from the first variable length coding sub unit 224, and sub information, which is the coded result from the second variable length coding sub unit 225, to generate a transmission frame. If a sufficient amount of memory capacity remains within the smoothing buffer 3, the multiplexer 226 outputs the transmission frame into the smoothing buffer 3. The smoothing buffer 3 is a FIFO (First In First Out) memory, and the transmission frames which have been inputted to the smoothing buffer 3 are outputted to a transmission path 4 in input order. The transmission controller 5 communicates with the reception side, and performs an Automatic-Repeat-Request (ARQ) to re-transmit information to the reception side in response to the state of a confirmation signal.
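The transform and quantization steps performed by the DCT 222 and the quantizing unit 223 can be illustrated by the following minimal sketch. The function names, the uniform quantizer, and the step size are assumptions made only for illustration; H.261 defines its own quantizer, and the sketch is not the implementation of the device in FIG. 1.

```python
import math

N = 8  # block size handled by the DCT 222 (8x8 pixels)

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def c(u):
        return math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantizer (assumed): a larger step is a coarser accuracy."""
    return [[int(round(value / step)) for value in row] for row in coeffs]

# A flat prediction-error block: all energy lands in the DC coefficient.
flat = [[4.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
indices = quantize(coeffs, step=8)
```

For the flat block, only the coefficient at position (0, 0) is non-zero after quantization, which is why blocks with little detail generate few code bits.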
The coding device thus constructed needs to transmit a picture signal relating to moving pictures, which in itself carries an enormous amount of information, at a limited transmission speed of several tens of Kbps. For this reason, the picture signal of every inputted frame cannot be transmitted directly; instead, the picture signal has to be compressed to around several tens of Kbps. In compressing, the amount of code data is reduced by thinning out some information; therefore, some coding distortions in a display picture relating to the space and time domains are unavoidable at the reception side.
The amount of information in a moving picture signal varies with time, whereas the signal must be transmitted over a transmission path with a preset transmission speed. Therefore, moving pictures must be coded in accordance with the transmission speed of the transmission path.
To obtain a visually satisfactory display picture at the reception side, a high coding efficiency is required; in addition, coding distortions must be assigned to the time and space domains appropriately according to visual characteristics so that neither of these two distortions is conspicuous to the human eye.
Since spatial distortion and time distortion in a display picture are related to each other, a coding control method for controlling a coding parameter which relates these two distortions to each other has been proposed. More specifically, a coding control method for maintaining an optimal balance of movement reproductivity, spatial resolution, and noise within the amount of information which is limited by a transmission speed by controlling the coding parameter is desirable.
Spatial resolution, such as clarity or fineness of a display picture, is determined by the quantization accuracy of the quantizing unit 223. When an input picture is coded with a higher quantization accuracy, a larger code amount R(t) is obtained. Here "t" in R(t) indicates a time which progresses in units of one frame period. It is assumed that a k-th picture frame, in which k is a natural number, is inputted at time t. One frame period starts when a picture frame is inputted and continues until the next picture frame is inputted. The code amount R(t) is generated when the k-th picture frame is inputted and coded, and the coded information is stored in the smoothing buffer 3. Codes of L bits are outputted from the smoothing buffer 3 to the transmission path 4 per frame period. Therefore, when a code amount B(t) remains in the smoothing buffer 3 at the time t, the amount remaining one frame period later is B(t+1)=B(t)+R(t)-L if the k-th picture frame is coded. On the other hand, B(t+1)=B(t)-L if the k-th frame is dropped. Note that a (k+1)th frame is coded if B(t)≤L; a (k+1)th frame is dropped if B(t)>L. Also, a time of R(t)/L frame periods is required to transmit coded information of code amount R(t), so that a coding rate S(t) is S(t)=L/R(t) when the k-th picture frame is inputted (hereunder simply referred to as a coding rate).
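The buffer recurrence and the frame dropping rule described above can be traced with the following minimal sketch. The function names are hypothetical, the decision for each incoming frame is assumed to use the buffer occupancy at the moment the frame arrives, and clamping the occupancy at zero is likewise an assumption.

```python
def simulate_buffer(code_amounts, L, initial=0):
    """Track smoothing-buffer occupancy B(t) over successive frame periods.

    code_amounts[k] is R(t) for the frame arriving in period k, i.e. the
    code amount it generates if it is coded. Returns (coded?, B) per period.
    """
    B = initial
    history = []
    for R in code_amounts:
        coded = B <= L              # code the frame only if B(t) <= L
        if coded:
            B = B + R - L           # B(t+1) = B(t) + R(t) - L
        else:
            B = B - L               # frame dropped: B(t+1) = B(t) - L
        B = max(B, 0)               # assumption: occupancy cannot go negative
        history.append((coded, B))
    return history

def coding_rate(R, L):
    """S(t) = L / R(t)."""
    return L / R

# One large frame followed by smaller ones, with L = 1000 bits per period.
history = simulate_buffer([3000, 1200, 1200, 1200], L=1000)
```

In this run the first frame is coded and leaves 2000 bits in the buffer, so the second frame is dropped, the third is coded, and the fourth is dropped again.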
Picture quality of a decoded picture varies depending on the quantization accuracy which was employed to code an input picture. If picture quality of a decoded picture is evaluated by an S/N ratio, the S/N ratio improves when the quantization accuracy is raised; however, the coding rate is decreased thereby. Accordingly, picture frames are dropped at a higher rate, and movement reproductivity is lowered. Conversely, to improve movement reproductivity by raising the coding rate, the quantization accuracy must be reduced, which lowers the S/N ratio and deteriorates the spatial resolution of a display picture. A picture quality tradeoff function Ss=G(Ds) represents these mutual relations.
FIG. 2 shows an example of the picture quality tradeoff function Ss. In FIG. 2, the horizontal axis and the vertical axis represent an S/N ratio and a coding rate, respectively. A curve 101 represents a picture quality tradeoff function Ss. The picture quality tradeoff function Ss moves in the lower right hand direction when an input frame includes a large movement or a fine pattern, and it moves in the upper left hand direction when there is little movement. A point 103 represents a pair of a coding rate and a quantization accuracy which achieves a visually optimal balance of movement reproductivity and spatial resolution of a display picture. Such a point 103 exists on the picture quality tradeoff function for each input picture frame. A curve 102 in FIG. 2, obtained by linking the points 103 for input picture frames, is an objective function So=O(Do).
To obtain a picture quality tradeoff function precisely, a picture frame must be coded repeatedly while varying the quantization accuracy, and a coding rate must be measured for each quantization accuracy. However, this processing is time consuming, and it is not practical because of the limited processing time. "A Coding Control Algorithm for Motion Picture Coding Accomplishing Optimal Assignment of Coding Distortion to Time and Space Domains", Electronic Information Communication Institute Report B, Vol. J71-B, No. B, pp. 945-954, August 1988 proposes two methods for determining a picture quality tradeoff function by referring to a frame which is currently coded.
A first method is explained. A DCT coefficient of the prediction error signal and a movement vector are obtained when movement compensation frame prediction and DCT are applied to a current input frame. A code amount R(q) and an S/N ratio D(q) are obtained as functions of a quantization accuracy q, based on a DCT coefficient histogram and a movement vector code amount for one frame. Accordingly, a picture quality tradeoff function Ss=G(Ds), in which a coding rate S(q)=L/R(q) is related to an S/N ratio D(q), can be obtained accurately.
FIG. 3 shows an example of picture quality tradeoff function cited in the reference. As shown in FIG. 3, picture quality tradeoff functions Ss are represented by different curve lines according to different input frames.
A second method is explained. In the second method, several candidates for a coding rate S(q) and an S/N ratio D(q) are prepared, and the ones which match the coded result of the frame which was coded the last time are selected. Then, a picture quality tradeoff function Ss for a frame to be coded next is estimated by referring to the selected result.
FIG. 4 explains the second method for estimating a coding rate Ss(q). Numerals 1-6 which are attached to curves in FIG. 4 represent candidate numbers of coding rate-quantization accuracy characteristics; a point C represents the coding rate and the quantization accuracy for the last frame. The point C is located closest to the curve 3, so that the coding rate Ss(q) represented by the curve 3 is estimated as the one to be employed at a time k. An S/N ratio Ds(q) is estimated in the same manner.
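The candidate selection of the second method can be sketched as follows. The shape of the candidate curves (a rate inversely proportional to the quantization accuracy), their scale values, and the use of vertical distance at the last quantization accuracy as the measure of closeness are all assumptions made for illustration; the reference does not specify them here.

```python
def make_candidate(scale):
    """Hypothetical characteristic: the rate falls as accuracy q rises."""
    return lambda q: scale / q

# Six candidate curves, numbered 1-6 as in FIG. 4 (scale values are made up).
CANDIDATES = {n: make_candidate(0.5 * n) for n in range(1, 7)}

def select_candidate(q_last, s_last, candidates=CANDIDATES):
    """Return the number of the curve closest to point C = (q_last, s_last)."""
    return min(candidates, key=lambda n: abs(candidates[n](q_last) - s_last))

def estimate_rate(q_next, q_last, s_last):
    """Estimate the coding rate at q_next from the curve chosen for point C."""
    chosen = select_candidate(q_last, s_last)
    return chosen, CANDIDATES[chosen](q_next)

# The last frame was coded at accuracy 2.0 with a measured rate of 0.8.
chosen, rate = estimate_rate(q_next=4.0, q_last=2.0, s_last=0.8)
```

With these made-up values the point C falls nearest to candidate 3, and the rate for the next frame is read off that curve. Only the candidate table has to be stored, at the cost of an estimate that is no better than the nearest stored curve.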
In another reference, an attempt is made to transmit moving picture data which is coded at a high efficiency via a digital radio line such as a cordless telephone or a portable telephone. "CORDLESS VIDEO" First International Workshop on Mobile Multimedia Communications B. 3.1-1 Dec. 7-10, 1993 reports an experiment which is designed to transmit moving picture data coded according to ITU-T Recommendation H.261 over Digital European Cordless Telecommunications (DECT), which is a digital cordless telephone system in Europe. According to this reference, a digital radio line frequently causes transmission errors which deteriorate moving picture data compressed with high efficiency. Therefore, transmission errors should be corrected. A transmission error rate may be as bad as around 1% with a digital cordless communication line, and burst errors which extend to as long as several tens of msec may be caused by slow fading if a cordless transmission device for communicating via a digital cordless communication line is employed at a walking speed or slower. It is pointed out in the reference that burst errors cannot be overcome by a simple FEC (Forward Error Correction), which adds check bits to transmission data so that the reception side can correct errors. Accordingly, the reference suggests a method for randomizing burst errors by combining the FEC and an interleave of a data sequence.
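How an interleave spreads a burst over several FEC codewords can be shown with a simple block interleaver. The block dimensions and function names below are assumptions for illustration, and the sketch is not the scheme used in the cited experiment.

```python
def interleave(symbols, rows, cols):
    """Write the block row by row, read it out column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): restore the original row-by-row order."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

rows, cols = 4, 6                      # 4 codewords of 6 symbols (assumed sizes)
original = list(range(rows * cols))    # each symbol is labeled by its position
sent = interleave(original, rows, cols)

burst = sent[8:12]                     # symbols hit by a 4-symbol burst error
errors_per_codeword = [sum(1 for s in burst if s // cols == r)
                       for r in range(rows)]
```

A burst of four consecutive symbols on the line touches each of the four codewords only once, so a code that corrects a single error per codeword can recover it. The interleaver has to collect the whole block before sending, and the block must span several times the burst length.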
However, to randomize burst errors which extend to several tens of msec, a bit sequence covering several hundred msec of data must be interleaved. In this case, a transmission delay of several hundred msec is unavoidable. Such a transmission delay interferes with two-way communication in real time with a visual telephone or the like. Thus, the use of this method is severely limited. Accordingly, the reference concludes that the Automatic-Repeat-Request (ARQ) is most desirable. According to the ARQ, an error in transmission information is detected by the reception side; the detected result is transmitted to the transmission side as a confirmation signal; then the transmission information is re-transmitted from the transmission side to the reception side according to the state of the confirmation signal.
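The ARQ exchange described above can be sketched as a stop-and-wait loop. The stop-and-wait variant, the function name, and the list of per-attempt outcomes standing in for the error check and confirmation signal are assumptions; the reference does not state which ARQ variant is used.

```python
def send_with_arq(frames, attempt_ok, max_attempts=10):
    """Stop-and-wait ARQ sketch.

    attempt_ok is an iterator of booleans, one per transmission attempt,
    standing in for the receiver's error check and confirmation signal.
    Returns the delivered frames and the total number of transmissions.
    """
    delivered, transmissions = [], 0
    for frame in frames:
        for _ in range(max_attempts):
            transmissions += 1
            if next(attempt_ok):        # positive confirmation received
                delivered.append(frame)
                break
            # negative confirmation: fall through and re-transmit
        else:
            raise RuntimeError("frame not delivered")
    return delivered, transmissions

# Assumed channel behaviour: the 2nd, 4th, and 5th attempts are corrupted.
outcomes = iter([True, False, True, False, False, True])
delivered, transmissions = send_with_arq(["f0", "f1", "f2"], outcomes)
throughput = len(delivered) / transmissions
```

Three frames need six transmissions in this run, so only half of the line capacity carries new information.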
In the first method for obtaining a picture quality tradeoff function set forth above, a DCT coefficient histogram for a current input frame is obtained, and a code amount which represents a block type corresponding to each quantization accuracy and a code amount which represents a quantization index of a DCT coefficient are calculated; therefore, enormous calculations are needed, which is problematic in terms of efficiency, price, storage space, and power consumption. For this reason, at least one frame period of processing delay occurs, which is disadvantageous to a visual telephone and a video conference, both of which require real time transmission. The second method for obtaining a picture quality tradeoff function can reduce the processing amount; however, information about the candidates for a picture quality tradeoff function must be stored. This is a severe drawback for a coding control method relating to coding of moving pictures.
When a generated code amount exceeds the remaining storage capacity of the smoothing buffer, an overflow occurs, so that coded information for a frame cannot be transmitted. As a result, picture quality of a decoded picture is greatly degraded. To improve picture quality, frame droppings are repeated until the remaining code amount B(t) of the smoothing buffer becomes equal to or less than the code amount L which can be transmitted in one frame period. In actual coding, an error between frames cannot be predicted immediately after a scene changes, so that the compression rate does not improve, and an enormous code amount is generated; however, an error between frames can be predicted from the next frame, so that the compression rate improves, and the code amount is greatly reduced. With the conventional frame dropping method, however, a large code amount is generated for the initial frame, and frames are dropped until the amount of codes within the smoothing buffer is reduced. Accordingly, the relation between the picture frame coded the last time and the picture frame to be coded next is weakened, so that the compression rate no longer improves even if an error between frames is predicted. Consequently, an enormous code amount is generated. Under such circumstances, the coding rate remains low.
With the ARQ between the transmission side and the reception side described in the second reference, data which was already transmitted is transmitted again if a transmission error occurs; therefore, the transmission speed is reduced as re-transmissions increase. That is, the transmission speed varies depending on the transmission quality of the transmission path. According to the currently proposed coding control algorithm which accomplishes optimal assignment of coding distortion to the time and space domains, a coding rate is calculated by assuming that an L-bit code amount is transmitted to the transmission path in one frame period; therefore, a change in the transmission speed does not influence the coding control algorithm. However, compared to the case in which the transmission speed is not decreased at all, the amount of codes in the smoothing buffer decreases more slowly when the transmission speed is decreased, so that the smoothing buffer overflows, and the overflow always appears as coding distortion in the time domain (frame dropping). If the coding distortion in a display picture caused by a decreased transmission speed can be divided appropriately into lowering of movement reproductivity and lowering of spatial resolution, a coding control method which is suitable for transmission of moving pictures by a radio line can be obtained even when the transmission speed of the radio line varies depending on the transmission quality of the transmission path.
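The effect of a reduced transmission speed on frame dropping can be illustrated with a simplified sketch. The model below is an assumption: re-transmissions are represented by a fixed efficiency factor applied to L, the coding control keeps comparing the buffer occupancy against the nominal L, and the function name is hypothetical.

```python
def dropped_frames(code_amounts, L, efficiency):
    """Count frame drops when only efficiency * L bits leave per period.

    efficiency = 1.0 models an error-free path; lower values model the
    throughput lost to re-transmissions (an assumed, simplified model).
    """
    drained = L * efficiency
    B, drops = 0.0, 0
    for R in code_amounts:
        if B <= L:                  # control still assumes L bits per period
            B += R                  # frame is coded and stored in the buffer
        else:
            drops += 1              # frame is dropped
        B = max(B - drained, 0.0)   # actual output to the transmission path
    return drops

frames = [2000] * 12                # constant code amount per coded frame
clean = dropped_frames(frames, L=1000, efficiency=1.0)
noisy = dropped_frames(frames, L=1000, efficiency=0.5)
```

With these assumed numbers, 5 of 12 frames are dropped on the error-free path and 8 of 12 when half of the capacity is lost to re-transmissions. The quantization accuracy is identical in both runs, so the whole loss appears as frame dropping.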