1. Field of the Invention
The present invention relates to a video coding apparatus and a video coding method for compression-coding a video signal with high efficiency and transmitting it, and more particularly to refresh in video coding for eliminating the influence of a transmission error on a reproduced picture.
2. Description of the Background Art
Typical fields of application of the technique of compression-coding a video signal with high efficiency and transmitting it are the visual telephone and the video conference shown in FIG. 34A. Further applications are expected: a system shown in FIG. 34B that transmits a video signal through digital radio communication over a wireless LAN transmission path, for monitoring a dangerous location or for transmitting pictures between mobile units, and picture distribution over the Internet shown in FIG. 34C.
In the case of wireless monitoring, the picture signal must be compression-coded with high efficiency so that the frequency band is used efficiently and wireless resources are used effectively, while the communication quality, as represented by the error rate, is worse by at least two or three orders of magnitude than in a cable system. Wireless monitoring is therefore readily affected by errors when high-efficiency compression coding that omits redundant information is performed, and hence it is indispensable to improve error resistance by devising a refresh method or the like as described later. Unlike the visual telephone, monitoring generally transmits picture information unidirectionally from the camera side to the monitor side, so a refresh that functions over a unidirectional transmission path has a wider range of utilization.
A portable visual telephone employing digital radio communication such as PHS (Personal Handyphone System) is also assumed. In this case, a bidirectional transmission path can be ensured, although its quality is inferior to that of a cable system.
The Internet likewise operates on a best-effort basis and offers no guarantee on quality measures such as the packet loss rate, so effective utilization of resources and handling of packet loss must be considered. Since the Internet provides a bidirectional communication path, a method presupposing bidirectionality can be used in a 1:1 transmission system. In a broadcast-type one-to-many transmission system, however, processing of feedback information concentrates on the server side, so a refresh that functions unidirectionally has a wider range of utilization. Further, the server side may hold stored information in the case of the Internet; feedback information cannot be used then, because coding has been completed before the time of communication.
In general, the following method has been devised for a communication path that includes a feedback path. The coding side transmits coded data with an error detection code. The decoding side performs error detection on the received data and, if an error is detected, notifies the coding side of the error through the feedback path. The coding side then INTRA-codes the data coded after the notification, thereby performing refresh. ITU-T Recommendation H.261 standardizes this error notification as a screen update request.
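The feedback-based refresh described above can be sketched as follows. This is a minimal illustration and not the H.261 procedure itself: CRC-32 is assumed as the error detection code (the text does not name one), and `pack`, `check`, `Encoder` and `simulate` are hypothetical names introduced for the example.

```python
import zlib

def pack(payload: bytes) -> bytes:
    """Coding side: append a CRC-32 (assumed error detection code) to coded data."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(packet: bytes) -> bool:
    """Decoding side: True when the received packet passes error detection."""
    payload, received = packet[:-4], int.from_bytes(packet[-4:], "big")
    return zlib.crc32(payload) == received

class Encoder:
    """Toy coding side that INTRA-codes the next frame after an error notice."""
    def __init__(self):
        self.refresh_requested = True  # the very first frame has no reference

    def notify_error(self):
        # Reached through the feedback path (the update request).
        self.refresh_requested = True

    def next_mode(self) -> str:
        mode = "INTRA" if self.refresh_requested else "INTER"
        self.refresh_requested = False
        return mode

def simulate(corrupt_frames, n_frames=5):
    """Return the coding mode chosen for each frame when some frames are corrupted."""
    encoder, modes = Encoder(), []
    for i in range(n_frames):
        modes.append(encoder.next_mode())
        packet = bytearray(pack(f"frame{i}".encode()))
        if i in corrupt_frames:
            packet[0] ^= 0x01  # single bit error on the channel
        if not check(bytes(packet)):
            encoder.notify_error()
    return modes
```

With frame 1 corrupted, `simulate({1})` yields INTRA for frame 0, INTER for frame 1, and INTRA again for frame 2, after which coding returns to INTER. The sketch assumes the notice arrives before the next frame is coded; a real feedback path adds delay.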
The conventional video coding method is now described in detail with reference to video coding for a video conference or a visual telephone.
In coding a video conference or visual telephone signal, it is common to employ coding that combines inter-picture coding, which utilizes frame-to-frame correlation, with intra-picture coding along the frame direction. A television image formed by 30 pictures (frames) per second has large correlation along the time axis, and if the pixel at the same position in the frame preceding by one frame is used for prediction through inter-frame correlation, the prediction is ideal when the screen is still. In INTER coding, however, inter-frame correlation drops when there is motion in the screen, and may become even lower than the correlation between adjacent pixels within a field. On the other hand, within a frame each pixel of a picture signal changes little in level with respect to an adjacent pixel, so the correlation between them is strong. Assume that the autocorrelation function can be approximated by a negative exponential function. The power spectral density, which is the Fourier transform of the autocorrelation function, is then maximized at the zero frequency component (i.e., the DC component) and decreases monotonically as the frequency increases. While the Fourier transform is the best known orthogonal transform to the frequency domain, it involves complex-number calculation and its structure is complicated; hence the two-dimensional DCT (Discrete Cosine Transform) is generally employed as a substitute orthogonal transform in picture coding.
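The statement about the power spectral density can be checked under a first-order model. The symbols below are assumptions introduced for illustration and do not appear in the text: the autocorrelation is taken as a decaying exponential with variance \sigma^2 and decay constant \alpha > 0, and \omega is the angular frequency.

```latex
\begin{align}
R(\tau) &= \sigma^2 e^{-\alpha|\tau|}, \qquad \alpha > 0 \\
S(\omega) &= \int_{-\infty}^{\infty} R(\tau)\, e^{-j\omega\tau}\, d\tau
           = \frac{2\alpha\sigma^2}{\alpha^2 + \omega^2} \\
S(0) &= \frac{2\sigma^2}{\alpha}, \qquad
\frac{dS}{d\omega} = -\frac{4\alpha\sigma^2\,\omega}{(\alpha^2 + \omega^2)^2} < 0
\quad \text{for } \omega > 0
\end{align}
```

Under this model the spectrum peaks at the DC component and falls off monotonically with frequency, so most of the signal energy lies in the low-frequency transform coefficients.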
A transform coefficient decomposed into frequency components by the DCT is quantized either to level 0, which is an uncoded transform coefficient (zero value of the coded coefficient), or to one of the levels from ±1 to ±K, which are non-zero values of the coded coefficient taking discrete quantization representative values. Run-length coding, which codes the number of successive zeros preceding each coded coefficient, and Huffman coding, which allocates variable-length codes according to the occurrence frequency of the levels of the non-zero values of the coded coefficient, are then performed, whereby the video data are compressed.
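The transform, quantization and run-length steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: a one-dimensional 8-point DCT stands in for the two-dimensional DCT, the quantizer is a plain uniform quantizer that truncates toward zero, and the Huffman stage is omitted. The function names are hypothetical.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence (1-D stand-in for the 2-D DCT)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out

def quantize(coeffs, step):
    """Assumed uniform quantizer: truncate toward zero, so small values become level 0."""
    return [int(c / step) for c in coeffs]

def run_length(levels):
    """Return (run, level) pairs, where run counts the zeros preceding each non-zero level."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    # Trailing zeros are dropped; a real coder signals them with an end-of-block code.
    return pairs

# A smooth block concentrates its energy in the low-frequency coefficients,
# so most quantized levels are zero and the run-length list is short.
smooth_block = [100, 102, 104, 106, 108, 110, 112, 114]
pairs = run_length(quantize(dct_1d(smooth_block), step=4))
```

Each (run, level) pair would then be mapped to a variable-length code, with shorter codes assigned to the pairs that occur more often.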
For example, ITU-T Recommendation H.261 applies motion-compensated inter-picture coding to a picture having small motion and performs the coding described below on the prediction error between frames. For a picture having large motion, no inter-picture coding is applied and the following coding is performed directly on the frame pixels. FIG. 35 shows an encoder and a decoder for video data according to H.261.
As shown in FIG. 35, an encoder 116 for video data according to H.261 comprises a subtraction part 107, a first orthogonal transform part 108 performing two-dimensional cosine transform, a first quantization part 109, a second inverse quantization part 110, a second inverse orthogonal transform part 111, an addition part 112, a second picture memory 113 for motion compensation, an in-loop filter 114, a coding control part 115 and selectors 123 and 124.
On the other hand, a decoder 122 comprises a first inverse quantization part 117, a first inverse orthogonal transform part 118, an addition part 119, a first picture memory 120 for motion compensation, an in-loop filter 121 and a selector 125.
The encoder 116 calculates, in the subtraction part 107, a prediction error between frames by taking the difference between a video input signal previously converted to CIF (Common Intermediate Format) of 352 by 288 pixels and prediction data stored in the second picture memory 113 for motion compensation. At this time, motion in the range of 15 by 15 pixels is compensated by specifying, as the prediction data, an arbitrary block of 16 by 16 pixels taken from the area around the block being coded. The motion quantity is specified by a two-dimensional motion vector and transmitted to the decoder along with the video data. The decoding side uses, as prediction data, the data in the picture memory for motion compensation in the region displaced from the block being decoded by this motion vector. For motion so large that motion compensation is not effective, INTRA coding with no prediction is selected by the selectors 123 and 124. The prediction error and the frame pixels are divided into blocks of 8 pixels by 8 lines, and the two-dimensional cosine transform is performed on each block in the first orthogonal transform part 108, so that the pixels of each block are transformed to frequency components by the DCT. The obtained transform coefficients are quantized in the first quantization part 109. By the quantization, each transform coefficient is represented either by level 0, the zero value of the coded coefficient, or by an integer level up to ±127, a non-zero value of the coded coefficient. The quantized data are transmitted to the decoder through a communication part or the like and, at the same time, inverse-transformed by the second inverse quantization part 110 and the second inverse orthogonal transform part 111. The result is added to the prediction data stored in the second picture memory 113 for motion compensation by the addition part 112, and stored in the second picture memory 113 for motion compensation as the next prediction data.
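The motion-compensated prediction performed around the subtraction part 107 can be sketched as follows. This is an illustration under assumptions, not the H.261 search procedure: the text does not say how the motion vector is chosen, so a full search minimizing the sum of absolute differences (SAD) is assumed, and the frame is 8 by 8 with a 4-by-4 block and a search range of ±2 to keep the example small. All function names are hypothetical.

```python
def block(frame, top, left, size):
    """Extract a size-by-size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def find_motion_vector(cur, ref, top, left, size, search):
    """Full search: return the (dy, dx) whose displaced reference block best matches."""
    height, width = len(ref), len(ref[0])
    target = block(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # displaced block would fall outside the picture memory
            cost = sad(target, block(ref, t, l, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def prediction_error(cur, ref, top, left, size, mv):
    """Difference between the current block and its motion-compensated prediction."""
    dy, dx = mv
    target = block(cur, top, left, size)
    pred = block(ref, top + dy, left + dx, size)
    return [[p - q for p, q in zip(rt, rp)] for rt, rp in zip(target, pred)]

# Reference frame, and a current frame whose content moved down 1 and right 2.
ref = [[y * 16 + x for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
       for y in range(8)]
mv = find_motion_vector(cur, ref, top=2, left=3, size=4, search=2)
residual = prediction_error(cur, ref, 2, 3, 4, mv)
```

In this example the search finds the displacement exactly, so the prediction error is zero and only the motion vector needs to be sent for the block. When no displacement gives a small error, an encoder would fall back to INTRA coding for that block.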
The decoder 122 inverse-transforms the input video data through the first inverse quantization part 117 and the first inverse orthogonal transform part 118, adds the result to the prediction data stored in the first picture memory 120 for motion compensation through the addition part 119, and obtains a video output while storing the same as the next prediction data in the first picture memory 120 for motion compensation. When an input block is INTRA data, the selector 125 selects no prediction data; the input data is directly inverse-transformed, extracted as a video output, and stored in the first picture memory 120 for motion compensation.
The above is an example of predictive coding of a video signal, in particular coding that combines inter-picture coding and intra-picture coding. In INTER coding, a mismatch arises between the contents of the frame memories on the coding side and the decoding side when a transmission error occurs, and hence the influence of the error propagates to all subsequent reproduced pictures. Therefore, it is necessary to transmit INTRA-coded video data to refresh the reproduced pictures.
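The propagation of mismatch and its removal by an INTRA frame can be illustrated with a toy model. The sketch below is an assumption-laden simplification: each frame is a single number, coding is lossless, and the transmission error is modeled as a fixed offset added to the received data. The name `transmit` and its parameters are hypothetical.

```python
def transmit(frames, modes, corrupt_at, error=100):
    """Return, per frame, the mismatch between decoder and encoder frame memories."""
    enc_mem = dec_mem = 0
    mismatch = []
    for i, (value, mode) in enumerate(zip(frames, modes)):
        # INTRA sends the value itself; INTER sends the prediction error.
        data = value if mode == "INTRA" else value - enc_mem
        enc_mem = value  # encoder's local reconstruction (lossless here)
        received = data + error if i == corrupt_at else data
        dec_mem = received if mode == "INTRA" else dec_mem + received
        mismatch.append(abs(dec_mem - enc_mem))
    return mismatch

frames = [10, 12, 15, 15, 18, 20]
modes = ["INTRA", "INTER", "INTER", "INTER", "INTRA", "INTER"]
result = transmit(frames, modes, corrupt_at=2)
```

Here the error injected in frame 2 persists unchanged through frame 3, because each INTER frame only adds a difference to an already wrong memory, and disappears at frame 4, where the INTRA frame overwrites the decoder memory.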
INTRA coding, which uses no correlation between frames, generates an enormous coding quantity compared with INTER coding. Transmitting a frame in which all blocks are INTRA-coded for refresh therefore takes time and increases the delay. For this reason, a method is generally considered in which one frame is divided into a plurality of groups of blocks and one group of blocks is refreshed by INTRA coding in every frame, thereby reducing the increase of the coding quantity per frame. Japanese Patent Laying-Open No. 5-236464 further addresses the problem that a mismatch error remains on the boundary between a refreshed group of blocks and an unrefreshed group of blocks when, through motion prediction, a refreshed block selects as its reference picture an unrefreshed block in which mismatch has occurred. As shown in FIG. 36, this example transmits a group of blocks formed by two rows or two columns of blocks by INTRA coding, and shifts the two INTRA-coded rows or columns successively in the row direction or the column direction in units of one row or one column. No motion prediction is thus performed on a block on the boundary between the groups of blocks, and the upper and lower (or front and rear) blocks are refreshed together; hence no block in which mismatch has occurred is selected as a reference picture, and no mismatch error remains. This example further states that, for the subsequent block in the two rows or two columns of blocks, motion compensation may be inhibited instead of performing refresh.
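The two-row sliding refresh described above can be sketched as a schedule. The sketch rests on assumptions not stated in the text: the blocks are taken to be 16-by-16-pixel blocks of a 352-by-288 CIF picture (22 columns by 18 rows), and the schedule is assumed to wrap around from the last row to the first. The name `intra_rows` is hypothetical.

```python
def intra_rows(frame_index, total_rows, group_rows=2):
    """Rows of blocks to INTRA-code in a given frame; the group slides one row per frame."""
    start = frame_index % total_rows
    return [(start + k) % total_rows for k in range(group_rows)]

ROWS, COLS = 288 // 16, 352 // 16        # 18 rows and 22 columns of blocks (assumed)
schedule = [intra_rows(n, ROWS) for n in range(ROWS)]
intra_blocks_per_frame = 2 * COLS        # two full rows of blocks per frame
covered = {row for rows in schedule for row in rows}
```

Under these assumptions each frame carries 44 INTRA-coded blocks out of 396, every row is covered within 18 frames, and each row is INTRA-coded in two consecutive frames, which is the price paid for keeping the boundary free of mismatch.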
In the aforementioned conventional refresh method, which divides one frame into a plurality of groups of blocks and performs refresh in units of the groups of blocks, it is conceivable to reduce the number of blocks in a simultaneously refreshed group of blocks in order to further reduce the increase of the coding quantity per frame. However, even if the number of blocks in the group simultaneously refreshed by INTRA coding is halved, avoiding the problem that a mismatch error remains on the boundary between the refreshed and unrefreshed groups of blocks due to motion prediction increases the number of blocks that must be INTRA-coded over two or three frames, or that are inhibited from motion compensation over two or three frames. Coding efficiency therefore drops while the generated coding quantity does not decrease much. FIG. 37 shows an example in which each group of blocks formed by three columns of blocks is further divided into left and right halves, each treated as a group of blocks. Referring to FIG. 37, the hatched blocks are those to be INTRA-coded. While the number of blocks to be INTRA-coded in the first frame is 3/4 of that in FIG. 36, the number in the second frame is at least equal to it, and the number in the third frame is 5/4 of it. This overhead becomes more pronounced as the group of blocks is made smaller, and the time necessary for refreshing one screen also lengthens; hence the group of blocks cannot be reduced much.
In other words, even when the increase of the coding quantity per frame is reduced by dividing one screen into a plurality of groups of blocks and refreshing one group of blocks per frame by INTRA coding, it has been impossible to reduce the increase remarkably once propagation of mismatch resulting from motion prediction is taken into account. When a picture is transmitted without frame skipping in a fixed transmission band while the delay caused by the buffer that smooths fluctuation of the coding quantity is minimized, the information quantity per frame must be kept as constant as possible. To transmit a picture of 30 frames/sec. at a transmission rate of 900 Kbits/sec., for example, the generated coding quantity per frame must be kept as close as possible to 30 Kbits. To offset the coding quantity added by transmitting an INTRA-coded group of blocks, coding must then be performed with an enlarged quantization step in the first quantization part 109 of the encoder 116 shown in FIG. 35. Enlarging the quantization step decreases the coding quantity but increases the quantization error, so that quantization distortion appears in the picture. There has been such a problem that, when a group of blocks having a different coding mode moves periodically across the screen under circumstances where quantization distortion is conspicuous, the group of blocks is perceived as a circulating disturbance line. This disturbance line is particularly noticeable in a still picture region.
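The per-frame budget and the trade-off of the quantization step described above can be worked through numerically. The sketch below uses assumptions for illustration only: the quantizer is a uniform quantizer rounding to the nearest level, the coefficient values are made up, and the count of non-zero levels is used as a rough stand-in for the coding quantity. The function names are hypothetical.

```python
import math

def frame_budget_kbits(rate_kbits_per_s, frames_per_s):
    """Average coding quantity available per frame at a fixed rate without frame skip."""
    return rate_kbits_per_s / frames_per_s

def quantize_stats(values, step):
    """Return (number of non-zero levels, maximum reconstruction error) for a step size."""
    levels = [math.floor(v / step + 0.5) for v in values]  # round to nearest level
    nonzero = sum(1 for q in levels if q != 0)
    max_error = max(abs(v - q * step) for v, q in zip(values, levels))
    return nonzero, max_error

budget = frame_budget_kbits(900, 30)     # 900 Kbits/sec. at 30 frames/sec.
coeffs = [37, -22, 9, 5, -3, 1]          # made-up transform coefficients
fine = quantize_stats(coeffs, step=4)
coarse = quantize_stats(coeffs, step=16)
```

For these values the budget is 30 Kbits per frame. Moving from a step of 4 to a step of 16 reduces the non-zero levels from 5 to 3 but raises the maximum error from 2 to 7, which is the distortion that makes the moving INTRA-coded group of blocks visible.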