1. Field of the Invention
This invention relates to a video coding apparatus in which a video signal is compression-coded at high efficiency and to a video decoding apparatus for decoding the compression-coded signal to reconstruct the original video signal, and more particularly to a video coding/decoding apparatus that is immune to errors in the transmission channel/storage medium and assures high-quality transmission/storage.
2. Description of the Related Art
In a system for transmitting and storing a video signal, such as a videophone, a teleconference system, a personal digital assistant, a digital video disk system, or a digital TV broadcasting system, a video signal is compression-coded into code strings with a small amount of information, which are transmitted to a transmission channel and stored in a storage medium. The transmitted and stored code strings are decoded to reconstruct the original video signal.
For video-signal compression-coding techniques applied to such a system, various methods have been developed, including motion compensation, discrete cosine transform (DCT), sub-band coding, pyramid coding, and combinations of these techniques. Furthermore, ISO MPEG1, MPEG2, ITU-T H.261, and ITU-T H.262 have been established as international standard systems for compression-coding a video signal. Each of these systems uses motion compensation adaptive predictive discrete cosine transform coding, which has been described in detail in, for example, reference 1: Hiroshi Yasuda, "International Standard for Multimedia Coding," Maruzen, June 1991.
When the code strings obtained by coding a video signal as described above are transmitted or stored via an error-prone medium such as a radio transmission channel, the picture signal reconstructed on the decoding side may be degraded by errors in transmission and storage. One known measure to deal with such errors is the multi-layered coding system. Under conditions where code strings can be transmitted via a plurality of transmission channels each having a different error probability, this system divides the code strings into several layers and transmits the upper-layer code strings via transmission channels with lower error probabilities to reduce the degradation of picture quality due to errors. One proposed layer division method allocates the mode information, the motion compensation information, and the low-frequency components of the prediction error signal to the upper layers and the high-frequency components of the prediction error signal to the lower layer.
In a conventional layered video coding apparatus, a prediction circuit detects a motion vector between the input video signal and the reference video signal obtained by coding and local decoding and stored in the frame memory, performs the motion compensation prediction of a specific unit region (referred to as a prediction region) on the basis of the motion vector, and produces a motion compensation prediction signal. By subtracting the prediction signal from the input video signal, a prediction error signal is produced. The prediction error signal undergoes discrete cosine transform in blocks of a specific size at a DCT circuit and is converted into DCT coefficient information. The DCT coefficient information is quantized at a quantizer. The quantized DCT coefficient information is branched into two pieces of information; one piece of DCT coefficient information undergoes variable-length coding at a first variable-length coding circuit and the other piece of DCT coefficient information undergoes inverse quantization. The inverse quantized information is subjected to inverse discrete cosine transform. The inverse DCT information is added to the prediction signal to produce a local decoded signal. The local decoded signal is stored in the frame memory as a reference video signal.
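The loop described above can be illustrated with a minimal sketch. It is a one-dimensional toy version, not the circuit of any particular apparatus: the block length `N`, the uniform quantizer step `qstep`, and the function names are assumptions introduced only for illustration, and the prediction signal is simply given rather than produced by motion compensation.

```python
import math

N = 8  # block length (assumed; the text only says "a specific size")

def dct(block):
    """Orthonormal DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                               for n, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out

def code_block(input_block, prediction, qstep):
    """One pass of the loop: returns (quantized levels, local decoded block)."""
    # Prediction error = input signal minus prediction signal.
    error = [x - p for x, p in zip(input_block, prediction)]
    # DCT followed by a uniform quantizer (hypothetical step size qstep).
    levels = [round(c / qstep) for c in dct(error)]
    # Local decoding branch: inverse quantization, then inverse DCT.
    recon_error = idct([lv * qstep for lv in levels])
    # Adding the prediction back gives the signal stored as the next reference.
    local_decoded = [p + e for p, e in zip(prediction, recon_error)]
    return levels, local_decoded

input_block = [52, 55, 61, 66, 70, 61, 64, 73]
prediction = [50, 54, 60, 64, 69, 62, 63, 70]  # stands in for the motion compensation prediction
levels, local_decoded = code_block(input_block, prediction, qstep=2.0)
```

The `levels` list corresponds to the branch that would go on to variable-length coding, while `local_decoded` corresponds to the branch written back to the frame memory. Because the encoder reconstructs from the quantized values, its reference matches what an error-free decoder would reconstruct.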
The prediction mode information and motion vector information outputted from the prediction circuit are subjected to variable-length coding at a second variable-length coding circuit. The code strings outputted from each of the first and second variable-length coding circuits are multiplexed at a multiplexer, divided into upper-layer code strings and lower-layer code strings, and outputted to the transmission channels. Specifically, the upper-layer code strings are outputted to transmission channels having a relatively low probability that transmission errors will occur, and the lower-layer code strings are outputted to transmission channels having a relatively high probability that transmission errors will occur.
The multiplexer divides the code strings into the upper-layer code strings and the lower-layer code strings in a manner that allocates the mode information representing the prediction mode at the prediction circuit, the motion vector information (MV), and the low-frequency-band DCT coefficient information in the variable-length-coded DCT coefficient information to the upper-layer code strings and the remaining high-frequency-band DCT coefficient information in the variable-length-coded DCT coefficient information to the lower-layer code strings.
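The allocation rule just described can be sketched as follows. The sketch works on symbols rather than on variable-length-coded bit strings, and it assumes the DCT levels are already ordered from low to high frequency; the function name `split_layers` and the cut-off parameter `n_low` are hypothetical.

```python
def split_layers(mode, motion_vector, dct_levels, n_low):
    """Allocate one prediction region's symbols to the upper and lower layers.

    dct_levels is assumed to be ordered from low to high frequency;
    n_low (hypothetical) is how many coefficients count as low-frequency.
    """
    upper = {
        "mode": mode,                    # prediction mode information
        "mv": motion_vector,             # motion vector information (MV)
        "dct_low": dct_levels[:n_low],   # low-frequency-band coefficients
    }
    lower = {
        "dct_high": dct_levels[n_low:],  # remaining high-frequency-band coefficients
    }
    return upper, lower

upper, lower = split_layers("inter", (1, -2), [12, -3, 1, 0, 0, 1, 0, 0], n_low=3)
```

In this picture, `upper` would be sent over the channel with the lower error probability and `lower` over the channel with the higher error probability.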
Such a conventional multi-layered video coding apparatus has the following problems. A first problem is that each prediction region contains only one piece of motion vector information, and an error in that information degrades the picture quality seriously: if an error occurs in the motion vector information, the motion information cannot be decoded for the prediction region at all. To reduce such picture-quality deterioration, all of the motion vector information (MV) should be allocated to the upper-layer code strings. In general, however, there is a limit to the ratio of the code amount of code strings in each layer to the total code amount of code strings in all of the layers. If all of the motion vector information is allocated to the upper-layer code strings, the limit may be exceeded. If part of the motion vector information is instead allocated to the lower-layer code strings to stay within the limit, error resilience decreases seriously.
Furthermore, since the individual code words of the two transmitted code strings are variable-length codes created at the first and second variable-length coding circuits, an error may cause the decoder to lose synchronization with the variable-length codes. With the conventional video coding apparatus, however, multiplexing is effected in such a manner that important information related to prediction, including the mode information and motion vector information, whose errors would lead to a serious deterioration of the decoded picture, is mingled with DCT coefficient information representing the prediction error signal, whose errors would not cause a serious deterioration. Thus, when synchronization failure occurs during the decoding of the code words containing the unimportant information, errors may be introduced into the code words containing the important information, causing a serious deterioration of the reconstructed picture. Should this happen, synchronization cannot be recovered until a synchronization code appears. Consequently, all of the information decoded until then becomes erroneous, raising the problem that a serious deterioration develops over a wide range of the picture.
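The loss of synchronization can be seen with a small sketch. The four-symbol prefix code below is a made-up example, not a code table from any standard, and the helper names are hypothetical. A single flipped bit changes where the decoder believes code words begin and end.

```python
# Hypothetical prefix code: shorter code words for more frequent symbols.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(symbols):
    """Concatenate the variable-length code words of the symbols."""
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Read bits until they match a code word, emit its symbol, repeat."""
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    return "".join(out)

def flip(bits, index):
    """Return a copy of the bit string with one bit inverted (a channel error)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

message = "abcdabcd"
clean = encode(message)
corrupted = flip(clean, 1)   # one bit error early in the string
received = decode(corrupted)
```

Here `received` contains a different number of symbols than `message`, so every later symbol lands in the wrong position. If alternating positions carried motion vector and DCT coefficient code words, an error in a coefficient code word would in this way corrupt the interpretation of the motion vector code words that follow.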
Furthermore, many conventional video coding systems calculate the difference between adjacent motion vectors and subject the difference to variable-length coding in order to increase the coding efficiency. Since variable-length coding is used, even an error in part of a code string causes synchronization failure in variable-length decoding, which allows the error to affect all of the subsequent code strings and seriously degrades the quality of the reconstructed video signal. In addition, since the difference between adjacent motion vectors is coded, an error in one motion vector affects every motion vector that is reconstructed, directly or indirectly, from a difference taken with respect to the erroneous motion vector, with the result that the quality of the reconstructed video signal degrades considerably.
Furthermore, when there is a limit to the amount of codes that can be transferred over a transmission channel with a low error rate, part of the motion vector information must be coded in a lower layer with a high error rate, bringing about a substantial deterioration of picture quality. When the picture to be coded contains large motion, the amount of codes in the motion vector information is very large. When the coding rate is relatively low, the motion vector information alone may account for more than half of the total amount of codes. This increases the proportion of the motion vector information that must be coded in a lower layer, so that errors are more likely to occur in the motion vector information, making a serious deterioration of picture quality more liable to develop.
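The propagation caused by difference coding can be illustrated with a short sketch. It assumes each motion vector is predicted from the previous one in scan order, with a zero vector before the first; real systems may choose the predictor differently, and the function names are hypothetical.

```python
def encode_differences(vectors):
    """Code each motion vector as its difference from the previous one."""
    prev = (0, 0)  # assumed predictor for the first vector
    diffs = []
    for v in vectors:
        diffs.append((v[0] - prev[0], v[1] - prev[1]))
        prev = v
    return diffs

def decode_differences(diffs):
    """Rebuild the motion vectors by accumulating the differences."""
    prev = (0, 0)
    vectors = []
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        vectors.append(prev)
    return vectors

vectors = [(2, 1), (3, 1), (3, 2), (4, 2)]
diffs = encode_differences(vectors)

# Simulate an error in the second difference only.
damaged = list(diffs)
damaged[1] = (damaged[1][0] + 4, damaged[1][1])
decoded = decode_differences(damaged)
```

Only one difference was damaged, yet every vector from that position onward is decoded with the same offset, because each one is rebuilt on top of the erroneous vector before it.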
On the other hand, many conventional video coding systems, including the international standard systems, use block matching motion compensation, which divides the input motion picture into square blocks (referred to as motion compensation blocks) and performs motion compensation by representing the motion of each of these blocks by a motion vector. With block matching motion compensation, when a motion compensation block contains regions with different motions, the vector obtained is the average of the motions in the respective regions, so that no region can be predicted with high accuracy, causing the problem that the quality of the reconstructed video signal may deteriorate at the boundaries or the edges of the regions. When the coding rate is low, a motion compensation block must be made larger relative to the size of the picture, making the degradation of picture quality from block matching more serious.
To overcome the problem with block matching motion compensation, a segmentation based motion compensation scheme has been studied which divides the motion compensation blocks along the boundary of the object and performs motion compensation using a different motion vector for each region. The segmentation based motion compensation scheme requires an additional piece of information (region shape information) to indicate how the regions have been divided. The more accurately the region shape is represented, the more the motion compensation efficiency improves, but the volume of the region shape information increases accordingly. Therefore, the key to improving the coding efficiency is how efficiently the region shape is represented. When the coding rate is low, the ratio of the side information, including the motion vector information and the region shape information, becomes larger, making the problem more significant.
Schemes for coding the region shape information include a method of chain-coding the region shape information, a method of approximating the region shape using several division patterns, and a method of approximating the region shape through interpolation by expressing the shape in approximate blocks. With any of these methods, however, it is difficult to represent the shape of a region with high accuracy using a small amount of codes, so that segmentation based compensation coding does not necessarily improve the coding efficiency remarkably. Furthermore, a method has been studied which estimates the region shape information from the decoded picture of an already coded frame at both the coding unit and the decoding unit and consequently requires no independent region shape information. With this method, however, the amount of processing at the decoding unit increases significantly, and the decoded picture contains coding distortion, so that it is difficult to effect region division with high accuracy and better results are not necessarily obtained.
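Block matching as described above can be sketched with an exhaustive search that minimizes the sum of absolute differences (SAD). The SAD criterion, the full-search strategy, the function names, and the synthetic test frames are all assumptions made for illustration; the text does not prescribe a particular matching criterion.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def block_match(cur, ref, bx, by, size, search):
    """Full search over displacements in [-search, search]; returns ((dx, dy), cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + size > height or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), best[0]

# Synthetic 12x12 frames: the current frame is the reference displaced by (2, 1).
SIZE = 12
ref = [[3 * x * x + 5 * y * y + x * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
       for y in range(SIZE)]

mv, cost = block_match(cur, ref, bx=4, by=4, size=4, search=2)
```

The search returns exactly one vector per block. When the block straddles two objects moving differently, that single vector can only be a compromise between them, which is the limitation the paragraph above points out.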
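Of the methods listed, chain coding is the simplest to illustrate. The sketch below uses a four-direction chain code over a boundary given as a list of pixel coordinates; the direction numbering, the assumption that y grows downward, and the function names are illustrative choices, not taken from any specific scheme.

```python
# Hypothetical direction numbering, with y growing downward:
# 0 = right, 1 = up, 2 = left, 3 = down.
STEP_TO_CODE = {(1, 0): 0, (0, -1): 1, (-1, 0): 2, (0, 1): 3}
CODE_TO_STEP = {code: step for step, code in STEP_TO_CODE.items()}

def chain_code(points):
    """Encode a boundary path as one direction code per unit step."""
    return [STEP_TO_CODE[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def decode_chain(start, codes):
    """Rebuild the boundary path from its start point and direction codes."""
    points = [start]
    for code in codes:
        dx, dy = CODE_TO_STEP[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

# Boundary of a 2x1 rectangle, traced as a closed path.
boundary = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0)]
codes = chain_code(boundary)
```

Each boundary step costs one code (two bits before any entropy coding in this four-direction variant), so the amount of shape information grows with the length of the boundary. This is consistent with the observation above that an accurate shape is expensive to represent.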
As described above, with the conventional video coding apparatus, since only one piece of information related to prediction, such as motion vector information whose error would degrade the quality of the decoded picture seriously, is coded for each prediction region, resistance to error is low.
To increase resistance to error, pieces of information on all of the predictions must be transferred via transmission channels having low error probabilities. Since there is a limit to the ratio of the code amount of code strings in each layer to the total code amount of code strings in all of the layers, the code strings must be transferred over transmission channels having different error probabilities, thus impairing the feature of multi-layered coding to alleviate the deterioration of picture quality due to errors.
Furthermore, with the conventional video coding apparatus, since the relatively important information, including information related to prediction, and the relatively unimportant information are mingled in code strings, an error occurring in the unimportant information affects the important information, resulting in a serious deterioration of picture quality.
As described above, with the conventional video coding/decoding apparatus using variable-length coding to code the motion vector information, even if a measure to cope with errors, such as multi-layered coding, is taken, an error in only part of the code words in the motion vector information spreads to the code words that follow, so that the error has an adverse effect on the entire screen. Since not all of the pieces of the motion vector information can be coded in the upper layers, many errors occur in the motion vector information, making a significant deterioration of picture quality liable to develop in the decoded picture.
Additionally, with the conventional video coding/decoding apparatus using block matching motion compensation, when a motion compensation block contains regions with different motions, the motion compensation efficiency decreases, causing the quality of reconstructed video signal to deteriorate. In addition, the amount of codes in the region shape information is large, making the coding efficiency lower.
Furthermore, with the conventional video coding/decoding apparatus using segmentation based compensation, the amount of codes in the region shape information is large, thus decreasing the coding efficiency.