1. Field of the Invention
The present invention relates to the compression encoding and decoding of data, and more particularly to a device that generates hierarchically encoded data and a device that decodes such data, for the streaming distribution of data over a network whose bandwidth changes constantly, such as an IP (Internet Protocol) network.
2. Description of the Related Art
FIGS. 1 and 2 show the configurations of a hierarchical encoding device conventionally used in the streaming distribution of moving pictures and of its decoding device, respectively. The encoding device is installed in the moving-picture distribution device, and the decoding device is installed in the receiving device.
First, on the encoder side, a base layer encoding unit 102 encodes an original picture 111 and generates the bit stream 121 of a base layer. The encoded data is decoded at the same time, and the decoded picture 113 is output. An enhancement layer encoding unit 101 encodes the difference between the decoded picture 113 and an original picture 112, and generates the bit stream 122 of an enhancement layer. The original pictures 111 and 112 represent substantially the same contents; they may be identical or may differ.
Then, on the decoder side, a base layer decoding unit 202 decodes the bit stream 121 of the base layer, and generates a decoded picture 212. An enhancement layer decoding unit 201 decodes the bit stream 122 of the enhancement layer. The final decoded picture 212 of the enhancement layer is obtained by adding the decoding result of the base layer to the decoded picture obtained only from the enhancement layer.
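The layering described above can be illustrated by the following minimal sketch. It is not the device of FIGS. 1 and 2: the base layer codec is replaced here by a coarse quantizer, which is purely an assumption made for illustration, and the names base_encode, base_decode, encode and decode are hypothetical. The sketch only shows that the enhancement layer carries the difference between the original picture and the locally decoded base-layer picture, and that the decoder adds the two results.

```python
def base_encode(picture, step=16):
    # Stand-in for the base layer encoder (assumption): coarse quantization.
    return [p // step for p in picture]

def base_decode(stream, step=16):
    # Stand-in for the base layer decoder (assumption): inverse quantization.
    return [q * step for q in stream]

def encode(original, step=16):
    base_stream = base_encode(original, step)
    # The encoder decodes its own base layer output (decoded picture 113).
    decoded = base_decode(base_stream, step)
    # The enhancement layer carries the difference from the original picture.
    enh_stream = [o - d for o, d in zip(original, decoded)]
    return base_stream, enh_stream

def decode(base_stream, enh_stream, step=16):
    base = base_decode(base_stream, step)
    # Final picture = base layer result + enhancement layer result.
    return [b + e for b, e in zip(base, enh_stream)]

original = [12, 200, 37, 255]
base_stream, enh_stream = encode(original)
restored = decode(base_stream, enh_stream)
```

In this sketch the enhancement stream is sent in full, so the original samples are restored exactly; a real enhancement layer would encode the difference further.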
A typical example is ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 14496-2: 1999/FDAM 4 (Final Draft Amendment 4). This is the encoding/decoding method specified for the MPEG-4 (Moving Picture Experts Group phase 4) Visual Streaming Profile (more precisely, in the difference draft of the specification document of MPEG-4 Visual).
According to this specification, a discrete cosine transform (DCT) is applied to the difference between the decoded picture 113, generated from the encoded picture of the base layer, and the original picture 112, and a plurality of bit planes ranging from the MSB (most significant bit) to the LSB (least significant bit) are generated from the coefficients of the transform result. Each bit plane is then used as the data of one hierarchical level in the enhancement layer. At the time of network transmission, the amount of data can be adjusted to the available bandwidth by transmitting the bit planes sequentially from the MSB side.
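As an illustrative sketch of the bit-plane generation described above, the following code splits a list of coefficient magnitudes into bit planes ordered from the MSB to the LSB. The function name to_bit_planes and the sample magnitudes are assumptions introduced for illustration; the DCT itself and the handling of sign bits are omitted.

```python
def to_bit_planes(magnitudes):
    # Number of planes needed for the largest magnitude (at least one).
    n_planes = max(max(magnitudes).bit_length(), 1)
    # Plane 0 is the MSB plane; the last plane is the LSB plane.
    return [[(m >> b) & 1 for m in magnitudes]
            for b in range(n_planes - 1, -1, -1)]

magnitudes = [10, 0, 6, 0, 0, 3]   # hypothetical coefficient magnitudes
planes = to_bit_planes(magnitudes)
for level, plane in enumerate(planes):
    print("level", level, plane)
```

Each row printed is one hierarchical level; transmitting rows in this order corresponds to sending data from the MSB side first.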
FIG. 3 is a flowchart showing the encoding process of such an enhancement layer. The encoding device first calculates and stores a difference value U for each pixel between the original picture 112 and the decoded picture 113 from the base layer encoding unit 102 (301). Then, the encoding device divides U into blocks to obtain hierarchical-level data and applies a DCT to each block, in preparation for hierarchizing each coefficient into bit planes (302).
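Steps 301 and 302 can be sketched as follows, under stated assumptions: pictures are plain lists of rows, the picture dimensions are multiples of the block size n, and the transform is a textbook orthonormal two-dimensional DCT-II computed directly from its definition. The function names residual_blocks and dct_2d are hypothetical, and the sketch makes no claim about the exact transform arithmetic or rounding of the standard.

```python
import math

def residual_blocks(original, decoded, n=8):
    # Step 301: per-pixel difference U between original and decoded picture.
    h, w = len(original), len(original[0])
    u = [[original[r][c] - decoded[r][c] for c in range(w)] for r in range(h)]
    # Step 302 (first half): divide U into n x n blocks (h, w multiples of n).
    return [[row[c0:c0 + n] for row in u[r0:r0 + n]]
            for r0 in range(0, h, n) for c0 in range(0, w, n)]

def dct_2d(block):
    # Step 302 (second half): orthonormal 2-D DCT-II, direct O(n^4) form.
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

original = [[5] * 8 for _ in range(8)]
decoded = [[3] * 8 for _ in range(8)]
blocks = residual_blocks(original, decoded)
coeffs = dct_2d(blocks[0])
```

For this constant difference block, all of the energy falls into the DC coefficient and the remaining coefficients are zero up to floating-point error.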
Then, both a frame synchronous bit and a bit k indicating the possibility of a bit shift in the hierarchical level are transmitted as the header information of a frame (303). If k is 1, the bit shift is possible, and if k is 0, the bit shift is impossible. Then, both a hierarchical-level synchronous bit and hierarchical level identifier (hierarchical level number) are transmitted, and the bit plane of the hierarchical level is shifted (304).
Then, a run-length code is detected from the block using the shifted bit plane, and the validity/invalidity of the block is determined (305). A run-length code is represented by the combination of a zero run length RUN and an EOP (end of plane) flag indicating whether the corresponding bit is the last asserted bit of the block. If the bit is the last non-zero bit of the block, its EOP is 1.
Then, both a bit indicating the validity/invalidity of a macro-block and a bit indicating the validity/invalidity of a block are transmitted (306), based on the determination result in 305. For each of these bits, 1 indicates valid and 0 indicates invalid. If the bit indicating the block validity/invalidity and k are both 1, a bit z indicating the amount of shift of the bit plane is transmitted.
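The (RUN, EOP) representation can be illustrated by the following sketch, which takes one bit plane of a block, already arranged in scan order, and emits one symbol per asserted bit. The function name run_eop_symbols is hypothetical, and the positions of the asserted bits in the sample plane are chosen here only to produce a short example; they are not taken from any figure. A plane with no asserted bit yields no symbol, which corresponds to an invalid block at that hierarchical level.

```python
def run_eop_symbols(plane):
    # Positions of the asserted (non-zero) bits in scan order.
    ones = [i for i, bit in enumerate(plane) if bit]
    symbols = []
    prev = -1
    for n, pos in enumerate(ones):
        run = pos - prev - 1                    # zeros since the previous 1
        eop = 1 if n == len(ones) - 1 else 0    # 1 only for the last 1
        symbols.append((run, eop))
        prev = pos
    return symbols

# Hypothetical 64-bit plane with asserted bits at scan positions 0, 55 and 61.
plane = [0] * 64
for pos in (0, 55, 61):
    plane[pos] = 1
symbols = run_eop_symbols(plane)
print(symbols)
```

Trailing zeros after the last asserted bit need no symbol of their own, because EOP = 1 already tells the decoder that the rest of the plane is zero.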
Then, if all processed hierarchical levels corresponding to the same position as a block to be processed are invalid, a bit (flag) indicating invalidity is transmitted (307). Then, variable-length encoding is applied to the run length code detected in 305, and its code bit is transmitted (308). In this case, a sign bit indicating the plus/minus sign is also encoded together.
The processes in 307 and 308 are repeated for each block, and the processes in 306 through 308 are repeated for each macro-block. The processes in 304 through 308 are repeated for each hierarchical level from the MSB side, and the processes in 301 through 308 are repeated for each frame.
If in 306, the block is invalid, the processes in 307 and 308 are skipped. If in 307, all processed hierarchical levels are invalid, and the block to be processed is also invalid, the process in 308 is skipped. Thus, a bit stream as shown in FIG. 4 is generated and is transmitted to the receiving device.
FIG. 5 shows an example of DCT coefficient bit-plane generation, including an extract from the following reference.
“Overview of Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 11, No. 3, March 2001, pp. 301-317.
The encoding device first expands the coefficients one-dimensionally by zigzag scanning them. The absolute value 501 of each coefficient is binarized, and a sign bit 502 is added to the binarized absolute value. Since each absolute value in this example can be expressed by four bits, the coefficients are expanded into a four-layered bit plane 503, and a variable length code (VLC) 504 is generated for each bit plane.
In this example, a two-dimensional VLC obtained by encoding a symbol (RUN, EOP) is used. When calculating a VLC, a sign bit is added as a one-bit code to the MSB of the coefficient. The decoding device performs the reverse of the encoding operation, and obtains each DCT coefficient by sequentially adding up the bit-plane bits of each hierarchical level.
FIGS. 6 through 8 show examples of the encoding of a specific block. If block coefficients as shown in FIG. 6 are obtained in 302 of FIG. 3, they are expanded into the four-layered bit planes shown in FIG. 7. If the bit plane of the MSB is then zigzag scanned as shown in FIG. 8, three symbols, (0, 0), (54, 0) and (5, 1), are detected as (RUN, EOP). The sign bit is detected at the same time. The same process is applied to the bit planes of the other hierarchical levels, and a code corresponding to each (RUN, EOP) is transmitted.
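The zigzag expansion and the separation of each coefficient into an absolute value and a sign bit can be sketched as follows. The scan order used is the common zigzag order over anti-diagonals; the convention that a sign bit of 1 means a negative coefficient, the function names zigzag_order and scan_block, and the 3x3 sample block are all assumptions made for illustration. The variable-length code tables themselves are not reproduced.

```python
def zigzag_order(n=8):
    # Visit anti-diagonals s = row + col, alternating direction each time.
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diag.reverse()          # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

def scan_block(block):
    # One-dimensional expansion, then split into magnitude and sign bit.
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))]
    magnitudes = [abs(v) for v in coeffs]
    signs = [1 if v < 0 else 0 for v in coeffs]   # assumed: 1 = negative
    return magnitudes, signs

block = [[10, 0, 6],
         [0, -3, 0],
         [-5, 0, 0]]                # hypothetical small block
magnitudes, signs = scan_block(block)
```

The magnitudes would then be split into bit planes, while each sign bit accompanies the first asserted bit of its coefficient.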
On the decoder side, (RUN, EOP) is obtained by decoding the received code, and bit planes as shown in FIG. 7 are generated from the obtained data by following the zigzag scan. In this example, since the bit plane of the MSB corresponds to the fourth bit from the LSB, the bit plane is shifted leftward to the fourth bit position, that is, by three bits, and the resulting value is stored as block data. Similarly, the bit planes of MSB-1 and MSB-2 are shifted leftward by two bits and one bit, respectively, and are stored. As a result of such bit shifts, the four-layered bit planes shown in FIG. 9 are generated, and by adding up these bit planes, the coefficients shown in FIG. 6 are restored. Although the process has been described here for one block, in practice the same process is applied to all blocks in one frame.
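The decoder-side reconstruction can be sketched as follows. The sketch assumes that the (RUN, EOP) symbols of each hierarchical level have already been recovered from the variable-length code, and that a bit plane at bit position b, counted from 0 at the LSB, is shifted leftward by b bits before the planes are added. The function names and the sample symbols are hypothetical, and sign bits are omitted.

```python
def plane_from_symbols(symbols, length):
    # Rebuild one bit plane in scan order from its (RUN, EOP) symbols.
    plane = [0] * length
    pos = -1
    for run, eop in symbols:
        pos += run + 1              # skip RUN zeros, then place a 1
        plane[pos] = 1
        if eop:
            break                   # last asserted bit of this plane
    return plane

def reconstruct(symbols_per_level, length):
    # symbols_per_level[0] is the MSB level; the last entry is the LSB level.
    n_levels = len(symbols_per_level)
    values = [0] * length
    for level, symbols in enumerate(symbols_per_level):
        shift = n_levels - 1 - level
        plane = plane_from_symbols(symbols, length)
        for i, bit in enumerate(plane):
            values[i] += bit << shift   # shift the plane and accumulate
    return values

symbols_per_level = [
    [(0, 1)],                       # MSB plane
    [(2, 1)],                       # MSB-1 plane
    [(0, 0), (1, 0), (2, 1)],       # MSB-2 plane
    [(5, 1)],                       # LSB plane
]
values = reconstruct(symbols_per_level, length=6)
print(values)
```

With four levels the shifts are three, two, one and zero bits, so adding the shifted planes restores each coefficient magnitude.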
In such hierarchical encoding, a bit plane on the LSB side carries the more detailed information of the enhancement layer, and its priority is lower than that of a bit plane on the MSB side. Therefore, if the bit rate must be reduced at the time of bit stream transmission, the transmission of bit planes can be cancelled sequentially, starting with the plane closest to the LSB.
Although this method is contrived so that the code used for the MSB differs from those used for the other hierarchical levels, it is an encoding method that makes no use of the correlation between hierarchical levels in its calculation process. Nevertheless, if information on the MSB side is missing, the sign bit cannot be detected, which is a problem.
In the conventional encoding method, the number of bits generated by variable-length encoding depends on both the number of bit planes and the number of coefficients (including run lengths), so encoding efficiency degrades greatly compared with that of base-layer encoding, which is a problem. Because the correlation between hierarchical levels is not utilized, encoding efficiency degrades even further.
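The effect of cancelling bit planes from the LSB side can be illustrated by the following sketch, which rebuilds coefficient magnitudes from only the first few planes. The function names, the parameter keep and the sample magnitudes are assumptions introduced for illustration; no actual rate control policy is implied.

```python
def split_planes(magnitudes, n_planes):
    # Plane 0 is the MSB plane; plane n_planes - 1 is the LSB plane.
    return [[(m >> b) & 1 for m in magnitudes]
            for b in range(n_planes - 1, -1, -1)]

def rebuild(planes, n_planes, length):
    # Add up whatever planes were received, each at its own bit position.
    values = [0] * length
    for level, plane in enumerate(planes):
        shift = n_planes - 1 - level
        for i, bit in enumerate(plane):
            values[i] += bit << shift
    return values

magnitudes = [10, 0, 6, 0, 0, 3]    # hypothetical coefficient magnitudes
planes = split_planes(magnitudes, n_planes=4)

for keep in (4, 3, 2, 1):
    received = planes[:keep]        # planes nearest the LSB are dropped first
    print(keep, rebuild(received, 4, len(magnitudes)))
```

Each dropped plane removes only the finest remaining detail, so the reconstruction degrades gradually rather than failing outright.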