1. Field of the Invention
The present invention relates to a method for computational graceful degradation in an audiovisual compression system. This invention is useful in a multimedia encoding and decoding environment where the computational demands of decoding a bitstream are not well defined. It is also useful where channel capacity is limited and some form of quality-of-service guarantee is required. It is also useful for interworking between two video services of different resolutions.
2. Description of the Related Art
In software decoding, it is common to employ some form of graceful degradation when the system resources are not sufficient to fully decode the entire video bitstream. Such degradation ranges from partial decoding of the picture elements to dropping complete pictures. This is easy to implement for a single video stream.
In MPEG-4, the new standard proposed by ISO/IEC SC29/WG11, it is possible to send multiple audiovisual (AV) objects. Therefore, the total complexity requirements no longer depend on a single stream but on multiple streams.
In compression systems such as MPEG-1, MPEG-2 and MPEG-4, a high degree of temporal redundancy is removed by employing motion compensation. Intuitively, successive pictures in a video sequence contain very similar information: only the moving regions of the picture change from picture to picture, and these regions usually move as a unit with uniform motion. Motion compensation is a technique in which the encoder and the decoder keep the reconstructed picture as a reference for predicting the current picture being encoded or decoded. The encoder mimics the decoder by implementing a local decoder loop, thus keeping the reconstructed picture synchronized between the encoder and the decoder.
The encoder searches the reconstructed picture for the block that most closely matches the current block being encoded. It then computes the prediction difference between this motion compensated block and the current block. Since the motion compensated block is available in both the encoder and the decoder, the encoder only needs to send the location of this block and the prediction difference to the decoder. The location of the block is commonly referred to as the motion vector, and the prediction difference as the motion compensated prediction error. This information requires fewer bits to send than the current block itself.
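The search and residual computation described above can be sketched as an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). This is a minimal illustration, assuming small integer pictures stored as lists of rows; the function names, the block size and the search range are hypothetical, and practical encoders usually use faster search strategies.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of ref at
    (rx, ry) and the n x n block of cur at (cx, cy)."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))


def full_search(ref, cur, bx, by, n=4, srange=2):
    """Exhaustive block matching for the block of cur at (bx, by).
    Returns the motion vector (dx, dy) and the prediction error."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate lies outside the reference picture
            cost = sad(ref, cur, rx, ry, bx, by, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    # Motion compensated prediction error: current block minus its match.
    residual = [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
                 for i in range(n)] for j in range(n)]
    return (dx, dy), residual


# Reference picture with distinct pixel values.
ref = [[100 * y + x for x in range(8)] for y in range(8)]
# Current picture: the content has moved one pixel to the left.
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]
mv, residual = full_search(ref, cur, bx=2, by=2)
```

Here the best match is one pixel to the right in the reference picture, so the motion vector is (1, 0) and the prediction error is all zeros; only the vector would need to be sent.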
In intra-picture coding, spatial redundancy may be removed in a similar way: the transform coefficients of a block can be predicted from the transform coefficients of neighboring blocks that have already been decoded.
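The direction of such prediction is usually chosen adaptively. As one illustration, the sketch below follows the gradient rule that MPEG-4 Visual uses to predict the DC coefficient from the left (A), above-left (B) and above (C) blocks. The function names are hypothetical, and quantizer scaling and AC prediction are omitted.

```python
def predict_dc(dc_a, dc_b, dc_c):
    """dc_a: left block, dc_b: above-left block, dc_c: above block.
    A and B are vertically adjacent; if they are similar, the picture
    changes little vertically, so the block above (C) is used."""
    if abs(dc_a - dc_b) < abs(dc_b - dc_c):
        return dc_c, "above"
    return dc_a, "left"


def dc_residual(dc, dc_a, dc_b, dc_c):
    """Only this difference needs to be entropy coded."""
    pred, _ = predict_dc(dc_a, dc_b, dc_c)
    return dc - pred


print(predict_dc(100, 102, 150))        # (150, 'above')
print(dc_residual(155, 100, 102, 150))  # 5
```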
This invention addresses two major problems. The first is how to indicate the decoding complexity requirements of the current AV object. When there are multiple AV objects, the systems decoder must decide how much of its resources to give to each object and which objects should have priority over others; in other words, it must model the complexity requirements of the system. Note that the complexity requirements of the decoder depend on how the decoder is implemented: an operation that is complex for one implementation may be simple for another. Therefore, some form of implementation-independent complexity measure is required.
The second problem is how to reduce the complexity requirements of the decoding process while retaining as much of the information as possible. One of the biggest problems in graceful degradation is drift caused by errors in motion compensation. When graceful degradation is employed, the reconstructed picture is incomplete or noisy. Because this picture serves as the reference for the next one, these errors propagate from picture to picture, resulting in larger and larger errors. This noise propagation is referred to as drift.
To solve these problems, the present invention takes the following steps.
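The growth of drift can be illustrated with a toy one-dimensional model, assuming each "picture" is a single value predicted from the previous reconstruction. A decoder that discards half of every residual (a hypothetical degradation) falls further behind the encoder with each picture.

```python
def decode(residuals, keep):
    """Each picture is the previous reconstruction plus a residual.
    keep = 1.0 is full decoding; keep < 1.0 models a hypothetical
    degraded decoder that discards part of every residual."""
    recon, value = [], 0.0
    for r in residuals:
        value += keep * r  # prediction from the previous reconstruction
        recon.append(value)
    return recon


residuals = [4.0] * 5
full = decode(residuals, 1.0)
degraded = decode(residuals, 0.5)
drift = [f - d for f, d in zip(full, degraded)]
print(drift)  # [2.0, 4.0, 6.0, 8.0, 10.0]: the error grows every picture
```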
The AV object encoder encodes each AV object in a manner that allows different amounts of graceful degradation to be employed in the AV object decoder. Parameters relating to the computational complexity requirements of the AV objects are transmitted by the systems encoder. An implementation-independent complexity measure is achieved by sending parameters that indicate the operations required.
At the systems decoder, estimates of the complexity required are made based on these parameters as well as the implementation methods being employed. The resource scheduler then allocates the appropriate amount of resources to the decoding of the different AV objects. In the AV object decoder, computational graceful degradation is employed when the resources are not sufficient to decode the AV object completely.
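A resource scheduler of this kind could, for example, grant decoding budgets greedily in priority order. The sketch below is one hypothetical policy, not the scheduler of the invention; the priority convention, the cost unit (cycles per picture) and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ObjectDemand:
    name: str
    priority: int   # assumed convention: lower value = more important
    full_cost: int  # estimated cycles to decode the object completely


def allocate(demands, budget):
    """Greedy allocation in priority order. Returns, per object, the
    granted cycles and whether the object must be degraded."""
    plan = {}
    remaining = budget
    for d in sorted(demands, key=lambda d: d.priority):
        grant = min(d.full_cost, remaining)
        plan[d.name] = (grant, grant < d.full_cost)
        remaining -= grant
    return plan


demands = [ObjectDemand("background", 2, 500),
           ObjectDemand("speaker", 1, 400),
           ObjectDemand("logo", 3, 50)]
plan = allocate(demands, budget=800)
# speaker gets 400 (full); background gets the remaining 400 and must
# degrade; logo gets nothing.
```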
In accordance with a first aspect of the present invention, a method of encoding a plurality of audiovisual objects into a compressed coded representation suitable for computational graceful degradation at the decoder comprises:
encoding said audiovisual objects into their coded representations, incorporating methods that allow computational graceful degradation to be employed in the decoder;
estimating implementation-independent computational complexity measures in terms of a plurality of block decoding parameters;
partitioning said coded representations of the audiovisual objects into a plurality of access units and adding header information to form packets;
inserting a descriptor containing said block decoding parameters into the header of the packet; and
multiplexing these packets to form a single multiplexed bitstream.
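The packetizing and multiplexing steps can be sketched as follows, assuming each access unit is a bytes object and the descriptor is carried as opaque bytes in every packet header. The `Packet` fields and the round-robin interleaving are illustrative assumptions; a real systems layer would schedule packets according to timing constraints.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Packet:
    object_id: int
    seq: int           # position of the access unit within its object
    descriptor: bytes  # block decoding parameters, opaque at this layer
    payload: bytes     # one access unit


def packetize(object_id, access_units, descriptor):
    """Wrap each access unit of one object in a packet with a header."""
    return [Packet(object_id, i, descriptor, au)
            for i, au in enumerate(access_units)]


def multiplex(streams):
    """Interleave the packet lists of several objects round-robin into
    a single sequence (a stand-in for the multiplexed bitstream)."""
    out = []
    for group in zip_longest(*streams):
        out.extend(p for p in group if p is not None)
    return out


obj1 = packetize(1, [b"a0", b"a1"], descriptor=b"\x01")
obj2 = packetize(2, [b"v0"], descriptor=b"\x02")
mux = multiplex([obj1, obj2])  # packets of objects 1, 2, 1
```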
In accordance with a second aspect of the present invention, a method of decoding a multiplexed bitstream, with computational graceful degradation, to obtain a plurality of audiovisual objects, comprises:
de-multiplexing the single multiplexed bitstream into a plurality of packets comprising packet headers and access units;
extracting the descriptor containing a plurality of block decoding parameters from the packet headers;
reassembling the access units into their original coded representations of the audiovisual objects;
estimating the decoder specific computational complexity measures based on said block decoding parameters and the current decoder implementation; and
decoding said coded representations of the audiovisual objects, using computational graceful degradation, where necessary, to satisfy the estimated decoder specific computational complexity requirements.
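On the decoder side, demultiplexing and reassembly might look like the minimal sketch below. It assumes a hypothetical packet layout with an object identifier, a sequence number, the descriptor bytes and the access-unit payload, and it leaves out the complexity estimation and decoding steps.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    object_id: int
    seq: int
    descriptor: bytes
    payload: bytes


def demultiplex(packets):
    """Group packets by object, restore sequence order, and return for
    each object its descriptors and its reassembled coded representation."""
    by_object = defaultdict(list)
    for p in packets:
        by_object[p.object_id].append(p)
    result = {}
    for oid, plist in by_object.items():
        plist.sort(key=lambda p: p.seq)
        descriptors = [p.descriptor for p in plist]
        coded = b"".join(p.payload for p in plist)
        result[oid] = (descriptors, coded)
    return result


stream = [Packet(1, 0, b"d1", b"a0"), Packet(2, 0, b"d2", b"v0"),
          Packet(1, 1, b"d1", b"a1")]
objects = demultiplex(stream)
```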
Preferably, the methods incorporated to allow computational graceful degradation to be employed in the decoder comprise:
partitioning the input pictures to be encoded into a plurality of sub-regions numbered in increasing order, beginning with the full picture as the first sub-region, where each sub-region comprises only a subset of the pixels within the sub-region preceding it;
entropy coding the position and dimension of the sub-regions into a compressed coded representation within the bitstream;
further partitioning the sub-regions into a plurality of blocks for encoding into a compressed coded representation within the bitstream;
performing motion estimation and motion compensation for said blocks using only the pixels from the reconstructed picture that belong to sub-regions having the same or higher numeric order as said blocks;
entropy coding the motion vectors into a compressed coded representation within the bitstream;
transforming the motion compensated prediction difference into an orthogonal domain;
quantizing the transformed coefficients using a quantization method; and,
entropy coding the quantized transformed coefficients into a compressed coded representation within the bitstream.
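With rectangular, nested sub-regions, the motion estimation constraint above can be expressed by giving each block an order: the number of the highest-order sub-region that completely contains it. Because each sub-region lies inside the one before it, the pixels belonging to sub-regions of the same or higher order are exactly those of the block's own sub-region. The sketch below assumes that blocks straddling a sub-region boundary take the lower order; all names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int

    def contains(self, other):
        """True if rectangle other lies completely inside this one."""
        return (self.x <= other.x and self.y <= other.y and
                other.x + other.w <= self.x + self.w and
                other.y + other.h <= self.y + self.h)


def block_order(block, regions):
    """regions[0] is sub-region 1 (the full picture) and each later
    region lies inside the previous one. Returns the 1-based number of
    the highest-order sub-region that completely contains block."""
    order = 0
    for k, region in enumerate(regions, start=1):
        if not region.contains(block):
            break  # nested regions: no higher-order region can contain it
        order = k
    if order == 0:
        raise ValueError("block lies outside the picture")
    return order


def reference_area(block, regions):
    """Motion estimation for block may only use this area of the
    reconstructed picture."""
    return regions[block_order(block, regions) - 1]


regions = [Rect(0, 0, 64, 48), Rect(16, 16, 32, 16)]
```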
Preferably, the method for decoding the coded representations of the audiovisual objects in accordance with the second aspect, using computational graceful degradation where necessary to satisfy the estimated decoder specific computational complexity requirements, further comprises:
entropy decoding the position and dimension of the sub-regions from the compressed coded representation within the bitstream;
selecting only the blocks that are within the sub-region of interest for decoding;
entropy decoding the compressed coded representation to give quantized transformed coefficients;
inverse quantizing said quantized transformed coefficients to give the transformed coefficients;
inverse transforming said transform coefficients to give the spatial domain motion compensated prediction difference;
entropy decoding the motion vectors from the compressed coded representation within the bitstream;
performing motion compensation for said blocks using only the pixels from the reconstructed picture that belong to sub-regions having the same or higher numeric order as said blocks; and,
reconstructing the picture and storing said picture in the frame memory for prediction of the next picture.
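Selecting only the blocks within the sub-region of interest is what saves computation. Under the motion compensation constraint above, every block of sub-region k is predicted only from pixels of sub-region k, so decoding just those blocks does not introduce drift inside that region. A minimal sketch, assuming 16x16 blocks on a regular grid and a sub-region given as (x, y, width, height); the function name is hypothetical.

```python
def blocks_to_decode(pic_w, pic_h, roi, bsize=16):
    """Return the (x, y) origins of the bsize x bsize blocks lying
    completely inside the sub-region of interest roi = (x, y, w, h).
    Only these blocks are entropy decoded, inverse quantized,
    inverse transformed and motion compensated."""
    rx, ry, rw, rh = roi
    selected = []
    for y in range(0, pic_h, bsize):
        for x in range(0, pic_w, bsize):
            inside = (rx <= x and ry <= y and
                      x + bsize <= rx + rw and y + bsize <= ry + rh)
            if inside:
                selected.append((x, y))
    return selected


sel = blocks_to_decode(352, 288, roi=(96, 80, 160, 128))
print(len(sel), "of", (352 // 16) * (288 // 16), "blocks")  # 80 of 396 blocks
```

For this 352x288 picture, a 160x128 sub-region aligned to the block grid selects 80 of the 396 blocks, roughly a fifth of the block decoding work.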
Preferably, in the method in accordance with the first aspect of the invention, the methods incorporated to allow computational graceful degradation to be employed in the decoder further comprise:
partitioning the input pictures to be encoded into a plurality of sub-regions numbered in increasing order, beginning with the full picture as the first sub-region, where each sub-region comprises only a subset of the pixels within the sub-region preceding it;
entropy coding the position and dimension of the sub-regions into a compressed coded representation within the bitstream;
further partitioning the sub-regions into a plurality of blocks for encoding into a compressed coded representation within the bitstream;
transforming said blocks into an orthogonal domain;
quantizing the transformed coefficients using a quantization method;
performing quantized transform coefficient prediction for said blocks using only the corresponding quantized transform coefficients from the blocks above and to the left that belong to sub-regions having the same or higher numeric order as said blocks; and,
entropy coding the predicted difference of the quantized transformed coefficients into a compressed coded representation within the bitstream.
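The availability rule for coefficient prediction can be sketched as below: a neighboring block contributes a predictor only if its sub-region order is the same as or higher than that of the current block, so a decoder that skips lower-order blocks never needs them. The preference for the left block over the block above, and the zero predictor when no neighbor qualifies, are hypothetical simplifications.

```python
def predict_coeffs(cur_order, left, above, n):
    """left / above: (sub-region order, quantized coefficients) of the
    neighboring block, or None if there is no such block. A neighbor is
    usable only if its order is the same as or higher than cur_order.
    Hypothetical rule: prefer the left block, then the block above,
    otherwise predict zeros."""
    for nb in (left, above):
        if nb is not None and nb[0] >= cur_order:
            return list(nb[1][:n])
    return [0] * n


def coeff_residual(coeffs, cur_order, left, above):
    """Predicted difference that would be entropy coded."""
    pred = predict_coeffs(cur_order, left, above, len(coeffs))
    return [c - p for c, p in zip(coeffs, pred)]


# An order-2 block must ignore its order-1 left neighbor.
print(coeff_residual([52, 4, 1], 2, left=(1, [50, 3, 1]), above=(2, [48, 2, 0])))
# [4, 2, 1]
```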
Preferably, the method in accordance with the second aspect of the invention further comprises:
entropy decoding the position and dimension of the sub-regions from the compressed coded representation within the bitstream;
selecting only the blocks that are within the sub-region of interest for decoding;
entropy decoding the compressed coded representation to give quantized transformed coefficients;
performing quantized transform coefficient prediction for said blocks using only the corresponding quantized transform coefficients from the blocks above and to the left that belong to sub-regions having the same or higher numeric order as said blocks;
inverse quantizing said quantized transformed coefficients to give the transformed coefficients;
inverse transforming said transform coefficients to give the spatial domain pixel values; and,
reconstructing the picture and storing said picture in the frame memory for prediction of the next picture.
Typically, the plurality of block decoding parameters comprises numbers indicating the count of:
block entropy decoding operations;
block motion compensation operations;
block inverse quantization operations;
block transform operations;
block addition operations; and,
block memory access operations.
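At the systems decoder, these implementation-independent counts can be turned into a decoder-specific estimate by weighting each count with the cost of that operation in the current implementation. In this sketch the per-operation costs are hypothetical cycle figures, and the weighted sum is one simple assumed model.

```python
# Block operation counts, as carried in the descriptor.
OPS = ("entropy_decode", "motion_comp", "inverse_quant",
       "transform", "addition", "memory_access")


def estimate_cycles(counts, cost_per_op):
    """Weighted sum of the counts; cost_per_op holds the cost of one
    block operation in this particular decoder implementation."""
    return sum(counts[op] * cost_per_op[op] for op in OPS)


counts = dict(entropy_decode=10, motion_comp=6, inverse_quant=10,
              transform=10, addition=6, memory_access=16)
# Two hypothetical implementations that differ only in transform speed.
decoder_a = dict(entropy_decode=200, motion_comp=150, inverse_quant=50,
                 transform=400, addition=20, memory_access=30)
decoder_b = dict(decoder_a, transform=100)
print(estimate_cycles(counts, decoder_a))  # 8000
print(estimate_cycles(counts, decoder_b))  # 5000
```

The same transmitted counts yield different estimates on the two decoders, which is why the counts themselves stay implementation independent.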
Preferably, the descriptor comprises:
a descriptor identification number signaling the descriptor type;
a descriptor length field to indicate the size of the descriptor; and,
a plurality of block decoding parameters.
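One possible byte layout for such a descriptor is sketched below. The tag value, the one-byte length field and the six 32-bit big-endian counts are all assumptions made for illustration, not a standardized syntax. The length field lets a decoder skip a descriptor it does not understand.

```python
import struct

CGD_TAG = 0x40  # hypothetical descriptor identification number


def pack_descriptor(params):
    """Assumed layout: tag (1 byte) | length (1 byte) | six unsigned
    32-bit big-endian block operation counts."""
    body = struct.pack(">6I", *params)
    return struct.pack(">BB", CGD_TAG, len(body)) + body


def unpack_descriptor(data):
    """Return the six counts and whatever bytes follow the descriptor."""
    tag, length = struct.unpack_from(">BB", data, 0)
    if tag != CGD_TAG or length != struct.calcsize(">6I"):
        raise ValueError("not a complexity descriptor")
    params = list(struct.unpack_from(">6I", data, 2))
    return params, data[2 + length:]


desc = pack_descriptor([10, 6, 10, 10, 6, 16])
print(len(desc))  # 26: 2 header bytes + 24 bytes of counts
```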
Typically, in the method of partitioning the input pictures to be encoded into a plurality of sub-regions, the sub-regions are rectangular.
Preferably, in the method of performing motion estimation and motion compensation for said blocks, using only the pixels from the reconstructed picture that belong to sub-regions having the same or higher numeric order as said blocks means that only prediction blocks lying completely within said sub-regions are selected.
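Restricting the search to prediction blocks that lie completely inside the sub-region can be sketched as a filter on the candidate motion vectors. The function names, the (x, y, width, height) region convention and the square search window are assumptions.

```python
def candidate_allowed(rx, ry, n, region):
    """True if the n x n prediction block at (rx, ry) lies completely
    inside region = (x, y, w, h)."""
    x, y, w, h = region
    return x <= rx and y <= ry and rx + n <= x + w and ry + n <= y + h


def allowed_vectors(bx, by, n, srange, region):
    """Motion vectors in the +/- srange window whose prediction block
    stays completely inside the block's reference sub-region."""
    return [(dx, dy)
            for dy in range(-srange, srange + 1)
            for dx in range(-srange, srange + 1)
            if candidate_allowed(bx + dx, by + dy, n, region)]


# An 8x8 block at the top-left corner of sub-region (16, 16, 32, 16):
# vectors pointing up or left would leave the sub-region.
vectors = allowed_vectors(16, 16, 8, 2, (16, 16, 32, 16))
```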
Typically, when only the pixels from the reconstructed picture that belong to sub-regions having the same or higher numeric order as said blocks are used, prediction blocks may lie partially outside said sub-regions, with the additional condition that the pixels lying outside said sub-regions are replaced by the nearest pixels from within the sub-regions.
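For a rectangular sub-region, replacing an outside pixel by the nearest pixel inside amounts to clamping each coordinate to the region's bounds, as in this minimal sketch (the names and the (x, y, width, height) region convention are hypothetical).

```python
def fetch_clamped(ref, rx, ry, n, region):
    """Build the n x n prediction block at (rx, ry). Pixels outside
    region = (x, y, w, h) are replaced by the nearest pixel inside it,
    which for a rectangle is the same as clamping each coordinate."""
    x, y, w, h = region

    def clamp(v, lo, hi):
        return max(lo, min(v, hi))

    return [[ref[clamp(ry + j, y, y + h - 1)][clamp(rx + i, x, x + w - 1)]
             for i in range(n)] for j in range(n)]


ref = [[10 * y + x for x in range(4)] for y in range(4)]
block = fetch_clamped(ref, 0, 0, 3, region=(1, 1, 2, 2))
# [[11, 11, 12], [11, 11, 12], [21, 21, 22]]
```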
Preferably, in the method of partitioning the pictures into a plurality of sub-regions, the position and dimension of each of said sub-regions may vary from picture to picture, and said position and said dimension are coded by means of a pan-scan vector, giving the horizontal and vertical displacement, a width and a height.
Typically, in the method of partitioning the pictures into a plurality of sub-regions, the position and dimension of the sub-regions are the same from picture to picture, and said position and said dimension are coded once at the beginning of the sequence by means of a horizontal and vertical displacement, a width and a height.
Preferably, in the method of encoding and decoding, the transform is the Discrete Cosine Transform.
Typically, in the method of encoding and decoding, the number of sub-regions is two.
Preferably, where there is a plurality of sub-regions numbered in increasing order, the motion vector can point into a sub-region of lower order but not out of a lower-order sub-region into a higher-order one.