1. Field of the Invention
The present invention relates to a method and model for regulating the computational and memory requirements of a compressed bitstream in a video decoder. This invention is useful in the field of multimedia audio-visual coding and compression techniques where the encoder needs to regulate the complexity requirements of the bitstreams it generates. This ensures that decoders conforming to the complexity specification of the standard can successfully decode these bitstreams without running short of resources.
2. Description of the Related Art
In the past, implementers of video decoders that conform to a certain performance capability of a standard were required to ensure that the decoders had sufficient resources to support the worst case scenario possible under the specification. This is not good engineering practice, because the worst case scenario usually represents a case that is almost impossible under normal operating conditions. This leads to over-engineering and a waste of resources.
Currently within the MPEG-4 standardization, there is an effort to specify complexity bounds for the decoder based on the complexity of the bitstream rather than the worst case scenario. This is a statistical method based on a common unit of complexity measure. The standard will specify a fixed value for the maximum complexity allowed in a compliant bitstream. The decoder is required to provide resources sufficient to decode all compliant bitstreams. The encoder is required to ensure that all bitstreams it generates do not exceed the maximum complexity bound and are therefore compliant.
FIG. 1 shows the above concept graphically. In FIG. 1, valid bitstreams are distributed to the left of the fixed value of the standard and all compliant decoders are distributed to the right of the fixed value of the standard. The complexity bound is indicated by the straight line in the graph. The abscissa is given in complexity units. On the left of the line are all the conformant bitstreams. The typical distribution of the bitstreams is depicted here. The majority of the bitstreams have a complexity that is much lower than the complexity bound. A few bitstreams approach this bound. When a bitstream exceeds this bound it is no longer a conformant bitstream and is therefore not shown. On the right side of the complexity bound is the distribution of decoders. Most decoders would be designed as close to the complexity bound as possible in order to save cost. A few decoders may have more resources than are required, and these lie further to the right of the graph. A decoder that does not have enough resources to satisfy the complexity requirements of a compliant bitstream will lie to the left of the complexity bound and will be considered non-compliant.
FIG. 2 shows a simple complexity measure method where the encoder counts the number of each macroblock type selected and evaluates the complexity of the bitstream based on a predefined cost function assigned to each macroblock type. In other words, FIG. 2 shows a simple method of evaluating the cost function of the bitstream being generated by calculating the equivalent I-MB units. However, this method cannot give an instantaneous complexity measure and does not consider other resources such as memory. Information about the current macroblock is passed to the Macroblock Type Decision, module 201, where the decision to encode the macroblock by a particular method is made. This decision is then counted by the Cost Function Generator, module 202, which converts this information into a complexity cost function. The complexity cost function is then fed back to the Macroblock Type Decision module for use in future decisions.
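The counting performed by the Cost Function Generator can be illustrated with a minimal sketch. The macroblock type names and the cost weights below are assumptions chosen only for illustration; the description does not give actual values for the equivalent I-MB units.

```python
from collections import Counter

# Hypothetical cost of each macroblock type in equivalent I-MB units.
# These weights are placeholders, not values taken from the description.
COST_IN_I_MB_UNITS = {"I": 1.0, "P": 1.5, "B": 2.0, "SKIPPED": 0.25}


def bitstream_complexity(macroblock_types):
    """Sum the predefined cost of every macroblock type selected so far."""
    counts = Counter(macroblock_types)
    return sum(COST_IN_I_MB_UNITS[mb_type] * n for mb_type, n in counts.items())


# Running total that would be fed back to the Macroblock Type Decision module.
print(bitstream_complexity(["I", "I", "P", "B"]))
```

Because the total is accumulated over the whole bitstream, a sketch of this kind shows the limitation noted above: it says nothing about when the cost is incurred or how much memory is in use.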
Modules 203 to 210 are the typical modules required for a hybrid transform coder. The input picture is partitioned into blocks that are processed by the motion estimation and compensation modules 210 and 209, respectively. Note that this step is skipped if there is no motion prediction. The motion compensated difference signal is then processed by the DCT transform module 203. The transform coefficients are then quantized in the Quantization module 204. The quantized coefficients are then entropy coded together with the overhead information of the macroblock type and motion vectors in the Variable Length Coding module 205. The local decoder comprising modules 206 to 209 reconstructs the coded picture for use in the prediction of future pictures. The Inverse Quantization, module 206, inverse quantizes the coefficients before they are fed into the Inverse DCT, module 207, where the difference signal is recovered. The difference signal is then added to the motion prediction to form the reconstructed block. These blocks are then stored in the Frame Memory, module 208, for future use.
Also, in video coding it is inherent that the compression process results in a variable bitrate bitstream. This bitstream is commonly sent over a constant bitrate channel. In order to absorb the instantaneous variation in the bitrate, it is common to introduce buffers at the output of the encoder and at the input of the decoder. These buffers serve as reservoirs for bits and allow a constant bitrate channel to be connected to an encoder that generates variable bitrate bitstreams as well as to a decoder that consumes bitstreams at a variable bitrate.
The buffer occupancy changes in time, because the rate at which the buffer is being filled and the rate at which it is being emptied are different. However, over a long period of time, the average rate of filling the buffer and the average rate of emptying the buffer can be defined to be the same. Therefore, if a large enough buffer is allowed, steady state operation can be achieved. To work correctly the buffer must not become empty (underflow) or be totally filled up (overflow). In order to enforce this constraint, models of the buffer have been presented in the literature, such as in MPEG-1 and MPEG-2, where the video buffer model describes the behaviour of a variable bitrate decoder connected to a constant bitrate channel. The remainder of the decoder does not need to be modeled because the video decoding method has been defined at a constant frame rate with each frame having a constant size. Therefore, the constant rate of decoding and the consumption of the buffer are well defined in time, and the video buffering verifier (VBV) is used to verify whether the buffer memory required in a decoder is less than the defined buffer size by checking the bitstream against its delivery rate function, R(t).
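The underflow and overflow conditions can be sketched as a simple occupancy simulation. This is an illustrative simplification rather than the VBV procedure of any standard: it assumes a constant delivery rate in place of a general R(t), assumes that each picture is removed from the buffer instantaneously once per frame period, and uses hypothetical parameter names.

```python
def check_decoder_buffer(frame_bits, channel_rate, frame_rate, buffer_size, initial_fill):
    """Return 'ok', 'underflow' or 'overflow' for a constant bitrate channel.

    frame_bits   -- coded size of each picture in bits, in decoding order
    channel_rate -- constant delivery rate in bits per second (assumption)
    frame_rate   -- constant decoding rate in pictures per second
    buffer_size  -- decoder buffer capacity in bits
    initial_fill -- bits in the buffer when the first picture is decoded
    """
    occupancy = initial_fill
    bits_per_interval = channel_rate / frame_rate
    for bits in frame_bits:
        if bits > occupancy:
            return "underflow"          # picture has not fully arrived yet
        occupancy -= bits               # decoder removes the whole picture
        occupancy += bits_per_interval  # channel refills during one frame period
        if occupancy > buffer_size:
            return "overflow"
    return "ok"


print(check_decoder_buffer([400, 100, 100], 3000, 30, 500, 400))
```

A model of this kind is sufficient only while the frame rate and frame size are constant, which is the situation described in the paragraph above.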
Defining the complexity measure is not sufficient to ensure that the decoder can be designed in an unambiguous way. There are two reasons for this.
The first reason is that the complexity is measured over a window of time. Since this window is sufficiently long, it can span several pictures. The complexity distribution may be such that the resources of the decoder are exhausted momentarily even though the average complexity over the window is below the limit set. Restricting the window to a shorter time would in turn restrict the variability of the complexity of the pictures, which means that all pictures would need to have constant complexity. This is not desirable, since by the nature of the coding modes different picture types should have different complexities.
The second reason is that the complexity is not related to the computation time alone. A second element, which is not captured in the complexity measure, is the memory requirement.
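The ambiguity of the measurement window can be seen in a small numerical sketch. The per-picture costs, the frame period and the limit below are hypothetical values chosen only to show that a window average can pass while a single picture does not.

```python
def average_rate_ok(costs, frame_period, max_rate):
    """True if the complexity averaged over the whole window is within max_rate."""
    return sum(costs) / (len(costs) * frame_period) <= max_rate


def every_picture_ok(costs, frame_period, max_rate):
    """True if each picture alone can be decoded within one frame period."""
    return all(cost / frame_period <= max_rate for cost in costs)


# One complex picture followed by three simple ones (illustrative numbers).
costs = [300, 50, 50, 50]       # complexity units per picture
frame_period = 0.25             # seconds per picture
max_rate = 1000                 # complexity units per second

print(average_rate_ok(costs, frame_period, max_rate))
print(every_picture_ok(costs, frame_period, max_rate))
```

With these numbers the window average is 450 units per second, yet the first picture alone would need 1200 units per second if it had to be decoded within a single frame period.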
The problem to be solved is therefore to invent a method for regulating the complexity requirements of the bitstream in terms of computational and memory requirements.
Also, in recent developments in the video compression process, a more flexible, object-oriented encoding method has been defined by MPEG. This flexible encoding scheme supports a variable number of macroblocks within a video picture and different picture rates, such that the rate of decoding and the rate of consumption of the memory are no longer constant. It becomes necessary to measure these rates over time to ensure that they do not violate the maximum capability of the decoder.
Also, the problem to be solved is how to define new verifiers and algorithms to measure the parameters of a compressed video bitstream to ensure that the generated bitstream can be decoded with defined capability and resources.
In order to solve the above problem, a joint computational and memory requirement model is designed. By considering the computational as well as the memory requirements of the bitstreams, the resource requirements in the decoder can be accurately constrained.
The memory requirements are well defined by the amount of memory available. The usage and release of the memory is also well defined in time by the decoding and presentation time stamps of the video sequence. These time stamps are embedded in the bitstreams.
By linking the computation complexity units to the memory usage, it is therefore possible to solve the first problem where the window for defining the complexity bound is ambiguous. By linking these requirements, the computational and memory requirements can be bounded based on the decoding and presentation time stamps. There is no longer the need for defining a sliding window for measurement of complexity. At the same time the pictures are not constrained to have constant complexity.
Furthermore, the VCV model 130 provides the computational requirements to determine the start and end time of the decoding of the macroblocks. The VMV model 140 describes the behavior of the reference memory and the occupancy of the reference memory. The VPV 105 defines an algorithm to check the bitstream and verify the amount of presentation buffer.
This invention links the models in terms of the memory consumption, which allows the bitstreams to be constrained by a physical limitation of the decoder. The apparatus to implement the verification is also provided in this invention.
Furthermore, a complete new set of verifier models is developed: Video Complexity Verifier (VCV), Video Memory Verifier (VMV) and Video Presentation Verifier (VPV). The models specify the behavior of a decoder for variable VOP size and rate and define new parameters and bounds to measure and verify the computational and memory resources that the bitstream demands, see.
The operation of the invention is as follows. The encoder monitors the complexity of the bitstream being generated by counting the macroblock types. Each macroblock type is assigned a predefined cost in complexity units. Each macroblock decoded also consumes a predefined amount of memory space. Depending on the type of the VOP, the memory is occupied for different durations. Memory is released when the macroblock in the VOP is no longer required for display or prediction.
The virtual decoder is assigned the maximum bound of complexity units and memory. The virtual decoder is allowed to decode the bitstream as fast as possible subject to the limit of the complexity bound. However, in doing so the decoder needs extra memory to keep the decoded VOPs until it is time for them to be displayed or until they are no longer needed for prediction. The virtual decoder is thus bounded both by the processing capability and by the memory available.
Therefore, by monitoring the complexity unit requirements of the bitstream and adjusting the decoding time stamp of the VOP, the virtual decoder is able to prevent the memory usage from exceeding its bound. Thus the virtual decoder is allowed to use less time on a simple VOP and more time on a complex VOP. The virtual decoder is defined by the following rules:
a) The amount of memory required for decoding the current VOP is defined by the number of macroblocks in the VOP and is consumed at a constant rate between the decoding time stamp of the current VOP and the next VOP.
b) At the presentation time of an I- or P-VOP the total memory allocated to the previous I- or P-VOP in decoding order is released instantaneously.
c) At the presentation time of a B-VOP the total memory allocated to that B-VOP is released instantaneously.
d) At any time, the decoding time stamp of the (n+1)th VOP in decoding order, $DTS_{n+1}$, must be less than or equal to the presentation time stamp of the nth VOP in decoding order, $PTS_n$.
$DTS_{n+1} \le PTS_n \qquad (1)$
where n is in decoding order.
e) At any time, the sum of the memory consumed must not exceed the maximum memory resources available, $M_{Max}$. Otherwise, the virtual decoder is said to have memory overflow.
f) At any time, the ratio of the decoding complexity of the current VOP, $C_n$, to the decoding time available, $DTS_{n+1} - DTS_n$, must be less than the maximum complexity resources available per second, $C'_{Max}$. Otherwise, the virtual decoder is said to have complexity overflow.
$C_n / (DTS_{n+1} - DTS_n) < C'_{Max} \qquad (2)$
where n is in decoding order.
A valid bitstream is thus one where the values in the bitstream satisfy the conditions in d), e) and f) and do not cause the virtual decoder to overflow in memory or complexity resources.
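Rules a) to f) can be sketched as a small checker. This is a minimal illustration under stated assumptions, not the verifier apparatus itself: memory is counted in macroblocks, the names Vop, verify, m_max, c_rate_max and end_time are hypothetical, end_time stands in for the decoding time stamp that would follow the last VOP, and the memory of the last I- or P-VOP is assumed to stay allocated to the end of the sequence.

```python
from dataclasses import dataclass


@dataclass
class Vop:
    kind: str          # "I", "P" or "B"
    dts: float         # decoding time stamp, seconds
    pts: float         # presentation time stamp, seconds
    macroblocks: int   # memory needed, counted in macroblocks (rule a)
    complexity: float  # C_n, in complexity units


def verify(vops, m_max, c_rate_max, end_time):
    """Check rules d), e), f) for VOPs in decoding order; return violations."""
    errors = []
    next_dts = [v.dts for v in vops[1:]] + [end_time]  # end_time: assumption

    for n, (vop, dts_next) in enumerate(zip(vops, next_dts)):
        if dts_next > vop.pts:                                    # rule d), eq. (1)
            errors.append(f"VOP {n}: DTS of next VOP exceeds PTS")
        window = dts_next - vop.dts
        if window <= 0 or vop.complexity / window >= c_rate_max:  # rule f), eq. (2)
            errors.append(f"VOP {n}: complexity overflow")

    # Release times: rule c) for B-VOPs, rule b) for I- and P-VOPs.
    release = [float("inf")] * len(vops)
    previous_anchor = None
    for n, vop in enumerate(vops):
        if vop.kind == "B":
            release[n] = vop.pts
        else:
            if previous_anchor is not None:
                release[previous_anchor] = vop.pts
            previous_anchor = n

    def occupancy_just_before(t):
        """Memory in use immediately before any release scheduled at time t."""
        total = 0.0
        for vop, dts_next, rel in zip(vops, next_dts, release):
            if rel < t or t <= vop.dts:
                continue  # already released, or decoding not yet started
            window = dts_next - vop.dts
            # Rule a): memory is consumed at a constant rate while decoding.
            fraction = 1.0 if window <= 0 else min(1.0, (t - vop.dts) / window)
            total += vop.macroblocks * fraction
        return total

    # Occupancy only rises between events and only drops at releases,
    # so its peak is reached just before one of the event times.
    event_times = sorted({t for v in vops for t in (v.dts, v.pts)} | {end_time})
    if max(occupancy_just_before(t) for t in event_times) > m_max:  # rule e)
        errors.append("memory overflow")
    return errors


vops = [Vop("I", 0.0, 1.0, 99, 100.0), Vop("P", 1.0, 2.0, 99, 50.0)]
print(verify(vops, m_max=198, c_rate_max=2000.0, end_time=2.0))
```

An empty list means that the sequence satisfies conditions d), e) and f) under the assumptions stated above. Evaluating the occupancy only at time stamps is a design choice that relies on the memory being consumed linearly and released instantaneously.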