1. Field of the Invention
This invention generally relates to video decompression and, more particularly, to a system and method that permit graceful degradation of video decoding when the decoder has insufficient resources to present the decoded video in real time.
2. Description of the Related Art
Complexity-scalable image/video decoding techniques are essential to applications where computation power is limited and/or full-resolution images are not necessary. Discrete cosine transform (DCT) processes are used in many popular image/video coding systems, such as JPEG, MPEG-1, MPEG-2, MPEG-4, H.263, and H.264. Inverse discrete cosine transform (IDCT) processing is widely recognized as one of the most computationally demanding processes in image/video decoders. Conventionally, approaches have been developed to simplify the IDCT process and save computation, but the savings come at the cost of added visual artifacts or a loss of resolution.
FIG. 1 is a drawing that illustrates a conventional two-dimensional (2D) 8×8 IDCT process (prior art). The 8×8 DCT coefficients undergo 8 horizontal one-dimensional (1D) IDCT transforms, followed by 8 vertical 1D IDCT transforms, to generate 8×8 image residuals in the spatial domain. In total, 16 1D IDCT operations are needed. Since the 2D IDCT is separable into independent horizontal and vertical transforms, the same result is achieved if the process begins with the vertical transforms and finishes with the horizontal transforms. The following discussion, however, follows the process explicitly depicted in FIG. 1.
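The row-column decomposition of FIG. 1 can be sketched in Python as follows. This is a minimal, unoptimized illustration, assuming the JPEG-style 8-point IDCT normalization (C(0) = 1/√2, C(u) = 1 otherwise); the function names are hypothetical. It counts the 1D IDCTs performed and lets either pass run first, which illustrates that the order of the horizontal and vertical passes does not change the result.

```python
import math

N = 8  # block size


def idct_1d(coeffs):
    """8-point 1D IDCT, JPEG-style scaling (assumed):
    f(x) = 1/2 * sum_u C(u) * F(u) * cos((2x + 1) * u * pi / 16), C(0) = 1/sqrt(2)."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            c = 1 / math.sqrt(2) if u == 0 else 1.0
            s += c * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(s / 2)
    return out


def transpose(block):
    return [list(col) for col in zip(*block)]


def idct_2d(block, horizontal_first=True):
    """Separable 2D IDCT of an 8x8 block (block[v][u]: row v, column u).
    Returns (residuals, number of 1D IDCTs performed)."""
    ops = 0

    def horizontal_pass(b):
        # One 1D IDCT per row.
        nonlocal ops
        ops += len(b)
        return [idct_1d(row) for row in b]

    def vertical_pass(b):
        # One 1D IDCT per column, done by transforming the rows of the transpose.
        return transpose(horizontal_pass(transpose(b)))

    if horizontal_first:
        out = vertical_pass(horizontal_pass(block))
    else:
        out = horizontal_pass(vertical_pass(block))
    return out, ops


if __name__ == "__main__":
    dc = [[0.0] * N for _ in range(N)]
    dc[0][0] = 8.0  # DC-only block
    residuals, ops = idct_2d(dc)
    print(ops)                        # 16 (8 horizontal + 8 vertical)
    print(round(residuals[3][5], 6))  # 1.0: a DC coefficient gives a flat block
```

Each 1D IDCT here is computed directly with N² multiply-adds; practical decoders use fast factorizations, but the count of 16 1D transforms per 8×8 block is the same.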
The key to reducing computation is reducing the number of 1D IDCT operations. The most straightforward way to do so is to set some of the high-frequency DCT coefficients to zero, so that the 1D transforms of the resulting all-zero rows can be skipped.
FIG. 2 depicts a few examples of reduced-complexity DCT coefficient masks (prior art). The coefficients in the non-shaded area are set to zero to reduce computational complexity. For example, when the 4×8 mask is applied, the resultant 2D IDCT process requires 4 horizontal and 8 vertical 1D IDCT operations, or 12 in total. However, as mentioned above, the trade-off associated with reduced complexity is degraded visual quality. The visual artifacts become especially objectionable when strong image edges (such as the letter box boundary in movie material) are present. For example, using a reduced-complexity mask to decode a frame with a letter box boundary may produce dark stripes in the image; these stripes are artifacts of the letter box boundary.
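The effect of a coefficient mask on the operation count can be sketched by skipping the horizontal 1D IDCT of any row that the mask has zeroed, since the IDCT of an all-zero row is all zeros. This sketch assumes that the 4×8 mask keeps the first 4 rows of coefficients (the lowest vertical frequencies) and all 8 columns, which matches the 4 horizontal plus 8 vertical count given above; `apply_mask` and `reduced_idct_2d` are hypothetical names.

```python
import math

N = 8


def idct_1d(coeffs):
    """8-point 1D IDCT with JPEG-style scaling (assumed normalization)."""
    return [
        sum((1 / math.sqrt(2) if u == 0 else 1.0) * coeffs[u]
            * math.cos((2 * x + 1) * u * math.pi / (2 * N)) for u in range(N)) / 2
        for x in range(N)
    ]


def apply_mask(block, keep_rows, keep_cols):
    """Zero every coefficient outside the top-left keep_rows x keep_cols region."""
    return [[block[v][u] if v < keep_rows and u < keep_cols else 0.0
             for u in range(N)] for v in range(N)]


def reduced_idct_2d(block, skip_zero_rows=True):
    """Horizontal-first 2D IDCT that can skip all-zero rows.
    Returns (residuals, number of 1D IDCTs performed)."""
    ops = 0
    intermediate = []
    for row in block:
        if skip_zero_rows and not any(row):
            intermediate.append([0.0] * N)  # IDCT of an all-zero row is all zeros
        else:
            intermediate.append(idct_1d(row))
            ops += 1
    # Vertical pass: every column of the intermediate result is transformed.
    columns = [idct_1d(list(col)) for col in zip(*intermediate)]
    ops += len(columns)
    return [list(r) for r in zip(*columns)], ops


if __name__ == "__main__":
    coeffs = [[1.0] * N for _ in range(N)]  # every coefficient nonzero
    for rows, cols in [(8, 8), (4, 8), (2, 2)]:
        _, ops = reduced_idct_2d(apply_mask(coeffs, rows, cols))
        print(f"{rows}x{cols} mask: {ops} 1D IDCTs")  # 16, 12, 10
```

Because each column of the intermediate result generally contains nonzero values, all 8 vertical transforms are still performed in this horizontal-first order; only the horizontal pass shrinks with the mask.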
Many elements of a decoder can be simplified, trading video quality for reduced complexity. As described above, elements such as the inverse transform or filtering can be skipped in small portions of the image, reducing complexity with only a small impact on video quality. However, the serial nature of entropy decoding, based on either Variable Length Code (VLC) decoding or arithmetic coding, does not lend itself to this type of graceful degradation, since an error in decoding one symbol frequently renders the remaining data useless until the next synchronization point. An attempt to save complexity by skipping the decoding of a single bit may result in the loss of an entire video frame. For this reason, entropy decoding typically forms a peak-processing bottleneck with non-graceful degradation. As a result, many decoders contain a dedicated entropy decoding unit designed to operate under worst-case conditions. Other decoders simply generate errors when entropy decoding is too complex, reducing complexity by discarding data at the cost of the associated loss of video quality.
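The synchronization problem can be illustrated with a toy prefix (VLC) code. The four-symbol code table below is hypothetical and not taken from any standard. Dropping a single bit from the first segment makes the decoder silently emit a wrong symbol with no error indication, while the second segment, which starts at its own synchronization point, decodes correctly. In this toy case the damage happens to stay local; in a real bitstream the meaning of each symbol depends on which syntax element is being parsed, so one mis-parsed symbol typically corrupts the interpretation of the data that follows, up to the next synchronization point.

```python
# Hypothetical 4-symbol prefix code, for illustration only.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(symbols):
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Decode bit by bit; return (symbols, leftover bits that never formed a codeword)."""
    symbols, pending = [], ""
    for b in bits:
        pending += b
        if pending in DECODE:
            symbols.append(DECODE[pending])
            pending = ""
    return "".join(symbols), pending


def decode_frames(frames):
    """Each frame starts at a synchronization point, so decoding restarts from a clean state."""
    return [decode(f) for f in frames]


if __name__ == "__main__":
    frame1, frame2 = encode("BACADB"), encode("DCBA")
    damaged = frame1[:3] + frame1[4:]  # one bit of frame 1 is skipped
    for symbols, leftover in decode_frames([damaged, frame2]):
        print(symbols, repr(leftover))
    # BABADB ''   <- wrong, and nothing in the decoder flags it
    # DCBA ''     <- correct: decoding restarted at the sync point
```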
The inter-frame prediction typically described in video decoding lies outside the operation of an entropy decoding unit. An entropy decoder must instead maintain synchronization, or prediction within a frame, from a first macroblock (MB) to subsequently decoded MBs. Each frame begins with a resynchronization point, so synchronization can be reestablished at each frame if needed. Although the synchronization, or prediction, used in the entropy decoder is from a previously decoded MB of the same frame, problems in decoding manifest themselves in the context of a series of frames. These frames have a hard decoding deadline: if one frame is “too slow”, the entire sequence of video frames fails. In contrast, when a single frame, such as an image, is decoded via the Internet, a slow decoder means that the user simply waits a little longer for the image.
A worst-case estimate can be used to determine the maximum rate of the decoder. However, a conservative worst-case estimate is not representative of normal conditions, and average complexity is generally a more practical consideration in the design of DSP decoder software. A decoder designed around the average complexity, in turn, must deal with spikes in decoding complexity that temporarily exceed the available decoding power.
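The tension between worst-case and average-complexity design can be sketched with a simple serial-decoder model. All of the following are assumptions for illustration: frame i arrives at time i·T and must be decoded by (i+1)·T, the decoder cannot start a frame before it arrives, and each frame's cost is given in abstract work units. The function names are hypothetical.

```python
def late_frames(work, capacity, period=1.0):
    """Return indices of frames that miss their deadline.

    Assumed model: a serial decoder processes `capacity` work units per unit
    time; frame i (cost work[i]) arrives at i*period and is due at (i+1)*period.
    """
    late, finish = [], 0.0
    for i, w in enumerate(work):
        # A frame starts when it arrives or when the previous frame finishes.
        start = max(finish, i * period)
        finish = start + w / capacity
        if finish > (i + 1) * period:
            late.append(i)
    return late


def utilization(work, capacity, period=1.0):
    """Fraction of the decoder's capacity actually used over the sequence."""
    return sum(work) / (capacity * period * len(work))


if __name__ == "__main__":
    work = [10, 10, 10, 40, 10, 10, 10, 10]  # hypothetical per-frame costs; one spike
    worst, average = max(work), sum(work) / len(work)  # 40 and 13.75
    print(late_frames(work, worst), utilization(work, worst))      # [] 0.34375
    print(late_frames(work, average), utilization(work, average))  # [3, 4, 5, 6, 7] 1.0
```

With these hypothetical costs, the decoder sized for the worst case misses no deadlines but uses only about 34% of its capacity, whereas the decoder sized for the average falls behind at the spike in frame 3 and stays late for every remaining frame, because it has no spare capacity to clear the backlog within the sequence.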
Generally, it is assumed that the complexity of received video is constant, and designers typically work to a conservative value. In practice, however, the video is rarely near this worst-case bound, and its complexity fluctuates significantly. When the video complexity rises, more resources are consumed; therefore, fluctuations in video complexity translate into fluctuations in available resources. One way to monitor this phenomenon is through the decoding speed, which varies with both the available processing power and the video complexity. A low decoding speed can be inferred from any lateness in decoding pictures. This slow decoding is commonly referred to as “loss of real-time decoding”.
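Inferring loss of real-time decoding from lateness can be sketched as a small monitor that compares each picture's decode-completion time with its deadline. The `RealTimeMonitor` class, its fields, the deadline rule (picture i due at (i+1)·T), and the `tolerance` parameter are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RealTimeMonitor:
    """Infers loss of real-time decoding from picture lateness (hypothetical helper).

    Assumed deadline rule: picture i must be decoded by (i + 1) * frame_period.
    """
    frame_period: float
    tolerance: float = 0.0  # lateness allowed before a picture counts as late
    late_count: int = 0
    max_lateness: float = float("-inf")

    def report(self, index, finish_time):
        """Record one decoded picture; return its lateness (negative = early)."""
        lateness = finish_time - (index + 1) * self.frame_period
        if lateness > self.tolerance:
            self.late_count += 1
        self.max_lateness = max(self.max_lateness, lateness)
        return lateness

    @property
    def lost_real_time(self):
        return self.late_count > 0


if __name__ == "__main__":
    monitor = RealTimeMonitor(frame_period=0.04)  # 25 pictures per second
    for i, t in enumerate([0.03, 0.07, 0.15]):    # hypothetical completion times (s)
        monitor.report(i, t)
    print(monitor.late_count, monitor.lost_real_time)  # 1 True
```

In a live decoder the completion times would come from a clock rather than a fixed list; they are given here so the example is deterministic.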
Complexity becomes an issue when the set of operations required to decode the video exceeds the resources of the decoder. The number of operations required to decode the video can increase, or the available processing power may decrease when other operations are needed; for instance, video decoding may be only one of many tasks running on a PC. The operations needed to decode a given number of bytes can vary dramatically: one pattern of bits may simply mean copying a block from the last frame, while another may require complex subpixel interpolation, deblocking filtering, prediction weighting, or other steps. If a decoder is designed for worst-case complexity, as is often done for dedicated ASICs, this is not an issue. However, if the video decoder is part of a larger system, running on a programmable processor for instance, then a worst-case design may be costly to the system as a whole. With this kind of design, a large amount of resources sits idle, waiting for a worst-case scenario that rarely, if ever, occurs. In a design less conservative than full worst-case, the question arises of what to do when a portion of video requires more resources to decode than are available.
It would be advantageous if video entropy decoding could be gracefully degraded without losing synchronization between MBs in a frame.