Given the growing pervasiveness of multimedia in recent years, one important application that merits improvement is next generation video coding and decoding. Next generation video coding and decoding must support higher resolutions and higher frame rates, both of which require high processing performance.
As is well known, video coding is required to overcome the limitations and costs of transmission bandwidth and data storage. Video codecs, which are devices or software that enable video compression and/or decompression for digital video, can be loosely classified into two categories: low power and high performance. Both categories of video codecs require improvement. As an example, as the video requirements of multimedia devices continue to become more demanding, such devices are required to provide high video performance. However, increases in video performance are very demanding on processors and application specific integrated circuits (ASICs), resulting in high power consumption. For mobile devices, as well as other devices, low power consumption is a key consideration. Specific to mobile devices, low power consumption translates to reduced size, decreased weight, and lower cost. In addition, for non-mobile devices, it is still desirable to decrease power consumption in order to reduce the associated costs.
Real-time low-latency video playback is required for popular applications such as, but not limited to, video conferencing. For real-time video playback, a coded video picture should be decoded within an inter-frame time interval (e.g., 33.3 ms for 30 fps).
Low power video playback is an important requirement for battery-operated mobile devices, such as, but not limited to, cellular telephones. An effective method of power reduction is to trade off performance (speed) for power via voltage scaling. At lower voltages, less energy is consumed per operation; however, each operation takes longer to complete. This reduction in speed can be compensated for by increasing the number of parallel operations performed by the battery-operated devices. In other words, the hardware must be designed to operate faster than the target performance, namely, the target frame rate and resolution, at nominal voltage, such that at lower voltage the performance of the hardware still reaches the target performance.
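The trade-off between voltage, energy, and speed described above can be sketched numerically. The following is a minimal illustration only, assuming the usual CMOS dynamic energy relation (energy per operation proportional to the square of the supply voltage) and an alpha-power-law delay model. The threshold voltage, the exponent, and the supply voltages are hypothetical values chosen for illustration and do not come from any particular hardware.

```python
import math


def energy_per_op(v, c=1.0):
    """Relative dynamic energy per operation, assumed proportional to C * V^2."""
    return c * v * v


def delay_per_op(v, vt=0.4, alpha=1.5):
    """Relative delay per operation under an assumed alpha-power-law model.

    vt (threshold voltage) and alpha are hypothetical illustration values.
    """
    return v / (v - vt) ** alpha


def parallel_units_needed(v_low, v_nom=1.0):
    """Parallel units needed at v_low to match the throughput of one unit at v_nom."""
    slowdown = delay_per_op(v_low) / delay_per_op(v_nom)
    return math.ceil(slowdown)


if __name__ == "__main__":
    for v in (1.0, 0.8, 0.7):
        print(v, energy_per_op(v), parallel_units_needed(v))
```

Under these assumed parameters, lowering the supply voltage reduces the energy per operation while increasing the number of parallel units needed to hold throughput constant, which is the reason parallelism matters for low power playback.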
Accordingly, parallelism plays a key role in achieving both real-time and low power video playback. With the increasing frame rate and resolution required for future video coding applications, the need for parallelism in video coding is ever more important. The amount of parallelism that can be used is limited by the video coding standard, or algorithm, used by the hardware. Certain dependencies within the video coding standard make it difficult to perform operations in parallel. As an example, the entropy coding engine called Context-based Adaptive Binary Arithmetic Coding (CABAC) has been identified as a key bottleneck in H.264/AVC video decoders. Parallelism is difficult to achieve with the existing H.264/AVC CABAC due to its inherent serial nature and strong data dependencies; specifically, the H.264/AVC CABAC is recursive in nature. Consequently, it is difficult to parallelize without sacrificing coding efficiency, power, delay, and area, all of which are important to video encoding/decoding. For instance, within the H.264/AVC standard, a frame can be broken up into multiple independent H.264/AVC slices to enable parallel processing in the CABAC, but this comes at the cost of a significant reduction in coding efficiency, namely, poorer compression, since redundancy cannot be eliminated between the slices.
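The serial, recursive dependency described above can be illustrated with a toy adaptive binary arithmetic coder. This sketch is not the H.264/AVC CABAC: it uses exact rational arithmetic and a simple count-based probability estimate, both of which are assumptions made for clarity, and all function names are hypothetical. It shows only that decoding each bin requires the interval and the probability state left behind by the previous bin.

```python
from fractions import Fraction


def p_zero(c0, c1):
    """Adaptive estimate of P(bin == 0) from the counts seen so far."""
    return Fraction(c0 + 1, c0 + c1 + 2)


def encode(bins):
    """Narrow the interval [low, low + width) once per bin; return a value inside it."""
    low, width = Fraction(0), Fraction(1)
    c0 = c1 = 0
    for b in bins:
        split = width * p_zero(c0, c1)
        if b == 0:
            width = split
            c0 += 1
        else:
            low += split
            width -= split
            c1 += 1
    return low + width / 2


def decode(value, n):
    """Recover n bins; each iteration depends on the state left by the previous one."""
    low, width = Fraction(0), Fraction(1)
    c0 = c1 = 0
    out = []
    for _ in range(n):
        split = width * p_zero(c0, c1)  # needs the counts updated by earlier bins
        if value < low + split:         # needs the interval updated by earlier bins
            out.append(0)
            width = split
            c0 += 1
        else:
            out.append(1)
            low += split
            width -= split
            c1 += 1
    return out


if __name__ == "__main__":
    bins = [0, 1, 1, 0, 1, 1, 1, 0]
    print(decode(encode(bins), len(bins)) == bins)
```

Because the interval and the counts are both updated inside the loop, the iterations of `decode` cannot simply be run side by side; this is the kind of data dependency that makes the bin-by-bin decoding difficult to parallelize.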
Increased throughput of a CABAC decoding engine is desirable. Unfortunately, data is decoded by the CABAC decoding engine in a serial manner, one binary symbol (bin) at a time. It is desirable to increase the number of bins processed per second, or per cycle. As an example, the throughput of an H.264/AVC CABAC decoding engine is measured by the number of bins it can decode per second (bins/sec). The throughput requirement for video decoding can exceed 2 Gbins/sec.
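To make these figures concrete, the arithmetic can be sketched as follows. The 2 Gbins/sec requirement and the 30 fps frame rate come from the text above; the 200 MHz clock frequency is a hypothetical value chosen purely for illustration.

```python
def frame_interval_ms(fps):
    """Time available to decode one picture for real-time playback, in milliseconds."""
    return 1000.0 / fps


def bins_per_cycle(target_bins_per_sec, clock_hz):
    """Average number of bins the engine must decode per clock cycle."""
    return target_bins_per_sec / clock_hz


def bins_per_frame(target_bins_per_sec, fps):
    """Bin budget that must be processed within one inter-frame interval."""
    return target_bins_per_sec / fps


if __name__ == "__main__":
    print(frame_interval_ms(30))         # inter-frame interval at 30 fps
    print(bins_per_cycle(2e9, 200e6))    # assumed 200 MHz clock
    print(bins_per_frame(2e9, 30))
```

With the assumed clock, an engine that decodes one bin per cycle falls well short of the requirement, which is why increasing the number of bins processed per cycle is of interest.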
CABAC is a form of entropy coding that is executed by a processor. Entropy coding involves compressing data based on the probability of its occurrence. A simple example is assigning short codewords (fewer bits) to elements that occur frequently and longer codewords (more bits) to elements that occur less frequently. In the case of video coding, CABAC is used to compress syntax elements, such as, for example, motion vectors, macroblock types, coefficients, and significance maps. Macroblocks are 16×16 blocks of pixels. Syntax elements are used to describe properties of a macroblock. Syntax elements are, in turn, composed of bins, which are processed by the CABAC encoding/decoding engine. Bins dictate the workload of the CABAC encoding/decoding engine. Consequently, speed/throughput is stated in bins/sec. A CABAC encoder processes data as follows: Input: syntax elements (bins)→Output: encoded bits. Conversely, a CABAC decoder processes data as follows: Input: encoded bits→Output: decoded bins (also referred to as syntax elements).
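The principle of assigning shorter codewords to more frequent elements can be illustrated with a small Huffman code construction. Huffman coding is a different entropy coding technique from the arithmetic coding used in CABAC and is used here only because it makes the codeword lengths easy to see; the function name and the sample input are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return a dict mapping each symbol to its Huffman codeword length in bits."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    print(huffman_code_lengths("aaaaaaaabbbc"))
```

In the sample input, the most frequent symbol receives the shortest codeword and the rarer symbols receive longer ones, so the total number of bits is smaller than with a fixed-length code.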
There have been several proposals for the next generation video coding standard that present various ways to increase the throughput of the CABAC engine. Certain contributions have looked at various ways of using slices to increase parallel processing for CABAC. Unfortunately, the methods provided by such contributions incur a coding efficiency penalty when compared to H.264/AVC with a single slice per frame, and do not address hardware implementation complexities. The coding efficiency penalty of the slice approach can be attributed to three key sources: 1) reduced context training; 2) no context selection across slices; and 3) a start code and header for each slice. Another critical drawback of these approaches is that the entire CABAC engine needs to be replicated, which significantly increases area costs.
Thus, a heretofore unaddressed need exists in the industry to address the aforementioned deficiencies and inadequacies.