In a conventional H.264 video processor, the CABAC (or CAVLC) stage often creates a bottleneck in the encode or decode process. In a conventional H.264 codec, the entropy coder is connected directly to the transform function circuit. Because the transform function circuit is a data path that may operate on multiple pixels or coefficients in parallel, its performance can be scaled up by processing more pixels in parallel. Performance of the entropy coder is harder to scale up because operations are difficult to parallelize across symbols (i.e., only one symbol can be decoded at a time). In addition, because of their complexity, CABAC and CAVLC encoding or decoding take multiple clock cycles to process one symbol. CABAC in particular takes more than 2 clock cycles per symbol on average. The extra clock cycles place an upper bound on the maximum bit rate that can be practically supported in a given process technology, and that bound is often lower than desired. Although CABAC encoding/decoding provides more efficient compression than CAVLC encoding/decoding, CABAC encoding/decoding is slower than CAVLC encoding/decoding because of its greater complexity.
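The throughput bound described above can be illustrated with a rough calculation. The following is a minimal sketch, not a model of any particular codec: the function name `max_bit_rate`, the 200 MHz clock, the 2.5 cycles per symbol, and the 1 bit per symbol figure are all hypothetical values chosen for illustration. It assumes a strictly serial entropy coder that handles one symbol at a time.

```python
def max_bit_rate(clock_hz, cycles_per_symbol, bits_per_symbol):
    """Rough upper bound on bit rate (bits/s) for a serial entropy coder.

    All three parameters are illustrative assumptions, not values taken
    from the H.264 standard or from any specific hardware.
    """
    if clock_hz <= 0 or cycles_per_symbol <= 0 or bits_per_symbol <= 0:
        raise ValueError("all parameters must be positive")
    # Only one symbol is processed at a time, so the symbol rate is
    # limited by the clock rate divided by the cycles spent per symbol.
    symbols_per_second = clock_hz / cycles_per_symbol
    return symbols_per_second * bits_per_symbol


if __name__ == "__main__":
    clock_hz = 200e6  # assumed 200 MHz clock

    # Hypothetical coder needing 2.5 cycles per symbol on average.
    slow = max_bit_rate(clock_hz, cycles_per_symbol=2.5, bits_per_symbol=1.0)
    # Same clock, idealized coder needing 1 cycle per symbol.
    fast = max_bit_rate(clock_hz, cycles_per_symbol=1.0, bits_per_symbol=1.0)

    print(f"2.5 cycles/symbol: {slow / 1e6:.0f} Mbit/s")
    print(f"1.0 cycles/symbol: {fast / 1e6:.0f} Mbit/s")
```

Under these assumed numbers, the bound falls in proportion to the cycles spent per symbol. With the cycle count fixed, the only remaining lever is the clock rate, which is limited by the process technology.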
It would be desirable to resolve encoding and/or decoding bottlenecks to achieve a high-performance system.