Video data requires a large amount of storage space or a wide transmission bandwidth. With ever-increasing resolutions and frame rates, the storage or transmission bandwidth requirements would be formidable if video data were stored or transmitted in uncompressed form. Therefore, video data is often stored or transmitted in a compressed format using video coding techniques. Coding efficiency has been substantially improved by newer video coding standards such as H.264/AVC and the emerging HEVC (High Efficiency Video Coding) standard. To keep complexity manageable, an image is often divided into blocks, such as macroblocks (MBs) or LCUs/CUs, to which video coding is applied. Video coding standards usually adopt adaptive Inter/Intra prediction on a block basis.
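As a rough illustration of why uncompressed storage or transmission would be formidable, the raw bit rate R can be estimated under stated assumptions: 8-bit samples, 4:2:0 chroma subsampling (1.5 samples per pixel on average), picture width W, picture height H, and frame rate f = 60 frames per second. These parameters are illustrative choices, not figures from any particular system.

$$
\begin{aligned}
R &= W \times H \times 1.5 \times 8 \times f \ \text{bit/s} \\
R_{1920\times1080,\,60} &= 1920 \times 1080 \times 1.5 \times 8 \times 60 = 1{,}492{,}992{,}000\ \text{bit/s} \approx 1.49\ \text{Gbit/s} \\
R_{3840\times2160,\,60} &= 3840 \times 2160 \times 1.5 \times 8 \times 60 = 5{,}971{,}968{,}000\ \text{bit/s} \approx 5.97\ \text{Gbit/s}
\end{aligned}
$$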
FIG. 1 illustrates an exemplary system block diagram of a video decoder 100 supporting the HEVC video standard. High Efficiency Video Coding (HEVC) is a new international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is based on the hybrid block-based, motion-compensated, DCT-like transform coding architecture. The basic unit for compression, termed a coding unit (CU), is a 2N×2N square block. Coding starts from a largest CU (LCU), which is also referred to as a coding tree unit (CTU) in HEVC, and each CU can be recursively split into four smaller CUs until a predefined minimum size is reached. Once the splitting of the CU hierarchical tree is done, each CU is further split into one or more prediction units (PUs) according to the prediction type and PU partition. Each CU, or the residual of each CU, is divided into a tree of transform units (TUs) to which two-dimensional (2D) transforms are applied.
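The recursive CU splitting described above can be sketched as a quadtree walk. In this minimal Python sketch, `split_ctu` and `split_flag` are hypothetical names: `split_flag` stands in for the decoded split decision (the `split_cu_flag` syntax element in HEVC), leaf CUs are returned in z-scan order, and picture-boundary handling is deliberately ignored.

```python
from typing import Callable, List, Tuple

CU = Tuple[int, int, int]  # (x, y, size) in luma samples


def split_ctu(x: int, y: int, size: int, min_cu: int,
              split_flag: Callable[[int, int, int], bool]) -> List[CU]:
    """Recursively split a CTU into leaf CUs, returned in z-scan order."""
    if size > min_cu and split_flag(x, y, size):
        half = size // 2
        leaves: List[CU] = []
        # Quadrants in z-order: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_ctu(x + dx, y + dy, half, min_cu, split_flag)
        return leaves
    return [(x, y, size)]  # leaf CU: no further split


# Split the 64x64 CTU once, then split only its top-left 32x32 quadrant again.
cus = split_ctu(0, 0, 64, 8,
                lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
print(cus)
# [(0, 0, 16), (16, 0, 16), (0, 16, 16), (16, 16, 16), (32, 0, 32), (0, 32, 32), (32, 32, 32)]
```

Because the recursion stops as soon as `size` reaches `min_cu`, the predefined minimum CU size is enforced regardless of what the split decision returns.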
In FIG. 1, the input video bitstream is first processed by variable length decoder (VLD) 110 to perform variable-length decoding and syntax parsing. The parsed syntax may correspond to the Inter/Intra residue signal (the upper output path from VLD 110) or motion information (the lower output path from VLD 110). Within the entropy-coded bitstream, some bins may be coded using arithmetic coding, and these arithmetic-coded bins require an arithmetic decoder to recover the coded data. As shown in FIG. 1, an arithmetic decoding engine 132 is used as part of the entropy decoding engine (VLD 110). Furthermore, arithmetic decoding operations are usually more complicated than other types of entropy decoding, such as variable-length decoding. Therefore, arithmetic decoding may be relatively slow compared with other decoding processes and can become a throughput bottleneck.

The residue signal is usually transform coded. Accordingly, the coded residue signal is processed by inverse scan (IS)/inverse quantization (IQ) block 112 and inverse transform (IT) block 114. The output of IT block 114 corresponds to the reconstructed residue signal. In reconstruction block 116, the reconstructed residue signal is added to the Intra prediction from Intra prediction block 118 for an Intra-coded block, or to the Inter prediction from motion compensation block 120 for an Inter-coded block. Inter/Intra selection block 122 selects Intra prediction or Inter prediction for reconstructing the video signal, depending on whether the block is Intra or Inter coded. For motion compensation, the process accesses one or more reference blocks stored in the decoded picture buffer (reference picture buffer) 124, using the motion vector information determined by motion vector (MV) generation block 126. To improve visual quality, deblocking filter 128 and Sample Adaptive Offset (SAO) filter 130 process the reconstructed video before it is stored in decoded picture buffer 124.
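The reconstruction step (blocks 116 and 122) amounts to adding the decoded residue to whichever prediction applies and clipping the result to the valid sample range. The minimal Python sketch below assumes the residue has already passed through IS/IQ and IT, represents blocks as plain lists of rows, and uses hypothetical names (`reconstruct`, `Block`); clipping to [0, 2^bit_depth - 1] follows the usual HEVC reconstruction convention.

```python
from typing import List

Block = List[List[int]]  # a 2D block of samples, stored row by row


def reconstruct(residual: Block, intra_pred: Block, inter_pred: Block,
                is_intra: bool, bit_depth: int = 8) -> Block:
    """Add the residue to the selected prediction and clip to the sample range."""
    pred = intra_pred if is_intra else inter_pred  # Inter/Intra selection (block 122)
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(pred, residual)]


residual = [[10, -20], [300, 0]]
intra = [[100, 10], [0, 255]]
inter = [[50, 50], [50, 50]]
print(reconstruct(residual, intra, inter, is_intra=True))   # [[110, 0], [255, 255]]
print(reconstruct(residual, intra, inter, is_intra=False))  # [[60, 30], [255, 50]]
```

The reconstructed block produced here is what deblocking filter 128 and SAO filter 130 would then process before storage in decoded picture buffer 124.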
In the H.264/AVC standard, only the deblocking filter (DF) is used; there is no sample adaptive offset (SAO) filter.
FIG. 2 illustrates a typical electronic system with a built-in audio/video decoder, such as a TV. As shown in FIG. 2, the system uses a CPU bus and a DRAM (dynamic random access memory) bus, where the CPU bus carries CPU commands and communication used to control other modules. The external memory storage (210) stores reference pictures for video decoding, decoded pictures for display, and other data. The external memory is often DRAM, and an external memory access engine (220) connects the external memory storage to the data bus. The system may include a CPU (230), a video decoder (240), an audio engine (250) and a display engine (260). The video decoder 240 decodes compressed video data. The audio engine 250 decodes compressed audio data and may also support other audio tasks, such as generating audio prompts for the user interface. The display engine 260 is responsible for processing video for display and generating on-screen display information. For example, the display engine 260 may generate graphic or text information for the user interface. The display engine is also responsible for scaling and combining two decoded video streams for main-window/sub-window display or split-screen display. The CPU 230 may be used to initialize the system, control other subsystems, or provide a user interface for the electronic system.
While arithmetic coding is a high-efficiency entropy-coding tool and has been widely used in advanced video coding systems, its operations are highly data dependent. FIG. 3 illustrates an exemplary block diagram of the context-based adaptive binary arithmetic coding (CABAC) process. Since the arithmetic coder in the CABAC engine can only encode binary symbol values, the CABAC process must convert the values of the syntax elements into a binary string using a binarizer (310). This conversion process is commonly referred to as binarization. During coding, the probability models are gradually built up from the coded symbols for the different contexts. The context modeler (320) performs this modeling, and the model is updated using the coded output data; accordingly, a path 335 is provided from the output of the regular coding engine (330) to the context modeler (320). During normal context-based coding, the regular coding engine (330), which corresponds to a binary arithmetic coder, is used. The modeling context for coding the next binary symbol can be selected based on previously coded information. For reduced complexity, symbols can also be coded without the context modeling stage under an assumed equal probability distribution; this is commonly referred to as the bypass mode. For bypassed symbols, a bypass coding engine (340) may be used. As shown in FIG. 3, switches (S1, S2 and S3) direct the data flow between the regular CABAC mode and the bypass mode. When the regular CABAC mode is selected, the switches are flipped to the upper contacts; when the bypass mode is selected, they are flipped to the lower contacts.
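To make the regular/bypass split and the context feedback path concrete, the following Python sketch pairs an adaptive binary arithmetic encoder with its decoder. It is a hedged illustration, not the CABAC engine: HEVC's regular engine uses a 9-bit range, a 6-bit probability state index and table lookups, whereas this sketch swaps in the simpler LZMA-style range coder with 11-bit probabilities updated by a shift. The names `Context`, `Encoder`, `Decoder`, `PROB_BITS`, `MOVE_BITS` and `TOP` are hypothetical, and binarization (310) is assumed to have already produced the bins.

```python
# Toy context-adaptive binary arithmetic coder: an LZMA-style range coder, NOT HEVC CABAC.
PROB_BITS = 11  # each context stores P(bin == 0) scaled to 2**11
MOVE_BITS = 5   # adaptation rate of a context model
TOP = 1 << 24   # renormalize whenever the range falls below 2**24


class Context:
    """One adaptive probability model (the role of context modeler 320)."""

    def __init__(self) -> None:
        self.p0 = 1 << (PROB_BITS - 1)  # start at P(0) = 0.5

    def update(self, bin_val: int) -> None:
        # Move the estimate toward the bin just coded (feedback path 335).
        if bin_val == 0:
            self.p0 += ((1 << PROB_BITS) - self.p0) >> MOVE_BITS
        else:
            self.p0 -= self.p0 >> MOVE_BITS


class Encoder:
    def __init__(self) -> None:
        self.low, self.range = 0, 0xFFFFFFFF
        self.cache, self.cache_size = 0, 1
        self.out = bytearray()

    def _shift_low(self) -> None:
        # Emit the top byte of `low`, resolving a pending carry into 0xFF runs.
        if self.low < 0xFF000000 or self.low >= 1 << 32:
            carry = self.low >> 32
            byte = self.cache
            while self.cache_size:
                self.out.append((byte + carry) & 0xFF)
                byte = 0xFF
                self.cache_size -= 1
            self.cache = (self.low >> 24) & 0xFF
        self.cache_size += 1
        self.low = (self.low & 0x00FFFFFF) << 8

    def _normalize(self) -> None:
        while self.range < TOP:
            self.range <<= 8
            self._shift_low()

    def encode_regular(self, bin_val: int, ctx: Context) -> None:
        bound = (self.range >> PROB_BITS) * ctx.p0  # sub-range assigned to bin 0
        if bin_val == 0:
            self.range = bound
        else:
            self.low += bound
            self.range -= bound
        ctx.update(bin_val)
        self._normalize()

    def encode_bypass(self, bin_val: int) -> None:
        self.range >>= 1  # equal probability: split the range in half
        if bin_val:
            self.low += self.range
        self._normalize()

    def finish(self) -> bytes:
        for _ in range(5):
            self._shift_low()
        return bytes(self.out)


class Decoder:
    def __init__(self, data: bytes) -> None:
        self.stream = iter(data)
        self.range, self.code = 0xFFFFFFFF, 0
        for _ in range(5):
            self.code = (self.code << 8) | next(self.stream, 0)

    def _normalize(self) -> None:
        while self.range < TOP:
            self.range <<= 8
            self.code = (self.code << 8) | next(self.stream, 0)

    def decode_regular(self, ctx: Context) -> int:
        bound = (self.range >> PROB_BITS) * ctx.p0
        if self.code < bound:
            self.range, bin_val = bound, 0
        else:
            self.code -= bound
            self.range -= bound
            bin_val = 1
        ctx.update(bin_val)  # the next bin in this context must wait for this update
        self._normalize()
        return bin_val

    def decode_bypass(self) -> int:
        self.range >>= 1  # no context lookup or update in bypass mode
        bin_val = int(self.code >= self.range)
        if bin_val:
            self.code -= self.range
        self._normalize()
        return bin_val
```

A caller encodes each bin with `encode_regular(bin, ctx)` or `encode_bypass(bin)`, and the decoder must issue the same sequence of `decode_regular(ctx)` / `decode_bypass()` calls with identically initialized contexts. In `decode_regular`, the probability for the next bin of a context is only known after `ctx.update` runs; `decode_bypass` skips context lookup and update entirely, which is one reason bypass bins are cheaper to decode.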
For arithmetic coding, context formation and context update are highly data dependent. The context model may involve multiple neighboring reconstructed samples or previously decoded syntax values, and the context update has to wait until the involved data are available. As a result, the arithmetic decoder may become the throughput bottleneck of the decoding process. Therefore, it is desirable to develop a high-throughput arithmetic decoder.
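As one concrete case of neighbor-dependent context selection, HEVC derives the context increment for `split_cu_flag` from the coding-tree depths of the left and above neighboring CUs: each available neighbor that is deeper than the current depth adds one. The Python sketch below mirrors that rule; the function name and the use of `None` to mark an unavailable neighbor are assumptions of this sketch.

```python
from typing import Optional


def split_cu_flag_ctx_inc(cqt_depth: int,
                          left_depth: Optional[int],
                          above_depth: Optional[int]) -> int:
    """Context increment (0, 1 or 2) for split_cu_flag.

    left_depth / above_depth are the coding-tree depths of the neighboring
    CUs, or None when that neighbor is unavailable (e.g. outside the picture
    or slice).
    """
    cond_left = left_depth is not None and left_depth > cqt_depth
    cond_above = above_depth is not None and above_depth > cqt_depth
    return int(cond_left) + int(cond_above)


# The context for the current flag cannot be chosen until both neighbors'
# depths are known, i.e. until those neighbors have been decoded.
print(split_cu_flag_ctx_inc(0, left_depth=1, above_depth=2))     # 2
print(split_cu_flag_ctx_inc(1, left_depth=1, above_depth=None))  # 0
```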