Most video compression and decompression (codec) operations use block-based processing, typically on 16×16-pixel macroblocks (MBs). Coding an MB generally depends on one or more pixels of adjacent MBs. For example, in the H.264 specification, CABAC context modeling depends on up to two previously processed MBs neighboring the current MB, usually the MB to its left and the MB above it. When predicting intra macroblocks, the encoder may select one of nine prediction modes. To support all of these modes, up to 37 pixels are used as prediction samples: one from the top-left, sixteen from the top, four from the top-right, and sixteen from the left neighboring MBs.
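The 37-sample count can be checked by enumerating the neighbor positions. The following is a minimal sketch: the function name and the coordinate convention (the current MB's top-left pixel at (0, 0), with neighbors at negative coordinates) are assumptions made for illustration and are not taken from the H.264 specification.

```python
def intra_neighbor_samples(mb_size=16, top_right=4):
    """Return neighbor sample positions as (x, y) offsets, relative to the
    top-left pixel of the current MB at (0, 0). Hypothetical helper."""
    return {
        # Single corner pixel diagonally above and to the left.
        "top_left": [(-1, -1)],
        # Bottom row of the MB above.
        "top": [(x, -1) for x in range(mb_size)],
        # First few pixels of the bottom row of the MB above and to the right.
        "top_right": [(x, -1) for x in range(mb_size, mb_size + top_right)],
        # Rightmost column of the MB to the left.
        "left": [(-1, y) for y in range(mb_size)],
    }


samples = intra_neighbor_samples()
counts = {name: len(positions) for name, positions in samples.items()}
total = sum(counts.values())
print(counts, total)
```

With the defaults, the four groups contribute 1, 16, 4 and 16 samples, which sum to the 37 samples stated above.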
As images grow larger, the processing load grows with them. For example, decoding High Definition 1080p video (1920×1080 at 60 Hz) at 30 cycles per pixel would require a 3.8+ GHz processor for the video decoder alone. A processor with such processing power is not yet available.
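The clock-rate estimate follows from multiplying the pixel rate by the cycle budget per pixel. The sketch below only reproduces that arithmetic; the function name is hypothetical, and the use of a 1088-line coded height (1080 rounded up to a multiple of the 16-pixel MB height) is an assumption about how the figure might be derived, not something the text states.

```python
def required_clock_hz(width, height, frame_rate_hz, cycles_per_pixel):
    """Clock rate needed to spend cycles_per_pixel on every pixel in real time."""
    return width * height * frame_rate_hz * cycles_per_pixel


# Visible picture size, as quoted in the text.
visible_hz = required_clock_hz(1920, 1080, 60, 30)

# Assumption: coded height padded to a whole number of 16-line MB rows.
coded_hz = required_clock_hz(1920, 1088, 60, 30)

print(f"visible: {visible_hz / 1e9:.2f} GHz, coded: {coded_hz / 1e9:.2f} GHz")
```

The raw product comes to roughly 3.73 GHz for the visible pixels and roughly 3.76 GHz for the padded picture, which is of the same order as the 3.8+ GHz figure quoted above. Any rounding or overhead behind that figure is not itemized in the text.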
Given the neighboring-MB constraint, one approach is to decode each MB faster by using larger and faster dedicated hardware blocks designed to keep up with real-time requirements. This approach usually leads to a separate set of dedicated hardware blocks for each video compression standard, which occupies a large area and takes a long time to design and debug.
A second approach is to use an array of processing elements working in parallel on different MBs of the image. One problem with this approach is that CABAC, or entropy coding in general, is a serial process that cannot be parallelized: the CABAC decode of the current element must finish before that of the next element can start.
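Leaving entropy decoding aside, the neighbor constraint by itself also limits how many MBs an array of processing elements can work on at once. The sketch below is a wavefront-style illustration and not a method described in the text. It assumes each MB depends on its left, top-left, top and top-right neighbors, that every MB takes one time step, and that there are unlimited processing elements. The function name is hypothetical.

```python
from collections import Counter


def earliest_steps(mb_cols, mb_rows):
    """Earliest time step at which each MB can start, given that its left,
    top-left, top and top-right neighbors must already be finished."""
    steps = [[0] * mb_cols for _ in range(mb_rows)]
    for y in range(mb_rows):
        for x in range(mb_cols):
            deps = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            # A neighbor started at step s is finished at step s + 1.
            ready = [steps[dy][dx] + 1 for dx, dy in deps
                     if 0 <= dx < mb_cols and 0 <= dy < mb_rows]
            steps[y][x] = max(ready, default=0)
    return steps


# 1080p: 120 MB columns by 68 MB rows (1088 coded lines).
steps = earliest_steps(120, 68)
total_steps = steps[-1][-1] + 1
per_step = Counter(s for row in steps for s in row)
max_parallel = max(per_step.values())
print(total_steps, max_parallel)
```

Under these assumptions a 1080p frame of 8160 MBs needs 254 steps and keeps at most 60 MBs in flight at any one step, so the dependency pattern alone caps the useful size of the array, independently of the serial CABAC stage.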
Another approach maps H.264 decoding onto a multiprocessor architecture: an entire image frame is first parsed and entropy decoded, and a number of additional processors then execute the transform and any other necessary operations, such as intra-prediction, motion compensation and loop filtering. One problem with this approach is that it requires enough memory to hold the entropy-decoded macroblocks of an entire frame. See van der Tol et al., "Mapping of H.264 Decoding on a Multiprocessor Architecture," Proc. of SPIE, Vol. 5022, pp. 707-718.
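To give a feel for the memory cost of buffering a whole frame of entropy-decoded macroblocks, the sketch below makes a rough estimate. The storage format is entirely an assumption for illustration: 4:2:0 sampling (256 luma plus 128 chroma coefficients per MB), 16 bits per coefficient, and no per-MB side information such as modes or motion vectors. Neither the text nor the cited paper is the source of these numbers, and the function names are hypothetical.

```python
import math

MB_SIZE = 16

# Assumptions (not from the text): 4:2:0 sampling, 16-bit coefficients.
COEFFS_PER_MB = 256 + 128
BYTES_PER_COEFF = 2


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of MBs needed to cover the picture, rounding each dimension up."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def frame_buffer_bytes(width, height):
    """Bytes needed to hold the decoded coefficients of every MB in one frame."""
    return macroblocks_per_frame(width, height) * COEFFS_PER_MB * BYTES_PER_COEFF


mb_count = macroblocks_per_frame(1920, 1080)
buffer_bytes = frame_buffer_bytes(1920, 1080)
print(mb_count, buffer_bytes)
```

Under these assumptions a 1080p frame has 8160 MBs and the coefficient buffer alone is about 6.3 million bytes, before any side information is counted.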