Video encoding and decoding involve several processes that can be parallelized and effectively implemented on multi-processor systems and graphics processing units (GPUs). Motion estimation and transform/scaling are examples of processes that have no dependencies on neighboring blocks, making them highly suitable for implementation on a GPU. Other processes, such as intra-prediction and deblocking, however, have dependencies on neighboring blocks.
For H.264 intra-prediction, there are nine luma prediction modes for the 4×4 block size and four modes for the 16×16 block size. Each 4×4 block has 16 luma or chroma samples, and each 16×16 block has 256 luma samples. A luma sample represents the monochrome signal, and a chroma sample represents one of the two color-difference signals related to the primary colors. For 4×4 luma intra-prediction, depending on the mode, a macroblock may predict from the following neighbors, as shown in FIG. 1A:

Above-Left Macroblock
Above Macroblock
Above-Right Macroblock
Left Macroblock
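To make the neighbor dependency concrete, the following sketch (simplified, not production H.264 code) implements two of the nine 4×4 luma modes; the function names are illustrative. Each predicted block is built entirely from already-reconstructed samples of neighboring blocks, which is why a block cannot be predicted until those neighbors are done:

```python
# Illustrative sketch of 4x4 luma intra-prediction from neighbor samples.
# `above` holds the reconstructed samples from the block above; `left`
# holds the reconstructed samples from the block to the left.

def predict_vertical(above):
    """Mode 0 (vertical): copy the four above-neighbor samples downward."""
    return [list(above[:4]) for _ in range(4)]

def predict_dc(above, left):
    """Mode 2 (DC): fill the block with the rounded mean of the eight
    above and left neighbor samples."""
    samples = list(above[:4]) + list(left[:4])
    dc = (sum(samples) + 4) // 8  # rounded average
    return [[dc] * 4 for _ in range(4)]

# Example neighbor samples (hypothetical values):
above = [100, 102, 104, 106]
left = [98, 99, 101, 103]
vertical_block = predict_vertical(above)  # each row is a copy of `above`
dc_block = predict_dc(above, left)        # uniform block of the DC value
```

Note that both modes read only neighbor samples, never samples of the block being predicted; the other seven 4×4 modes likewise interpolate from various combinations of the above, above-right, above-left, and left neighbors.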
Dependencies on neighbors restrict the number of macroblocks that can be processed in parallel. They also necessitate synchronization points to ensure that no macroblock is processed before its neighbors are ready. The number of macroblocks processed between synchronization steps is not constant, and several synchronization steps are needed before peak parallelism is reached in terms of macroblock processing rate. Intra-prediction is therefore difficult to parallelize in a decoder. In particular, efficient parallelization is not currently possible when there is only one slice per frame, which may happen if the encoder does not create bitstreams tailored to maximize decoder performance. These issues are not eliminated even when decoding multiple slices on a GPU. As a result, processes such as intra-prediction cannot make full use of a GPU's processing capability. It is within this context that aspects of the present disclosure arise.
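The ramp-up behavior described above can be sketched with a simple schedule computation. Assuming each macroblock depends on its left, above, and above-right neighbors, macroblock (x, y) can be processed no earlier than step x + 2·y (the familiar "2:1 wavefront"); the count of ready macroblocks per step illustrates the slow ramp-up to peak parallelism and the slow tail at the end. The function name and frame dimensions below are hypothetical:

```python
# Sketch: how many macroblocks are processable at each synchronization
# step under a left / above / above-right dependency pattern.

def wavefront_counts(width_mbs, height_mbs):
    """Return the number of macroblocks ready at each synchronization step,
    assuming MB (x, y) becomes ready at step x + 2*y."""
    steps = {}
    for y in range(height_mbs):
        for x in range(width_mbs):
            step = x + 2 * y
            steps[step] = steps.get(step, 0) + 1
    return [steps[s] for s in sorted(steps)]

# Example: an 8x4-macroblock frame.
counts = wavefront_counts(8, 4)
# Parallelism ramps up from 1, peaks at height_mbs = 4, then ramps down:
# [1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 2, 2, 1, 1]
```

Even at its peak, the parallelism is bounded by the frame geometry rather than by the number of available GPU threads, which is why such dependency-constrained stages leave much of a GPU idle.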