This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present invention that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that the statements in this section are to be read in this light, and not as admissions of prior art.
Various video coding standards, e.g. MPEG-4 Part 10/AVC, apply a spatial intra prediction algorithm to take advantage of spatial redundancy within images. Intra prediction is performed for various block sizes (e.g. 4×4, 8×8 and 16×16). Taking 4×4 block intra prediction as an example, AVC defines nine pre-defined spatial predictors, each corresponding to a prediction mode (eight directional modes and one DC mode). One of the biggest issues of the intra prediction schemes employed in AVC is their complexity. In order to predict the current block correctly, the blocks located to the left of and above the current block must first be reconstructed, after their own encoding, before they can serve as predictors. Therefore, image encoding can only be performed sequentially, from left to right and from top to bottom.
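The dependency described above can be illustrated with a minimal sketch. It is not the AVC reference implementation: it covers only three of the nine 4×4 modes (vertical, horizontal and DC), assumes 8-bit samples and frame dimensions that are multiples of 4, and replaces the AVC transform and quantization with a hypothetical uniform quantizer (`qstep`). The names `predict_4x4` and `encode_frame` are chosen for illustration only.

```python
from typing import List, Optional

Block = List[List[int]]


def predict_4x4(mode: str,
                top: Optional[List[int]],
                left: Optional[List[int]]) -> Block:
    """Predict a 4x4 block from already reconstructed neighbouring pixels.

    top  : the 4 reconstructed pixels directly above the block, or None
    left : the 4 reconstructed pixels directly to the left, or None
    """
    if mode == "vertical":
        if top is None:
            raise ValueError("vertical mode needs the upper neighbour")
        return [list(top) for _ in range(4)]      # copy the row downwards
    if mode == "horizontal":
        if left is None:
            raise ValueError("horizontal mode needs the left neighbour")
        return [[left[r]] * 4 for r in range(4)]  # copy the column rightwards
    if mode == "dc":
        if top is not None and left is not None:
            dc = (sum(top) + sum(left) + 4) >> 3  # rounded mean of 8 pixels
        elif top is not None:
            dc = (sum(top) + 2) >> 2              # rounded mean of 4 pixels
        elif left is not None:
            dc = (sum(left) + 2) >> 2
        else:
            dc = 128                              # mid-grey for 8-bit samples
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def encode_frame(frame: List[List[int]], qstep: int = 8) -> List[List[int]]:
    """Return the reconstructed frame, processing 4x4 blocks in raster order.

    Each block reads pixels from `recon`, so it cannot start before its left
    and upper neighbours have been predicted, quantized and reconstructed.
    """
    height, width = len(frame), len(frame[0])
    recon = [[0] * width for _ in range(height)]
    for by in range(0, height, 4):                # top to bottom
        for bx in range(0, width, 4):             # left to right
            top = recon[by - 1][bx:bx + 4] if by > 0 else None
            left = ([recon[by + r][bx - 1] for r in range(4)]
                    if bx > 0 else None)
            modes = ["dc"]
            if top is not None:
                modes.append("vertical")
            if left is not None:
                modes.append("horizontal")
            best_sad, best_pred = None, None
            for mode in modes:                    # pick the mode with lowest SAD
                pred = predict_4x4(mode, top, left)
                sad = sum(abs(frame[by + r][bx + c] - pred[r][c])
                          for r in range(4) for c in range(4))
                if best_sad is None or sad < best_sad:
                    best_sad, best_pred = sad, pred
            for r in range(4):
                for c in range(4):
                    residual = frame[by + r][bx + c] - best_pred[r][c]
                    level = int(round(residual / qstep))  # stand-in quantizer
                    value = best_pred[r][c] + level * qstep
                    recon[by + r][bx + c] = max(0, min(255, value))
    return recon
```

Because every block reads from `recon` rather than from the original frame, the two outer loops cannot simply be distributed across independent processing units, which is the sequential constraint noted above.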
Nowadays, processing architectures are evolving from high-performance sequential processor architectures to parallel processor architectures (e.g. IBM's Cell processor, Intel's Larrabee processor, and GPUs from NVIDIA or AMD). The introduction of these processors is changing the way computation is done with computers. The more parallelism an application exposes, the better it will perform on these processors. However, in AVC encoding, intra prediction cannot be performed efficiently on these processors, because intra prediction is an inherently sequential processing problem.
Several attempts have been made to improve the performance of intra prediction in video encoding and decoding, which are targeted at pixel-level parallelization. These approaches can predict a whole line of pixels at once according to the prediction mode. Such methods are efficient on DSP or FPGA implementations, but the above-mentioned parallel processor architectures cannot take advantage of them.