A basic flow of video decoding using graphics processing unit (GPU) acceleration involves (i) decoding syntax, (ii) transferring data to the GPU, (iii) assigning GPU blocks, and (iv) running the GPU blocks. Syntax decoding produces uncompressed data (for example, transform coefficients, prediction modes, and motion vectors) for each picture of a video sequence. Syntax decoding can be performed on a central processing unit (CPU) or another serial processor. The uncompressed data is then transferred to the GPU. GPU blocks are assigned to decode specific portions of a picture, and the GPU blocks are run to decode the picture. The GPU runs multiple blocks at once, so each GPU block determines whether it must wait for neighboring pixels to be decoded. When a GPU block has to wait, it waits on synchronization primitives that ensure the neighboring portion is complete before it decodes its own portion of the picture.
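A minimal sketch of the waiting behavior described above, with Python threads standing in for GPU blocks that all run at once. The grid dimensions, the function name `decode_picture`, and the neighbor set (left, top-left, top, top-right, as in typical intra prediction) are hypothetical assumptions; each `threading.Event` plays the role of the synchronization primitive a block waits on. This illustrates the baseline dependency waiting only and is not an actual GPU decoder.

```python
import threading


def decode_picture(rows, cols):
    """Simulate GPU blocks that each decode one region of a picture.

    Hypothetical assumption: block (r, c) reads pixels from its left,
    top-left, top, and top-right neighbors, so it must wait for them.
    Returns the order in which blocks finished decoding.
    """
    # One completion signal per block (stand-in for a GPU sync primitive).
    done = {(r, c): threading.Event() for r in range(rows) for c in range(cols)}
    order = []
    order_lock = threading.Lock()

    def neighbors(r, c):
        # Yield in-bounds neighbor positions whose pixels this block needs.
        for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    def run_block(r, c):
        # Block until every neighbor this region depends on is complete.
        for n in neighbors(r, c):
            done[n].wait()
        # Placeholder for the real reconstruction work of this block.
        with order_lock:
            order.append((r, c))
        # Signal dependent blocks that this region is now decoded.
        done[(r, c)].set()

    # Launch every block at once, mimicking many concurrently running GPU blocks.
    threads = [threading.Thread(target=run_block, args=pos) for pos in done]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


if __name__ == "__main__":
    print(decode_picture(3, 4))
```

Every block waits only on blocks above it or to its left, so the waits form an acyclic dependency graph and all threads eventually finish. The printed order is always a valid decode order, but it can differ between runs because the scheduler is free to interleave independent blocks.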
It would be desirable to implement sweep-dependency-based GPU block scheduling.