Typically, digital signal processing (DSP) algorithms divide data into rectangular regions or blocks in order to minimize local memory requirements and to create a well-defined, repetitive computational procedure. Examples of such blocks include macroblocks in video compression systems, such as Moving Picture Experts Group (MPEG) systems (e.g., MPEG-1, MPEG-2, and MPEG-4), advanced video coding (AVC) systems, MICROSOFT WINDOWS media systems, and the like. Additional examples include blocks or code blocks in image compression systems, such as Joint Photographic Experts Group (JPEG) and JPEG2000 systems.
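By way of illustration only, the division of data into rectangular blocks may be sketched as follows. The function name `split_into_blocks`, the 4x8 frame, and the 2x4 block size are hypothetical and chosen for brevity; they are not taken from any of the systems named above, which define their own block dimensions.

```python
def split_into_blocks(frame, block_h, block_w):
    """Yield (block_row, block_col, block) for each tile of a 2-D frame.

    Assumes, for simplicity, that the frame dimensions are exact
    multiples of the block size; real systems typically pad the edges.
    """
    height = len(frame)
    width = len(frame[0]) if height else 0
    if height % block_h or width % block_w:
        raise ValueError("frame dimensions must be multiples of the block size")
    for top in range(0, height, block_h):
        for left in range(0, width, block_w):
            # Copy the block_h x block_w region whose corner is (top, left).
            block = [row[left:left + block_w] for row in frame[top:top + block_h]]
            yield top // block_h, left // block_w, block


# A hypothetical 4-row by 8-column frame whose samples are numbered 0..31.
frame = [[y * 8 + x for x in range(8)] for y in range(4)]
blocks = list(split_into_blocks(frame, block_h=2, block_w=4))
```

Each block can then be processed with only one block's worth of samples held in local memory at a time, and the same procedure is repeated for every block.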
Computation in such block-based systems is performed with the block as the standard unit of information. Because the procedures performed on the blocks are repetitive in nature, a hardware pipeline is typically created to perform tasks on the data, one block at a time. The data buses connecting the processing elements, however, are designed to transfer individual words of data, whereas a given block of data may include multiple data words. As such, in order to transfer a block of data from a producer processing element to a consumer processing element, one or both of the processing elements must handle the communication operations for the data in addition to their respective processing operations. Because the processing elements must be involved in the communication of data, the complexity of their control logic is increased. Accordingly, there exists a need in the art for a method and apparatus for communication between processing elements capable of increasing the granularity of communication to accommodate blocks of data.
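The mismatch between word-wide buses and multi-word blocks described above may be illustrated with a minimal software sketch. The class `WordBus` and the functions `producer_send_block` and `consumer_receive_block` are hypothetical names introduced only for this illustration; they model the problem being described and do not represent any particular hardware.

```python
from collections import deque


class WordBus:
    """Hypothetical model of a bus that moves exactly one word per transfer."""

    def __init__(self):
        self._fifo = deque()
        self.transfers = 0  # number of word-level transfers performed

    def send_word(self, word):
        self._fifo.append(word)
        self.transfers += 1

    def receive_word(self):
        return self._fifo.popleft()


def producer_send_block(bus, block):
    # The producer itself must iterate over every word of the block,
    # so block-level communication is mixed into its own control logic.
    for row in block:
        for word in row:
            bus.send_word(word)


def consumer_receive_block(bus, block_h, block_w):
    # The consumer must likewise know the block shape and reassemble it.
    return [[bus.receive_word() for _ in range(block_w)] for _ in range(block_h)]


bus = WordBus()
block = [[1, 2, 3, 4], [5, 6, 7, 8]]  # one hypothetical 2x4 block
producer_send_block(bus, block)
received = consumer_receive_block(bus, block_h=2, block_w=4)
```

In this sketch, moving a single block costs one bus operation per word, and both endpoints carry the looping and reassembly logic, which is the added control complexity noted above.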