Numerous data processing applications require a relatively small number of unique operations to be repeatedly performed on a large volume of data. For example, in a number of media applications, such as the processing of video data, a relatively small number of unique operations are repeatedly performed on many blocks of many frames/pictures of video data.
As integrated circuit technology continues to advance, it is desirable to have media processors that are custom designed for this type of processing. In particular, it is desirable to have media processors designed with multiple data processing blocks equipped to repeatedly perform this relatively small number of operations on the large volume of data, in a cooperative and at least partially parallel manner.
Further, it is desirable for each of the data processing blocks to operate with a high degree of efficiency. Thus, it is also desirable for the data processing blocks to be able to support multi-threading (interleaved execution of multiple threads of instructions) without the significant resource overhead typically required to support context switching (saving and restoring the various thread states as execution switches back and forth between the different threads of instructions).