Certain multiprocessor systems include multiple different fixed-function and programmable processing units. Fixed-function processing units are typically configured to concurrently execute one or more commands that specify common predefined operations, while programmable processing units are typically configured to concurrently execute one or more commands that specify complex multi-threaded programs. The degree of concurrency is a fixed design feature and defines a native width of a given processing unit. Commands and associated data are conventionally transmitted through a command queue linking a source processing unit to a target (destination) processing unit. The command queue is conventionally configured to accept commands and related data according to a fixed native width of the source processing unit and deliver the commands and related data according to a fixed native width of the target processing unit, thereby constraining overall programmability of the multiprocessor system to fixed, predefined connections between specific processing units.
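By way of illustration only, the fixed-width constraint described above may be sketched as follows. The class name `FixedWidthQueue` and its parameters `source_width` and `target_width` are hypothetical and do not correspond to any particular hardware interface; the sketch merely assumes that a conventional command queue accepts commands only in groups matching the native width of the source processing unit and delivers them only in groups matching the native width of the target processing unit.

```python
from collections import deque


class FixedWidthQueue:
    """Hypothetical model of a command queue bound to two fixed native widths."""

    def __init__(self, source_width, target_width):
        self.source_width = source_width  # commands accepted per push
        self.target_width = target_width  # commands delivered per pop
        self._items = deque()

    def push(self, commands):
        # The queue only accepts groups of exactly the source's native width.
        if len(commands) != self.source_width:
            raise ValueError("push must match the source native width")
        self._items.extend(commands)

    def pop(self):
        # The queue only delivers groups of exactly the target's native width;
        # None signals that a full group is not yet available.
        if len(self._items) < self.target_width:
            return None
        return [self._items.popleft() for _ in range(self.target_width)]


# A source unit of native width 4 feeding a target unit of native width 2.
queue = FixedWidthQueue(source_width=4, target_width=2)
queue.push(["cmd0", "cmd1", "cmd2", "cmd3"])
first_group = queue.pop()   # ["cmd0", "cmd1"]
second_group = queue.pop()  # ["cmd2", "cmd3"]
```

Because both widths are fixed when the queue is constructed, a source or target processing unit having a different native width cannot use the same queue, which reflects the limitation to predefined connections noted above.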
Furthermore, a given command may include a relatively large amount of data, so a conventional system must load and traverse the data associated with that command in order to locate and evaluate a subsequent command within the command queue. Such traversal can cause data caches to operate inefficiently, thereby reducing overall system performance. Thus, there is a need for addressing these issues and/or other issues associated with the prior art.
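The traversal cost described above may be illustrated with a minimal sketch. The record layout used here (a 16-bit opcode and a 32-bit payload length, followed immediately by the payload) and the names `pack_command` and `find_command` are assumptions made solely for illustration. The sketch assumes that command data is stored inline in the queue, so that reaching a given command requires stepping across the data of every preceding command.

```python
import struct

# Assumed record header: opcode (uint16) and payload length (uint32).
HEADER = struct.Struct("<HI")


def pack_command(opcode, payload):
    """Serialize one command with its data stored inline after the header."""
    return HEADER.pack(opcode, len(payload)) + payload


def find_command(queue, index):
    """Return (opcode, payload_bytes_spanned) for the index-th command.

    payload_bytes_spanned counts the inline data lying between the start of
    the queue and the requested command's header.
    """
    offset = 0
    spanned = 0
    for _ in range(index):
        _, length = HEADER.unpack_from(queue, offset)
        offset += HEADER.size + length  # step across this command's data
        spanned += length
    opcode, _ = HEADER.unpack_from(queue, offset)
    return opcode, spanned


queue = b"".join([
    pack_command(1, bytes(4096)),
    pack_command(2, bytes(8192)),
    pack_command(3, b""),
])
opcode, spanned = find_command(queue, 2)  # opcode 3, 12288 bytes spanned
```

In this sketch the third command carries no data of its own, yet 12288 bytes of unrelated data separate it from the start of the queue. Successive command headers therefore lie far apart in memory, which is one way in which inline command data may lead to poor data cache utilization.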