Heterogeneous multi-core processors perform parallel processing within a single chip package. The basic configuration of a heterogeneous multi-core processor includes a general processing element and multiple specialized processing elements, which are linked together by an internal high-speed bus. Heterogeneous multi-core processors are designed to scale across applications ranging from handheld devices to high-end game consoles and mainframe computers, as well as input-output (I/O) devices.
An example of a heterogeneous multi-core processor is the Cell Broadband Engine™ (a trademark of Sony Computer Entertainment, Inc.), which includes one general processing element referred to as a “power processing element” (PPE) and up to eight specialized processing elements referred to as “synergistic processing elements” (SPEs) on a single die. Each SPE typically has a main processing unit referred to as a synergistic processing unit (SPU) and a direct memory access controller in a memory flow controller (MFC). The SPEs can perform parallel processing of operations in conjunction with a program running on the PPE. Furthermore, the SPEs and the PPE can access a system memory, a shared memory for the heterogeneous multi-core processor, and the local memories of other SPEs.
The SPEs have small local memories (typically about 256 kilobytes), which are referred to as local stores. Code segments of SPE programs, which execute on the SPEs, manage the content of the local stores. In particular, the code segments executing on the SPEs transfer code and data to/from the local stores via the memory flow controllers. The SPEs depend upon the availability of code and data in the local stores for performance.
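The staging pattern described above, in which SPE code moves data into a local store, operates on it there, and writes it back, can be sketched in portable C. This is a minimal simulation under stated assumptions, not the actual MFC interface: the 256-kilobyte array stands in for a local store, memcpy stands in for a DMA transfer, and the names ls_get, ls_put, and increment_via_local_store are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local store size taken from the text (about 256 kilobytes). */
#define LOCAL_STORE_BYTES (256u * 1024u)

/* Hypothetical stand-in for one SPE's local store. */
static unsigned char local_store[LOCAL_STORE_BYTES];

/* Simulated "get": copy from system memory into the local store.
 * On real hardware this would be a DMA command issued through the MFC. */
static void ls_get(size_t ls_off, const unsigned char *sys, size_t len)
{
    assert(ls_off <= LOCAL_STORE_BYTES && len <= LOCAL_STORE_BYTES - ls_off);
    memcpy(local_store + ls_off, sys, len);
}

/* Simulated "put": copy from the local store back to system memory. */
static void ls_put(unsigned char *sys, size_t ls_off, size_t len)
{
    assert(ls_off <= LOCAL_STORE_BYTES && len <= LOCAL_STORE_BYTES - ls_off);
    memcpy(sys, local_store + ls_off, len);
}

/* Add one to every byte of a system-memory array by staging it through
 * a single local-store region, one chunk at a time (no overlap yet). */
void increment_via_local_store(unsigned char *sys, size_t n,
                               size_t ls_off, size_t chunk)
{
    assert(chunk > 0);
    for (size_t done = 0; done < n; done += chunk) {
        size_t len = (n - done < chunk) ? (n - done) : chunk;
        ls_get(ls_off, sys + done, len);            /* bring data in  */
        for (size_t i = 0; i < len; i++)            /* compute in place */
            local_store[ls_off + i] =
                (unsigned char)(local_store[ls_off + i] + 1u);
        ls_put(sys + done, ls_off, len);            /* write data out */
    }
}
```

Because this sketch uses a single region, the processing loop sits idle during each transfer on real hardware; that idle time is what multiple buffers are meant to hide.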
To maintain a steady flow of code and data for the SPEs, the SPEs typically implement multiple buffers for code, input data, and output or processed data. More specifically, many SPEs implement two or more buffers so input data can be read into the local stores while output data is being processed and written to other memory from the local stores. SPEs execute code to coordinate reading data from and writing data to the buffers of the local stores. Typical code executed to coordinate reading from and writing to the buffers currently incorporates more coded delays than necessary or implements overly burdensome command tag identification structures to track the progress of commands. Unfortunately, these characteristics unnecessarily reduce the rate at which data can be processed by the SPEs.
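The two-buffer arrangement described above can be illustrated with a small portable C sketch. It is a simulation under stated assumptions rather than SPE code: memcpy stands in for asynchronous DMA, a bit mask stands in for outstanding command tags, one tag is assumed per buffer pair, and BUF_BYTES and every function name (sim_get, sim_put, sim_wait, double_buffered_negate) are hypothetical. It shows only the ordering of transfers and waits, not any particular scheme for reducing delays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_BYTES 64u   /* hypothetical buffer size, chosen for illustration */
#define NUM_BUFS  2u    /* two buffers: one in flight, one being processed */

static uint8_t  in_buf[NUM_BUFS][BUF_BYTES];
static uint8_t  out_buf[NUM_BUFS][BUF_BYTES];
static uint32_t pending_tags;  /* bit t set: a transfer tagged t is outstanding */

/* Simulated asynchronous "get"; the copy completes immediately here. */
static void sim_get(uint8_t *ls, const uint8_t *sys, size_t len, unsigned tag)
{
    assert(tag < 32u && len <= BUF_BYTES);
    memcpy(ls, sys, len);
    pending_tags |= 1u << tag;
}

/* Simulated asynchronous "put". */
static void sim_put(const uint8_t *ls, uint8_t *sys, size_t len, unsigned tag)
{
    assert(tag < 32u && len <= BUF_BYTES);
    memcpy(sys, ls, len);
    pending_tags |= 1u << tag;
}

/* Simulated wait: real hardware would block until the tag's transfers finish. */
static void sim_wait(unsigned tag)
{
    assert(tag < 32u);
    pending_tags &= ~(1u << tag);
}

static size_t chunk_len(size_t n, size_t off)
{
    return (n - off < BUF_BYTES) ? (n - off) : BUF_BYTES;
}

/* Bitwise-negate n bytes from src into dst using two buffer pairs. */
void double_buffered_negate(const uint8_t *src, uint8_t *dst, size_t n)
{
    if (n == 0) return;
    unsigned cur = 0;
    sim_get(in_buf[cur], src, chunk_len(n, 0), cur);   /* prime first buffer */
    for (size_t off = 0; off < n; off += BUF_BYTES) {
        unsigned next = cur ^ 1u;
        size_t next_off = off + BUF_BYTES;
        if (next_off < n)                               /* prefetch next chunk */
            sim_get(in_buf[next], src + next_off, chunk_len(n, next_off), next);
        sim_wait(cur);  /* input arrived; earlier put from out_buf[cur] done */
        size_t len = chunk_len(n, off);
        for (size_t i = 0; i < len; i++)
            out_buf[cur][i] = (uint8_t)~in_buf[cur][i];
        sim_put(out_buf[cur], dst + off, len, cur);     /* write results out */
        cur = next;
    }
    sim_wait(0u);                                       /* drain both tags */
    sim_wait(1u);
}
```

Sharing one tag between a buffer pair's input and output transfers means a single wait before processing covers both the incoming data and the earlier write-back from the same output buffer; other tag assignments are possible.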