Cell processors are a type of microprocessor that utilizes parallel processing. The basic configuration of a cell processor includes a “Power Processor Element” (“PPE”) (sometimes called a “Processing Element”, or “PE”) and multiple “Synergistic Processing Elements” (“SPE”). The PPEs and SPEs are linked together by an internal high-speed bus dubbed the “Element Interconnect Bus” (“EIB”). Cell processors are designed to be scalable for use in applications ranging from handheld devices to mainframe computers.
A typical cell processor has one PPE and up to 8 SPEs. Each SPE is typically a single chip or part of a single chip containing a main processor and a co-processor. Each SPE typically includes a synergistic processor unit (SPU) and a local store (LS). The PPE typically includes a power processor unit (PPU) and one or more caches. All of the SPEs and the PPE can access a main memory, e.g., via the bus. The SPEs can perform parallel processing of operations in conjunction with a program running on the PPE. To coordinate processes executing in parallel on the SPEs and the PPE, atomic operations are often implemented. An atomic operation is one in which an SPU or PPU can read or write to a memory address (often referred to as an atomic) in a single operation while denying other processors access to the atomic. Atomic operations can be mutual exclusion (mutex) “locked” operations or “lock-free” operations. In a mutex operation, a processor locks the atomic and prevents other processors from writing to it until it is unlocked. In a “lock-free” atomic operation, only one processor can write to the atomic address at a time, but other processors can write over what has been atomically written. Lock-free atomic operations utilize “reservation” operations that notify a processor making the reservation whether an atomic has been overwritten since the reservation was made.
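The difference between a mutex operation and a lock-free operation may be sketched in portable C++ as follows. This is only an illustration under stated assumptions: it uses the standard `std::atomic` facilities rather than any cell-specific instruction, the `SharedCounter` type and both function names are hypothetical, and the compare-exchange retry loop merely stands in for the reservation mechanism described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared state used only for this illustration.
struct SharedCounter {
    std::atomic_flag lock = ATOMIC_FLAG_INIT;       // guards locked_value
    std::uint64_t locked_value = 0;                 // updated under the lock
    std::atomic<std::uint64_t> lock_free_value{0};  // updated without a lock
};

// Mutex-style update: acquire the lock, modify the value, release the lock.
// Other processors are kept out of the critical section until clear().
inline void add_with_lock(SharedCounter& c, std::uint64_t delta) {
    while (c.lock.test_and_set(std::memory_order_acquire)) {
        // Another processor holds the lock; keep spinning.
    }
    c.locked_value += delta;
    c.lock.clear(std::memory_order_release);
}

// Lock-free update: read the value, compute the new one, and publish it only
// if nobody overwrote the value in the meantime; otherwise retry with the
// freshly observed value. Returns the number of attempts that were needed.
inline unsigned add_lock_free(SharedCounter& c, std::uint64_t delta) {
    unsigned attempts = 0;
    std::uint64_t seen = c.lock_free_value.load(std::memory_order_relaxed);
    do {
        ++attempts;
    } while (!c.lock_free_value.compare_exchange_weak(
        seen, seen + delta,
        std::memory_order_acq_rel, std::memory_order_relaxed));
    return attempts;
}

// Single-threaded usage example.
inline void shared_counter_demo() {
    SharedCounter c;
    add_with_lock(c, 5);
    add_lock_free(c, 3);
    assert(c.locked_value == 5);
    assert(c.lock_free_value.load() == 3);
}
```

In the locked version a stalled lock holder blocks every other writer, whereas in the lock-free version a writer that loses the race simply observes the newer value and tries again.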
A very common, well understood synchronization primitive used in conjunction with cell processors is known as a “compare and swap” operation. The basic idea of such an operation is to modify a value stored in memory only if no other processing element has already done so. The compare and swap operation compares the stored value against a specified value. If the values match, the value in memory is updated. If they do not match, the application is notified of a failure. As an example, a compare and swap operation may compare the value stored at memory location 0x7A against the value ‘10’. If they match, the operation writes the value ‘20’ to memory location 0x7A.
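The example above may be sketched in portable C++ as follows. This is a minimal illustration, not a cell-specific implementation: the variable `value_at_0x7A` is a hypothetical stand-in for the word at memory location 0x7A, the helper name `compare_and_swap` is an assumption, and the standard `compare_exchange_strong` member function supplies the atomic comparison and update.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the word stored at memory location 0x7A.
std::atomic<int> value_at_0x7A{10};

// Returns true if `location` held `expected` and was updated to `desired`.
// Returns false, leaving memory unchanged, if the stored value differed.
inline bool compare_and_swap(std::atomic<int>& location,
                             int expected, int desired) {
    return location.compare_exchange_strong(expected, desired);
}

// Usage example following the values in the text.
inline void compare_and_swap_demo() {
    value_at_0x7A.store(10);
    // The stored value is 10, so the swap succeeds and 20 is written.
    assert(compare_and_swap(value_at_0x7A, 10, 20));
    assert(value_at_0x7A.load() == 20);
    // The stored value is no longer 10, so the caller is told of the failure.
    assert(!compare_and_swap(value_at_0x7A, 10, 30));
    assert(value_at_0x7A.load() == 20);
}
```

The Boolean return value plays the role of the failure notification: a caller that receives `false` knows another processing element changed the value first and can decide whether to retry.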
It is often desirable to perform compare and swap operations atomically. A problem with atomic operations on cell processors is that the PPU and SPU have different reservation sizes for atomic operations. These different atomic operation sizes are a result of the different-sized memory access capabilities of the PPU and SPU. The PPU's memory access is generally limited by the register size of the PPU core. The PPU register length is 64 bits, i.e., 8 bytes of 8 bits each. A memory flow controller (MFC) sets the SPU atomic size. The MFC handles direct memory access (DMA) operations for both atomic and non-atomic operations for the SPU. The SPU local store is in the form of 16-byte, 128-bit registers. The SPU local store registers are not tied to any main memory address. The SPU communicates with memory through the MFC, which operates on 128-byte chunks. In certain cell implementations, all atomic operations on the SPU are 128 bytes. However, non-atomic operations handled by the MFC can range in size from 1 byte to 16 kilobytes. Thus, an SPU performs a read with reservation and copies 128 bytes into its local store. The reservation granule can be any size. It will logically work correctly as long as it is larger than the atomic access size.
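The size mismatch described above can be made concrete with a few compile-time constants. The following sketch only restates the figures given in the text (8 bytes for the PPU, 128 bytes for the SPU); the constant names and the `SpuAtomicBlock` type are hypothetical and are not part of any cell programming interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPpuAtomicBytes = 8;    // one 64-bit PPU register
constexpr std::size_t kSpuAtomicBytes = 128;  // one MFC atomic transfer

// Hypothetical block sized and aligned to occupy exactly one 128-byte chunk.
struct alignas(kSpuAtomicBytes) SpuAtomicBlock {
    std::uint64_t words[kSpuAtomicBytes / sizeof(std::uint64_t)];
};

static_assert(sizeof(std::uint64_t) == kPpuAtomicBytes,
              "a 64-bit word matches the PPU atomic size");
static_assert(sizeof(SpuAtomicBlock) == kSpuAtomicBytes,
              "the block fills exactly one SPU atomic transfer");
static_assert(alignof(SpuAtomicBlock) == kSpuAtomicBytes,
              "the block never straddles two 128-byte chunks");

// Number of PPU-sized words covered by a single SPU atomic operation.
constexpr std::size_t kWordsPerSpuBlock = kSpuAtomicBytes / kPpuAtomicBytes;

// Usage example: the array length agrees with the computed ratio.
inline void atomic_size_demo() {
    SpuAtomicBlock block{};
    assert(sizeof(block.words) / sizeof(block.words[0]) == kWordsPerSpuBlock);
}
```

Under these figures a single SPU atomic operation covers sixteen times as much data as a single PPU atomic operation.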
The ability of the SPU to work atomically on large chunks of data is very powerful, whereas the restriction of the PPU to working atomically on only 8 bytes at a time can be quite crippling. Such different sizes for atomic reservations can limit the features of a lock-free algorithm. Linked lists can be implemented without a mutex using lock-free algorithms. However, if larger atomics are available, one can apply lock-free algorithms to more complex operations since more than one integer may be atomically modified at a time. It would be advantageous if the PPU had access to some mechanism to operate atomically on values larger than 8 bytes. Such a feature could facilitate more advanced programming models for SPU utilization.
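The benefit of modifying more than one integer atomically, and the ceiling imposed by an 8-byte atomic, may be illustrated by the following sketch. It packs two 32-bit fields into a single 64-bit atomic so that both change together. The `HeadAndVersion` type, its fields, and the function names are hypothetical, and standard C++ atomics are used in place of any cell-specific instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical pair of fields that must always change together, e.g. the
// index of a list head and a version counter that detects reuse.
struct HeadAndVersion {
    std::uint32_t head;
    std::uint32_t version;
};

// Pack both fields into the 8 bytes that can be updated atomically.
inline std::uint64_t pack(HeadAndVersion p) {
    return (static_cast<std::uint64_t>(p.head) << 32) | p.version;
}

inline HeadAndVersion unpack(std::uint64_t bits) {
    return HeadAndVersion{static_cast<std::uint32_t>(bits >> 32),
                          static_cast<std::uint32_t>(bits)};
}

// Replace the head and bump the version in one lock-free step. If another
// processor changed either field first, the loop retries on the new value.
inline HeadAndVersion set_head(std::atomic<std::uint64_t>& slot,
                               std::uint32_t new_head) {
    std::uint64_t seen = slot.load(std::memory_order_acquire);
    HeadAndVersion next{};
    do {
        next = unpack(seen);
        next.head = new_head;
        next.version += 1;
    } while (!slot.compare_exchange_weak(
        seen, pack(next),
        std::memory_order_acq_rel, std::memory_order_acquire));
    return next;
}

// Single-threaded usage example.
inline void set_head_demo() {
    std::atomic<std::uint64_t> slot{pack(HeadAndVersion{7, 0})};
    HeadAndVersion result = set_head(slot, 9);
    assert(result.head == 9 && result.version == 1);
}
```

Two 32-bit fields exhaust the 8 bytes available in this sketch; adding a third field, or widening either one, would require a larger atomic of the kind available to the SPU.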
Thus, there is a need in the art for a way to perform atomic compare and swap operations with a cell processor where the PPE and SPE have different-sized register lines.