Cell processors are a type of microprocessor that utilizes parallel processing. The basic configuration of a cell processor includes a “Power Processor Element” (“PPE”) (sometimes called “Processing Element”, or “PE”), and multiple “Synergistic Processing Elements” (“SPEs”). The PPE and SPEs are linked together by an internal high-speed bus dubbed the “Element Interconnect Bus” (“EIB”). Cell processors are designed to be scalable for use in applications ranging from handheld devices to mainframe computers. Cell processors may manage multiple tasks using a task management system based on a software concept referred to as “threads”. A “thread” generally refers to a part of a program that can execute independently of other parts. Operating systems that support multithreading enable programmers to design programs whose threaded parts can execute concurrently.
A typical cell processor has one PPE and up to eight SPEs. Each SPE is typically a single chip or part of a single chip containing a main processor and a co-processor. Each SPE typically includes a synergistic processor unit (SPU) and a local store (LS). The PPE typically includes a power processor unit (PPU) and one or more caches. All of the SPEs and the PPE can access a main memory, e.g., via the bus. The SPEs can perform parallel processing of operations in conjunction with a program running on the PPE. To coordinate processes executing in parallel on the SPEs and the PPE, atomic operations are often implemented. An atomic operation is one in which an SPU or PPU can read or write to a memory address (often referred to as an atomic) in a single operation while denying other processors access to the atomic. Atomic operations can be mutual exclusion (mutex) “locked” operations or “lock-free” operations. In a mutex operation, a processor locks the atomic and prevents other processors from writing to it until it is unlocked. In a “lock-free” atomic operation, only one processor can write to the atomic address at a time, but other processors can write over what has been atomically written. Lock-free atomic operations utilize “reservation” operations that notify a processor making the reservation whether an atomic has been overwritten since the reservation was made.
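By way of illustration only, the reservation behavior described above can be sketched as a small single-threaded Python model. The class ReservationMemory, the function atomic_increment, the interfere hook, the processor labels and the address 0x1000 are all hypothetical names introduced for this sketch; they do not correspond to actual cell processor instructions or to any particular library.

```python
class ReservationMemory:
    """Toy model of memory offering read-with-reservation and conditional store."""

    def __init__(self):
        self.cells = {}          # address -> value
        self.reservations = {}   # processor id -> reserved address

    def load_reserved(self, proc, addr):
        """Read addr and record that proc holds a reservation on it."""
        self.reservations[proc] = addr
        return self.cells.get(addr, 0)

    def store_conditional(self, proc, addr, value):
        """Write only if proc's reservation on addr is still intact."""
        if self.reservations.get(proc) != addr:
            return False  # the atomic was overwritten since the reservation
        self.cells[addr] = value
        # A successful write clears every reservation held on this address.
        for p in [p for p, a in self.reservations.items() if a == addr]:
            del self.reservations[p]
        return True


def atomic_increment(mem, proc, addr, interfere=None):
    """Lock-free increment: retry until the conditional store succeeds."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.load_reserved(proc, addr)
        if interfere is not None:
            interfere()       # hypothetical hook: another processor runs here
            interfere = None  # interfere on the first attempt only
        if mem.store_conditional(proc, addr, old + 1):
            return attempts


mem = ReservationMemory()
ATOMIC = 0x1000  # hypothetical address of the shared atomic

# Uncontended case: the first attempt succeeds.
first = atomic_increment(mem, "PPU", ATOMIC)

# Contended case: an SPU increments between the PPU's read and its write,
# so the PPU loses its reservation and must retry.
second = atomic_increment(
    mem, "PPU", ATOMIC,
    interfere=lambda: atomic_increment(mem, "SPU0", ATOMIC),
)
print(first, second, mem.cells[ATOMIC])
```

In this model no processor ever holds a lock; a writer that loses its reservation simply re-reads the atomic and tries again, which is the essential difference from the mutex case.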
A problem with atomic operations on prior art cell processors is that the PPU and SPU have different reservation sizes for atomic operations. These different atomic operation sizes are a result of the different-sized memory access capabilities of the PPU and SPU. The PPU's memory access is generally limited by the register size of the PPU core. The cell processor architecture does not define how large the atomic operation size is for the SPU. However, the SPU can access the main memory through a memory flow controller (MFC), which can transfer data in increments much larger than the register size of the PPU core. For example, in certain types of cell processors, the MFC for an SPU can transfer data into and out of main memory in 128-byte chunks (or smaller), but the PPU can transfer data in only 8-byte chunks (or smaller). The maximum PPU memory transfer size for a single operation is determined by the size of the PPU register set. The PPU register length is 64 bits, i.e., 8 bytes of 8 bits each. The MFC sets the SPU atomic size. The SPU local store is in the form of 16-byte (128-bit) registers. The SPU local store registers are not tied to any main memory address. The SPU communicates with memory through the MFC, which operates on 128-byte chunks. The MFC handles direct memory access (DMA) operations, both atomic and non-atomic, for the SPU. In certain cell implementations, all atomic operations on the SPU are 128 bytes. However, non-atomic operations are also handled by the MFC and can range in size from 1 byte to 16 kilobytes. Thus, an SPU performing a read with reservation copies 128 bytes into its local store. The reservation granule can be any size. It will logically work correctly as long as it is larger than the atomic access size.
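The size mismatch described above can be worked through with simple arithmetic. The following Python sketch uses only the figures given in the text (64-bit PPU registers, 128-bit SPU registers, 128-byte SPU atomic operations). The helper granule_index and the choice of a 128-byte reservation granule are assumptions made for illustration, since the granule size is implementation dependent.

```python
PPU_REGISTER_BITS = 64    # from the text: 64-bit PPU registers
SPU_REGISTER_BITS = 128   # from the text: 128-bit SPU registers
MFC_ATOMIC_BYTES = 128    # from the text: 128-byte SPU atomic operations

ppu_atomic_bytes = PPU_REGISTER_BITS // 8     # bytes per PPU atomic access
spu_register_bytes = SPU_REGISTER_BITS // 8   # bytes per SPU register

# How many PPU-sized atomics and SPU registers fit in one SPU atomic operation.
ppu_words_per_line = MFC_ATOMIC_BYTES // ppu_atomic_bytes
spu_registers_per_line = MFC_ATOMIC_BYTES // spu_register_bytes


def granule_index(addr, granule_bytes=MFC_ATOMIC_BYTES):
    """Hypothetical helper: which reservation granule an address falls in.

    Assumes a 128-byte granule for illustration only.
    """
    return addr // granule_bytes


# Two 8-byte PPU atomics sharing one granule, and a third in the next one.
same_granule = granule_index(0x1000) == granule_index(0x1078)
next_granule = granule_index(0x1080) == granule_index(0x1000) + 1

print(ppu_atomic_bytes, spu_register_bytes)
print(ppu_words_per_line, spu_registers_per_line)
print(same_granule, next_granule)
```

Under these assumptions a single SPU atomic operation spans sixteen PPU-sized words, so a PPU write to any one of those words would fall within the granule covered by an SPU reservation on the same 128-byte line.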
The ability of the SPU to work atomically on large chunks of data is very powerful, and the PPU's limit of 8 bytes per atomic operation can be quite crippling by comparison. Such different sizes for atomic reservations can limit the features of a lock-free algorithm. Simple structures, such as linked lists, can be implemented without a mutex using lock-free algorithms. However, if larger atomics are available, lock-free algorithms can be applied to more complex operations, since more than one integer may be atomically modified at a time.
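As a rough illustration of modifying more than one integer at a time, the following Python sketch models a single 128-byte atomic line that holds three 64-bit integers of a queue header. The class AtomicLine, the function enqueue, the field layout (head, tail, count) and the use of a version counter as a stand-in for a hardware reservation are all assumptions made for this sketch and are not drawn from any cell processor interface.

```python
import struct

LINE_BYTES = 128               # size of one SPU atomic operation, per the text
FIELDS = struct.Struct("<3Q")  # hypothetical layout: head, tail, count


class AtomicLine:
    """Toy 128-byte atomic; a version counter stands in for the reservation."""

    def __init__(self):
        self.data = bytearray(LINE_BYTES)
        self.version = 0  # bumped on every successful store

    def load_reserved(self):
        """Return a snapshot of the line and the version it was read at."""
        return bytes(self.data), self.version

    def store_conditional(self, new_data, reserved_version):
        """Replace the whole line only if nobody has written since the read."""
        if len(new_data) != LINE_BYTES:
            raise ValueError("the whole 128-byte line must be written")
        if reserved_version != self.version:
            return False
        self.data[:] = new_data
        self.version += 1
        return True


def enqueue(line, slot):
    """Update head, tail and count together in one conditional store."""
    while True:
        snapshot, version = line.load_reserved()
        head, tail, count = FIELDS.unpack_from(snapshot)
        if count == 0:
            head = slot  # first element becomes the head as well
        updated = bytearray(snapshot)
        FIELDS.pack_into(updated, 0, head, slot, count + 1)
        if line.store_conditional(updated, version):
            return


line = AtomicLine()
enqueue(line, 7)
enqueue(line, 9)
head, tail, count = FIELDS.unpack_from(line.data)
print(head, tail, count)
```

With only an 8-byte atomic, the three fields would have to be squeezed into one word or updated in separate steps that other processors could observe half-finished; with a 128-byte atomic they change together or not at all.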
Thus, there is a need in the art for a way to perform atomic operations with a cell processor where the PPE and SPE have different-sized register lines.