The present inventive subject matter relates to a cell broadband engine processor, and particularly to managing data movement in the cell broadband engine processor.
A cell broadband engine (CellBE for short hereinafter) processor is a kind of microprocessor that utilizes parallel processing. Generally, the basic configuration of a CellBE processor comprises a Power Processing Element (PPE), eight Synergistic Processing Elements (SPEs), a Memory Flow Control (MFC), an Internal Interrupt Control (IIC), and a main memory. The computing components of the CellBE processor are the PPE and the SPEs. The components of the CellBE processor are connected via a high-speed bus, the Element Interconnect Bus (EIB). Any two of the eight SPEs may exchange data over a 25.6 GB/s high-speed bus between them, whereas all of the SPEs together share a bus with a total bandwidth of only 25.6 GB/s to the main memory. Bus transfers between SPEs and bus transfers between the respective SPEs and the main memory may proceed in parallel. The CellBE processor is applicable to various applications ranging from handheld devices to mainframe computers.
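The bandwidth figures above imply that the shared path to main memory, rather than the SPE-to-SPE links, becomes the bottleneck when many SPEs load data at once. The following minimal C sketch makes that arithmetic explicit. It assumes, purely for illustration, that concurrently active SPEs split the 25.6 GB/s memory path evenly with no contention losses; the macro and function names are hypothetical.

```c
#include <assert.h>

/* Peak figures taken from the description above, in GB/s. */
#define SPE_TO_SPE_GBPS 25.6 /* between any two SPEs */
#define SPE_TO_MEM_GBPS 25.6 /* shared by all SPEs toward main memory */

/* Bandwidth each SPE gets from main memory when `active_spes` SPEs
 * stream concurrently. Idealized assumption: the shared path is split
 * evenly and no bandwidth is lost to contention. */
double per_spe_memory_gbps(int active_spes)
{
    assert(active_spes >= 1 && active_spes <= 8);
    return SPE_TO_MEM_GBPS / active_spes;
}

/* How many times faster an SPE-to-SPE transfer is than each SPE's
 * share of the memory path under the same idealized model. */
double spe_link_advantage(int active_spes)
{
    return SPE_TO_SPE_GBPS / per_spe_memory_gbps(active_spes);
}
```

Under this idealized model, eight SPEs streaming from main memory at the same time would each see about 3.2 GB/s, one eighth of what a single SPE-to-SPE transfer can reach.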
The CellBE processor represents a change in computer architecture: its eight SPEs can process in parallel, which greatly improves computing performance. To address the memory wall problem, each SPE in the CellBE processor is provided with its own local storage (LS) and can directly access only that LS. Introducing the LS reduces memory latency. Typically, the LS is 256 KB in size. This limited storage space causes difficulties for developers because it constrains the binary size of programs.
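Because an SPE can directly address only its own LS, everything the SPE program needs at run time, including its code, static data and stack, must fit in that space. The sketch below checks such a budget against the typical 256 KB LS. The `spe_footprint` structure, its fields and the example sizes are hypothetical; real SPE images contain further sections, so this is a simplification rather than a linker model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LS_SIZE_BYTES (256u * 1024u) /* typical LS size stated above */

/* Hypothetical footprint of an SPE program; all fields in bytes. */
struct spe_footprint {
    size_t code;  /* text section of the SPE binary */
    size_t data;  /* static data and buffers */
    size_t stack; /* reserved stack space */
};

/* True if code, data and stack together fit in the LS. */
bool ls_fits(const struct spe_footprint *f)
{
    assert(f != NULL);
    return f->code + f->data + f->stack <= LS_SIZE_BYTES;
}

/* Bytes of LS left over, or 0 if the program does not fit. */
size_t ls_free_bytes(const struct spe_footprint *f)
{
    assert(f != NULL);
    size_t used = f->code + f->data + f->stack;
    return used >= LS_SIZE_BYTES ? 0 : LS_SIZE_BYTES - used;
}
```

For example, a program with 100 KB of code, 64 KB of data and a 16 KB stack fits with 76 KB to spare, while the same program with 200 KB of code no longer fits.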
One existing CellBE processor design incorporates a dedicated physical cache for an SPE. Although this improves computing performance, it makes the architecture of the CellBE processor more complex, which increases cost. Another approach is a soft-cache, which uses part of the LS as a software-managed cache. However, a soft-cache reduces the space available in the LS. Once a program becomes relatively large, the soft-cache can no longer be used.
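The soft-cache approach can be pictured as a small software-managed, direct-mapped cache whose lines are stored in the LS. The minimal sketch below is one possible implementation, not a description of any particular product: the line size, the number of lines, the simulated `main_memory` array, and the use of `memcpy` in place of an MFC DMA transfer are all assumptions for illustration. It shows the trade-off described above: the `line` array permanently occupies LS space (8 KB here) that the program itself can no longer use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 128u       /* bytes per cache line (assumed) */
#define NUM_LINES 64u        /* 64 * 128 B = 8 KB of LS spent on the cache */
#define MEM_SIZE (1u << 20)  /* simulated main memory: 1 MB */

/* Stands in for the PPE-managed main memory (hypothetical). */
static uint8_t main_memory[MEM_SIZE];

struct soft_cache {
    uint32_t tag[NUM_LINES];
    int valid[NUM_LINES];
    uint8_t line[NUM_LINES][LINE_SIZE]; /* this array occupies LS */
    unsigned misses;
};

/* Read one byte through a direct-mapped software cache. On a miss the
 * whole line is copied in; memcpy stands in for an MFC DMA transfer. */
uint8_t soft_cache_read(struct soft_cache *c, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    uint32_t line_addr = addr / LINE_SIZE;
    uint32_t index = line_addr % NUM_LINES; /* which cache slot */
    uint32_t tag = line_addr / NUM_LINES;   /* which memory block */
    if (!c->valid[index] || c->tag[index] != tag) {
        memcpy(c->line[index], &main_memory[line_addr * LINE_SIZE], LINE_SIZE);
        c->tag[index] = tag;
        c->valid[index] = 1;
        c->misses++;
    }
    return c->line[index][addr % LINE_SIZE];
}
```

Every access pays a tag check in software, and two addresses that map to the same slot evict each other; enlarging `NUM_LINES` reduces such conflicts only by taking still more of the LS.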
Because of the limited size of the LS, most persistent data must be placed in the main memory managed by the PPE. A process running on the PPE may be switched out by the operating system (OS), which increases communication overhead between the PPE and the SPEs. In addition, data in the main memory may be swapped out to swap partitions on the hard disk, which increases latency. Irregular data movement can also cause cache inconsistency problems, such as false sharing.