Computer processors function by processing data elements through various registers in accordance with instructions provided by a computer program. The processor executes instructions in the form of machine language, that is, low-level instructions specifying which data elements are processed through which registers. Most software, however, is written in higher-level programming code, such as C++, which has the advantages of being human readable and of expressing relatively complex processing operations in comparatively short, quickly written commands. A compiler receives the high-level programming code and, based upon the programming of the compiler itself, generates the machine language that is readable by a processor.
A software cache is a robust solution for locally caching remote data in systems that do not have a hardware cache, such as the synergistic processing elements (SPEs) of a Cell Broadband Engine. Using such a software cache, a program can load the data it requires from the global address space on an as-needed basis, which is especially convenient when the data access pattern is irregular, when the data footprint is larger than the local memory, or both.
Because direct memory access (DMA) requests that move data in and out incur high memory latencies, the performance of a software cache is significantly improved by increasing its set associativity. With higher associativity, the software cache subsystem is more resilient when multiple frequently accessed data items hash to the same cache set.
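By way of illustration only, the effect of associativity on data items that hash to the same set may be sketched in C as follows. The geometry (`LINE_SIZE`, `NUM_SETS`), the modulo hash, and the function names `set_index` and `hot_lines_fit` are assumptions introduced for this sketch and are not taken from any particular software cache implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry, chosen only for illustration. */
#define LINE_SIZE 128u   /* bytes per cache line */
#define NUM_SETS   64u   /* number of sets in the cache */

/* Map a global address to the set it hashes to (simple modulo hash). */
static uint32_t set_index(uint64_t global_addr)
{
    return (uint32_t)((global_addr / LINE_SIZE) % NUM_SETS);
}

/* Given addresses that each name a distinct, frequently accessed line,
 * return 1 if all of them can stay resident at once, i.e. no set is asked
 * to hold more than `ways` lines; return 0 otherwise. */
static int hot_lines_fit(const uint64_t *addrs, size_t n, unsigned ways)
{
    unsigned per_set[NUM_SETS] = {0};

    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (++per_set[set_index(addrs[i])] > ways)
            return 0;   /* this set would have to evict a hot line */
    }
    return 1;
}
```

Under these assumptions, the addresses 0, 8192, and 16384 all hash to set 0. With one or two ways per set, at least one of the three lines must be evicted and later refetched by DMA; with four ways, all three remain resident.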
The main problem with a larger set associativity in a software cache is that it increases the amount of work the processor must perform to detect whether an access is a hit or a miss, a task referred to as “cache lookup.” This cost also exists in hardware caches, but there the latency of the cache lookup is hidden by using content addressable memory (CAM). When a cache is implemented in software, typical processors such as the SPE do not have access to a programmable CAM. As a result, testing for a match must be performed explicitly for each of the tags in a cache set.
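The explicit per-tag test described above may be sketched in C as follows. This is a minimal illustration under stated assumptions, not an actual SPE software cache: the geometry (`LINE_SIZE`, `NUM_SETS`, `WAYS`), the `sw_cache` structure, the `TAG_INVALID` sentinel, and the function names are hypothetical, and the full line number is used as the tag for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache geometry, chosen only for illustration. */
#define LINE_SIZE   128u        /* bytes per cache line */
#define NUM_SETS     64u        /* number of sets */
#define WAYS          4u        /* set associativity */
#define TAG_INVALID UINT64_MAX  /* sentinel marking an empty way */

/* Tag directory only; the data lines themselves are omitted. */
typedef struct {
    uint64_t tags[NUM_SETS][WAYS];
} sw_cache;

/* Mark every way of every set as empty. */
static void sw_cache_init(sw_cache *c)
{
    assert(c != NULL);
    for (size_t s = 0; s < NUM_SETS; s++)
        for (size_t w = 0; w < WAYS; w++)
            c->tags[s][w] = TAG_INVALID;
}

/* Cache lookup: return the matching way on a hit, or -1 on a miss.
 * With no CAM available, each tag in the set is compared in turn,
 * so the work grows with the associativity. */
static int sw_cache_lookup(const sw_cache *c, uint64_t global_addr)
{
    uint64_t line = global_addr / LINE_SIZE;
    size_t set = (size_t)(line % NUM_SETS);

    assert(c != NULL);
    for (unsigned w = 0; w < WAYS; w++) {
        if (c->tags[set][w] == line)
            return (int)w;   /* hit */
    }
    return -1;               /* miss: the line would be fetched by DMA */
}
```

In this sketch a miss costs `WAYS` comparisons before the DMA request can even be issued, and a hit costs between one and `WAYS` comparisons, which is the lookup overhead that grows as associativity is increased.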