A conventional computer architecture typically includes a Central Processing Unit (CPU) connected to a main memory system and various Input/Output (I/O) facilities. Data, including instructions, moves between the main memory and the CPU. The overall bandwidth between main memory and the CPU is limited by a number of factors, including memory access time, word size, bus width, and bus speed. In many cases, particularly those involving large amounts of data (e.g., scientific computing applications, image/video processing), the memory bandwidth between the CPU and main memory creates a bottleneck and may reduce overall system performance to an unacceptable or undesirable level.
To reduce potential bottlenecks between main memory and the CPU, computer designers may use a multi-level memory hierarchy, such as one that includes a small number of registers internal to the CPU, external main memory, and one or more levels of intermediate cache memory. Cache memory decreases the time the CPU needs to access data, thereby reducing data latency and improving the performance of the executing application.
Cache memory is generally not directly accessible or manipulable by the program. That is, cache memory operation is generally not part of the application program interface but is instead controlled by a dedicated controller or similar hardware. In general, cache memory is designed to be transparent to the CPU, and is treated as fast main memory. Both the compiler that translates source code into object code and the program that issues “load” and “store” commands to main memory use and benefit from the cache memory without having any direct program control over its use. At runtime, hardware may automatically transfer data and/or instructions that are associated with a range of addresses in main memory to the cache. When the CPU makes a subsequent request for data from these addresses, the data is provided from the cache.
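The transparency described above can be illustrated with a minimal C sketch. The function name `sum_array` is hypothetical and chosen only for illustration. Nothing in the source text names or controls the cache: the loop issues ordinary loads, and whether a given load is served from cache or from main memory is decided by the hardware.

```c
#include <stddef.h>

/* Sum n integers using ordinary loads only.
   No statement here refers to the cache; any caching of values[]
   is handled by the hardware, not by the program. */
long sum_array(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i]; /* plain load from a main memory address */
    }
    return total;
}
```

Calling `sum_array` twice on the same array would typically benefit from the cache on the second pass, yet the source code is identical in both cases.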
Systems that require high-performance hardware, such as digital signal processors and custom computer engines, typically require high memory bandwidth to support high-speed computations. The standard cache arrangement explained above may not provide enough memory bandwidth for these types of systems, and bottlenecks may still occur even though cache is used to supplement main memory. To provide additional local memory bandwidth so that bottlenecks are avoided or made less likely, computer designers may include, in addition to cache memory, a small, fast local memory called Scratch Pad Memory (SPM). SPM is local memory that is typically connected directly to the CPU.
SPM is available for use by a programmer through an application program interface. For example, a program's “load” and “store” instructions may explicitly address either a main memory area or a particular local SPM area. The program explicitly places a given data item in one of these separate memory spaces. An SPM may provide data directly to the CPU without using main memory or system bus bandwidth and without occupying space within cache memory. SPM therefore decreases the time required for the CPU to access data and further improves the performance of the executing application.
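By contrast with cache, explicit SPM use is visible in the program itself. The following is a minimal sketch under stated assumptions, not a description of any particular platform: the names `spm`, `SPM_WORDS`, `spm_stage`, and `spm_sum` are hypothetical, and the scratch pad is modeled as an ordinary static array so that the example runs anywhere. On real hardware the SPM would typically be a region at a platform-defined address or in a dedicated linker section.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical SPM capacity in words; real sizes are platform-specific. */
#define SPM_WORDS 256

/* Stand-in for the scratch pad (assumption: modeled as a static array). */
static int spm[SPM_WORDS];

/* Explicitly copy up to SPM_WORDS values from main memory into the SPM.
   Returns the number of values actually staged. */
size_t spm_stage(const int *main_mem, size_t n)
{
    size_t count = n < SPM_WORDS ? n : SPM_WORDS;
    memcpy(spm, main_mem, count * sizeof spm[0]);
    return count;
}

/* Compute on the staged data: every load in this loop addresses the SPM
   array rather than the original main memory buffer. */
long spm_sum(size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += spm[i];
    }
    return total;
}
```

Here the programmer, not the hardware, decides which data resides in the fast local memory and when it is copied there, which is the manual step the paragraph above describes.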
Currently, data is placed in SPM manually by a computer programmer rather than automatically by the compiler at program compile time. It is therefore desirable to configure SPM usage automatically and to make the most efficient use of SPM.