The present invention relates generally to systems where multiple processors share a common program memory, and specifically to a system and method for reducing the latency for accessing the memory.
Some processors have separate memories for storing program instructions and program data. These memories are typically referred to as program store and data store, respectively. The access patterns for the program store typically differ from the access patterns for the data store. Program store data is frequently accessed sequentially, as the processor executes one instruction after another. Most instructions do not affect the program address of the next instruction to be executed. Some instructions, such as a branch or a jump, cause the processor to execute an instruction that does not immediately succeed the previously executed instruction in the program store.
However, most instructions are executed sequentially, a behavior that is known as the principle of locality. As a result, schemes have been developed to take advantage of this feature for improving processor performance. One such scheme is the introduction of a program store cache. The program store cache stores multiple instructions local to the processor. Typically, the cache comprises memory having a faster access time than the program store. However, the improved access time comes at the expense of other design criteria, including cost. As a result, the cache is typically a fraction of the size of the program store. Therefore, the processor can exploit the principle of locality by storing a sequence of instructions in the program store cache. When the processor attempts to access an instruction that is not in the cache, referred to as a cache miss, the cache loads the instruction that the processor is trying to access from the program store. However, since instructions are primarily executed in sequence, cache misses are relatively rare compared to cache hits.
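The hit and miss behavior described above can be illustrated with a minimal sketch. The direct-mapped organization, the line size of eight instructions, the four cache lines, and the name simulate_cache are all assumptions chosen for illustration; the description above does not specify a cache organization.

```python
def simulate_cache(trace, line_size=8, num_lines=4):
    """Count hits and misses for a small instruction cache.

    Assumed for illustration: a direct-mapped cache in which a miss
    loads the whole line containing the requested instruction.
    """
    lines = [None] * num_lines  # block number held by each line; None means empty
    hits = misses = 0
    for addr in trace:
        block = addr // line_size   # block of the program store containing addr
        index = block % num_lines   # cache line that this block maps to
        if lines[index] == block:
            hits += 1
        else:
            misses += 1
            lines[index] = block    # load the block from the program store
    return hits, misses


# 64 sequential instructions, then a jump to address 200 and 16 more in sequence.
trace = list(range(64)) + list(range(200, 216))
hits, misses = simulate_cache(trace)
print(hits, misses)  # 70 hits, 10 misses
```

Under these assumptions, only the first instruction of each eight-instruction block misses, so the mostly sequential trace yields far more hits than misses.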
While the solution described above is simple and elegant for single processor devices, the solution becomes complicated for multiprocessor devices. The simplest approach to providing a program store for multiple processors is to provide a separate program store for each processor. However, this solution can waste memory, especially when the processors are sharing some of the same program code. By using a single program store for multiple processors, a smaller amount of total memory can be allocated to program store, providing a less expensive solution.
However, one issue that arises when using a single program store for multiple processors is the extra latency that each processor can incur while trying to fetch program instructions. If two or more processors try to access the memory at the same time, one or more processors will need to be held off until the other processor or processors have completed the instruction fetch. Therefore, there is a need for a solution that reduces or eliminates extra latency caused by multiple processors sharing the same program store.
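The extra latency described above can be sketched with a simple model. The assumptions here are illustrative only: the shared program store serves one fetch per cycle, every processor requests a fetch on every cycle until it has finished, and access is granted in round-robin order. The name simulate_shared_store is hypothetical.

```python
def simulate_shared_store(fetches_per_cpu, num_cpus):
    """Model processors contending for a single-ported program store.

    Assumed for illustration: one fetch is served per cycle and the
    winner is chosen round-robin. Returns the total number of cycles
    and the cycles each processor spent held off.
    """
    remaining = [fetches_per_cpu] * num_cpus
    stalls = [0] * num_cpus
    cycles = 0
    next_grant = 0  # processor with highest priority this cycle
    while any(remaining):
        requesters = [c for c in range(num_cpus) if remaining[c] > 0]
        # Grant the requester closest to the round-robin pointer.
        winner = min(requesters, key=lambda c: (c - next_grant) % num_cpus)
        remaining[winner] -= 1
        for c in requesters:
            if c != winner:
                stalls[c] += 1  # held off until the winner completes its fetch
        next_grant = (winner + 1) % num_cpus
        cycles += 1
    return cycles, stalls


print(simulate_shared_store(4, 1))  # (4, [0])
print(simulate_shared_store(4, 2))  # (8, [3, 4])
```

In this model, a single processor completes four fetches in four cycles with no stalls, whereas two processors sharing the store need eight cycles and each is held off for several of them.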