One well-known approach to improving the performance of computers is the use of a cache memory. Although a conventional random access memory (RAM) used for storage of instructions and data operates at high speed, its access time is long in comparison with the operating speeds of computer execution units. Thus, memory access time may be the limiting factor in increasing the operating speed of a computer. By utilizing a cache memory, the limitations imposed by memory access time may be at least partially overcome.
A cache memory is connected between the execution unit and the main memory. The cache memory typically has a relatively small capacity and a faster access time than the main memory. When the execution unit is executing a program, it accesses data in the cache memory, thereby taking advantage of the speed of the cache memory. When the data is not present in the cache memory, the data is read from the main memory and is placed in the cache memory for subsequent use. When a significant percentage of accessed data is present in the cache memory, the operating speed of the computer is increased.
Conventional cache memories take advantage of a characteristic of the execution of many computer programs known as temporal locality. When program execution has temporal locality, the same data is used more than once, typically multiple times, in a relatively short period. This may occur, for example, in a program loop that is executed multiple times. When data having temporal locality is present in the cache memory, performance is enhanced. By contrast, a cache memory provides little or no speed improvement with respect to data having a small amount of temporal locality, and a cache memory provides no speed improvement with respect to data that is used only once during program execution.
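The effect of temporal locality on cache performance can be illustrated with a small simulation. The following is a minimal sketch only, not a description of any particular cache design: the least-recently-used (LRU) replacement policy, the one-word-per-entry organization, the 16-entry capacity, and the names hit_rate, loop_trace and single_use_trace are all assumptions introduced for illustration.

```python
from collections import OrderedDict


def hit_rate(addresses, capacity):
    """Return the fraction of accesses served by a small LRU cache.

    The LRU policy and one-word-per-entry organization are assumptions
    made for illustration only.
    """
    cache = OrderedDict()  # keys are addresses, ordered oldest -> newest
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[addr] = True  # fill from main memory on a miss
    return hits / len(addresses)


# A program loop that touches the same 8 words 100 times (high temporal locality).
loop_trace = [addr for _ in range(100) for addr in range(8)]

# 800 distinct words, each used only once (no temporal locality).
single_use_trace = list(range(800))

loop_hit_rate = hit_rate(loop_trace, capacity=16)
single_use_hit_rate = hit_rate(single_use_trace, capacity=16)
print(loop_hit_rate, single_use_hit_rate)
```

Under these assumptions, the loop trace misses only on its first 8 accesses, whereas the single-use trace never hits, which corresponds to the case in which the cache memory provides no speed improvement.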
In some applications, the execution unit may be required to perform operations in sequence on large volumes of data, as may be the case in processing pixel data representative of a large image. For example, a color image may be represented by 140 megabytes of data. The data may be stored in a burst memory that can deliver large quantities of data at high speed. However, the data must be buffered for use by the execution unit, because the burst memory does not supply data at the same rate at which the execution unit processes it. The contents of a large data structure are supplied consecutively to a streaming buffer as streaming data until the entire data structure has been processed.
One known approach to buffering of streaming data is to pass the streaming data through the data cache memory of the computer, with the streaming data having access to the entire data cache. The streaming data is characterized by a high degree of spatial locality. Spatial locality in this context refers to address locality. The streaming data typically consists of consecutive words from main memory (hence spatial locality). The streaming data has little or no temporal locality because of the size of the data structure and because the data may be processed only once. Large data sets having little or no temporal locality swamp the data cache memory and replace normal data. Thus, program code will be unable to use the data cache memory effectively, and performance is likely to be degraded.
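The swamping of the data cache memory by streaming data can be illustrated as follows. This is a hedged sketch under stated assumptions, not a model of any specific processor: the LRU replacement policy, the 16-entry capacity, the 8-word working set of normal data, the address ranges, and the name normal_data_hit_rate are hypothetical and chosen only to make the effect visible.

```python
from collections import OrderedDict


def normal_data_hit_rate(stream_words_per_pass, passes=100,
                         working_set=8, capacity=16):
    """Hit rate seen by normal data when streaming data shares the cache.

    All parameters and the LRU policy are illustrative assumptions.
    """
    cache = OrderedDict()  # shared by normal data and streaming data
    hits = 0
    accesses = 0
    next_stream_addr = 1_000_000  # streaming region, disjoint from normal data

    def access(addr):
        """Touch one address in the shared cache; return True on a hit."""
        if addr in cache:
            cache.move_to_end(addr)
            return True
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used entry
        cache[addr] = True
        return False

    for _ in range(passes):
        # Normal data: the same small working set is reused on every pass.
        for addr in range(working_set):
            hits += access(addr)
            accesses += 1
        # Streaming data: consecutive addresses, each used only once.
        for _ in range(stream_words_per_pass):
            access(next_stream_addr)
            next_stream_addr += 1
    return hits / accesses


rate_without_stream = normal_data_hit_rate(stream_words_per_pass=0)
rate_with_stream = normal_data_hit_rate(stream_words_per_pass=64)
print(rate_without_stream, rate_with_stream)
```

In this sketch, each burst of 64 streaming words exceeds the assumed cache capacity, so the normal data is evicted before it is reused and its hit rate falls from nearly one to zero.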
Another known approach is to use a data cache memory for normal data having temporal locality and a separate streaming buffer for streaming data. This approach has the disadvantage that separate data paths and addressing and translation circuitry are required for the data cache memory and the streaming buffer. In addition, two types of instructions are required for loading data, thereby increasing programming and/or compiler complexity.
All of the known techniques for buffering of streaming data have one or more disadvantages, including significant additional circuitry, degradation of performance under certain conditions and increased complexity in compilers and programming.