The present invention relates to accessing data via a memory cache, wherein a plurality of buffer addresses are defined in response to processing requirements.
As computer system design continues to develop, design criteria are dictated by the cost and efficiency of the components that are available. Processor speeds continue to increase at a greater rate than memory access speeds, and so the problem of speed mismatch between the memory and the processor is becoming more of an issue as time progresses. Many design strategies have been developed in order to deal with the mismatch, and these strategies are becoming increasingly diverse, as every possible way of reducing the mismatch is employed. In recent years, the design of memory chips has included the introduction of numerous variations of the basic static and dynamic memory types, such as synchronous static RAM and synchronous dynamic RAM.
A common technique for reducing the time of memory access is the use of a memory cache. At its simplest, a cache contains a copy of data from memory locations that have been recently used. Because the cache has a relatively small capacity, it can be constructed from memory chips that have the fast access times required to keep up with the processor. A typical arrangement is the use of a primary cache and a secondary cache. The primary cache may exist on the same area of silicon as the processor, thus enabling a large number of short electrical connections to be made, thereby improving the speed of data transfer. Having a primary cache on the same chip as the processor also enables various cache operations to be performed in a highly optimised manner, for example, by using information from an out-of-sequence execution controller to improve cache efficiency.
An on-chip cache is expensive to implement, and therefore is limited in size. A much larger secondary cache can be implemented off-chip, perhaps with the aid of control logic supplied directly from the processor, so that only a small number of memory chips are required in order to implement a substantial secondary cache. The secondary cache runs at a slower speed than the primary cache because it is not on the same chip as the processor. In operation, it is hoped that data will be found in the primary on-chip cache. If not, the much larger secondary cache can be addressed with a slight time penalty. Typically, ninety percent of addressing will find data in either the primary cache or the secondary cache. Only if data is not available from either cache does the processor need to access the much slower main memory.
A further known implementation is the provision of separate primary data and instruction caches on the processor chip. This enables data and instructions to be fetched simultaneously, most of the time, even though, outside the processor, no distinction is made between memory used for data and memory used for instructions.
Heavy reliance on high speed processing circuitry combined with memory caching can result in a significant loss of overall processing power under certain circumstances. The efficiency of cache circuits is highly dependent upon the pattern of data access. Practical cache designs suffer from an inherent weakness, in that certain repeated patterns of data access result in extremely long data transfer times. Because such patterns are statistically unlikely to occur, this problem is usually ignored. However, in safety critical systems, or in systems where an extremely high bandwidth must be guaranteed, even a low probability of this problem occurring has a very high cost.
A particular application that relies on the use of high speed processing to perform a sequence of highly patterned data accesses is that of image processing. When processing live video signals, for example, it is necessary to provide a guaranteed frame-by-frame output at a particular rate. Many image processing algorithms are one-pass algorithms. Audio processing also makes use of one-pass algorithms. In image processing, these types of algorithm take a single pixel from a first image, process it, possibly with a corresponding pixel from other images, and generate an output pixel. This pattern of access is repeated many hundreds of thousands of times for a single image frame. The pattern of addressing results from the relative locations in memory of the buffers used for input and output image frames. Under these conditions, the rare occasions when pattern-dependent cache addressing problems occur are usually disastrous, perhaps resulting in a slowdown of fifty times or more. Even though this type of slowdown may be extremely unlikely, the slight chance of it occurring, at random, renders this type of system unsuitable for a critical live broadcast environment.
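The dependence on relative buffer locations can be illustrated with a minimal sketch. It assumes a direct-mapped cache with 32-byte lines and a 16 KiB capacity, one byte per pixel, and hypothetical buffer base addresses; none of these figures come from the text above, and the function names are illustrative only.

```python
# Assumed cache geometry (illustrative only; not taken from the text).
LINE_SIZE = 32            # bytes per cache line
CACHE_SIZE = 16 * 1024    # total bytes in a direct-mapped cache
NUM_LINES = CACHE_SIZE // LINE_SIZE


def cache_line_index(address):
    """Return the cache line a byte address maps to in a direct-mapped cache."""
    return (address // LINE_SIZE) % NUM_LINES


def count_conflicts(input_base, output_base, num_pixels):
    """Count pixels whose input and output addresses map to the same cache line."""
    conflicts = 0
    for i in range(num_pixels):
        if cache_line_index(input_base + i) == cache_line_index(output_base + i):
            conflicts += 1
    return conflicts


INPUT_BASE = 0x100000  # hypothetical input frame buffer address

# Buffers separated by a multiple of the cache size collide on every pixel.
aligned = count_conflicts(INPUT_BASE, INPUT_BASE + 4 * CACHE_SIZE, 1000)

# Shifting the output buffer by half the cache size removes the collisions.
staggered = count_conflicts(
    INPUT_BASE, INPUT_BASE + 4 * CACHE_SIZE + CACHE_SIZE // 2, 1000
)

print(aligned, staggered)
```

In the first case every read of an input pixel and write of the corresponding output pixel compete for the same cache line, which is the kind of repeated pattern described above. In the second case the two buffers never share a line.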
It is an object of the present invention to provide an improved solution to addressing data via a cache.
According to a first aspect of the present invention, there is provided a processing apparatus comprising processing means, main memory means and caching means, wherein an application process executing on said processing apparatus executes instructions on said processing means and accesses data in main memory via said cache, and said processing means is configurable by a configuration process so as to: access locations in main memory with reference to addresses, each comprising virtual and physical address bits; identify selected bits of said physical address bits that select areas of said cache; and identify permutations of said selected bits to define buffer alignments in main memory, in response to an identification of requirements of said application process made by said configuration process.
Preferably, the processing requirements are represented by a processing graph, including buffer nodes and processing nodes.
According to a second aspect of the present invention, there is provided a method of allocating main memory for buffers, wherein locations in main memory are accessed with reference to addresses, each comprising address bits; selected bits of said address bits identify common address bits in a primary cache and in a secondary cache; permutations of said selected bits identify preferred buffer alignments; and said permutations are allocated to buffers in response to an identification of processing requirements.
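One possible reading of this method can be sketched as follows. The sketch assumes power-of-two cache geometries, treats a "permutation" as a distinct combination of values of the selected bits, picks the two highest common index bits as the selected bits, and ignores virtual to physical address translation. The cache sizes and the helper names `index_bits` and `buffer_offsets` are hypothetical and are not taken from the text.

```python
from itertools import product


def index_bits(cache_size, line_size, ways):
    """Bit positions of an address that select a set in a set-associative cache.

    Assumes cache_size, line_size and the resulting set count are powers of two.
    """
    num_sets = cache_size // (line_size * ways)
    first = line_size.bit_length() - 1   # log2(line_size)
    count = num_sets.bit_length() - 1    # log2(num_sets)
    return set(range(first, first + count))


def buffer_offsets(selected_bits, num_buffers):
    """Give each buffer a distinct combination of values of the selected bits."""
    offsets = []
    for values in product((0, 1), repeat=len(selected_bits)):
        offsets.append(sum(v << b for v, b in zip(values, selected_bits)))
    offsets.sort()
    if num_buffers > len(offsets):
        raise ValueError("more buffers than distinct bit combinations")
    return offsets[:num_buffers]


# Assumed geometries: 32 KiB two-way primary cache with 32-byte lines,
# 1 MiB direct-mapped secondary cache with 128-byte lines.
primary = index_bits(32 * 1024, 32, 2)
secondary = index_bits(1024 * 1024, 128, 1)

# Bits that select an area in both caches.
common = sorted(primary & secondary)

# Assumption: use the two highest common bits as the selected bits.
selected = common[-2:]

# Alignment offsets for three buffers, for example two inputs and one output.
offsets = buffer_offsets(selected, 3)
print(selected, offsets)
```

Adding each offset to a base address whose selected bits are zero places the buffers in different areas of both caches, so a one-pass algorithm that steps through them in lockstep does not repeatedly evict its own data.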