1. Field of the Invention
The present invention relates to a cache memory and to a method of operating such a cache memory.
2. Summary of the Prior Art
Current computer architectures rely heavily on the use of cache memory (hereinafter “cache”). Integrated with the processor on a single large chip, caches enable the processor to operate at high speed, as most instructions and data can be rapidly accessed from the caches instead of from the main memory which is usually at least ten times slower. On-chip caches have grown steadily in size over the last decade, and now represent a significant proportion of the cost and power consumption of the processor chip. It should be noted that the cache memory is inevitably of smaller memory space than the main memory, but provides more rapid access.
Although it is normally the case that large caches offer better performance than small ones, it is also clear that the performance is not directly related to the size of the cache. Typically, program performance will increase as the cache size increases up to a certain point at which further increases in cache size will have little or no effect. Cache management hardware takes no account of the characteristics of specific programs, and in many simple cases performs very inefficiently. Another common problem is interference, which arises when a program accesses a collection of data objects which compete for parts of the cache. Current approaches to these problems have relied on the use of more complex cache architectures and on increasing cache sizes, with a corresponding increase in system cost, size and power consumption.
At its most general, the present invention proposes that a cache memory has a logical organization in which its memory space is divided into sub-sections (hereinafter “partitions”) under the control of a programmer or compiler. The size of the sub-sections need not be fixed, but may be determined by the control operation.
This permits data objects to be allocated to particular partitions of the cache. This partitioning of the cache improves the performance of the cache such that a small cache memory can provide the same performance as a conventional cache memory many times larger. This is useful because small caches are faster, take less chip space and less power.
In addition, by minimizing or eliminating interference, the performance of the cache and hence the program can be made more predictable.
In a first alternative, the partition to or from which data items are transferred is controlled by a parameter within an instruction such as a load or store instruction. The parameter may be different for different commands, so that the data items of different commands are allocated to different partitions.
Thus, in a first aspect, the present invention may provide a method of operating a cache memory, using commands which cause a transfer of corresponding items of data between the cache memory and a main memory, which commands have an instruction component and an address component, the method comprising:
defining a plurality of sub-sections within the memory space of the cache memory, each of which has an associated identifier, the sizes of the sub-sections being selectable from a range of sizes during the operation of the cache memory;
extracting from the instruction component of a command a parameter corresponding to a selected one of the identifiers, the corresponding parameter being different for different commands; and
transferring items of data corresponding to said command between the main memory and the sub-section of the memory space of the cache memory for which the associated identifier corresponds to the parameter of said command.
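The first aspect can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class `PartitionedCache`, its methods, the direct-mapped organization and the simple modulo line selection are hypothetical choices made for illustration, not the arrangement described in this specification. The `partition_id` argument stands in for the parameter extracted from the instruction component of a command.

```python
class PartitionedCache:
    """Toy direct-mapped cache whose lines are split into partitions.

    All names here are hypothetical and chosen for illustration only.
    """

    def __init__(self, num_lines):
        self.lines = [None] * num_lines   # each entry: (address, value) or None
        self.partitions = {}              # identifier -> (base line, size in lines)

    def define_partition(self, identifier, base, size):
        # Sizes are selectable at run time; a partition may be redefined later.
        if size <= 0 or base < 0 or base + size > len(self.lines):
            raise ValueError("partition does not fit in the cache")
        self.partitions[identifier] = (base, size)

    def load(self, memory, address, partition_id):
        """Return (value, hit) for a load directed at one partition."""
        base, size = self.partitions[partition_id]
        line = base + address % size      # assumed mapping: plain modulo
        entry = self.lines[line]
        if entry is not None and entry[0] == address:
            return entry[1], True         # hit within the selected partition
        value = memory[address]           # miss: fetch from main memory
        self.lines[line] = (address, value)
        return value, False


memory = list(range(100, 164))            # stand-in for main memory
cache = PartitionedCache(num_lines=8)
cache.define_partition(0, base=0, size=4)
cache.define_partition(1, base=4, size=4)

cache.load(memory, 0, partition_id=0)     # miss, fills line 0
cache.load(memory, 4, partition_id=1)     # miss, fills line 4, not line 0
value, hit = cache.load(memory, 0, partition_id=0)  # address 0 is still cached
```

In an unpartitioned four-line cache using the same modulo mapping, addresses 0 and 4 would compete for the same line; directing them to different partitions removes that interference.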
However, if registers are associated with instructions such as load or store, with a specific instruction having a corresponding register, then the parameter which determines the partition to or from which data items are transferred may be determined by such registers themselves.
Thus, in a second aspect, the present invention may provide a method of operating a cache memory, using commands which cause a transfer of corresponding items of data between the cache memory and a main memory, at least some of which commands each have a corresponding register connected to a communication bus for use by said commands, the corresponding register being different for different commands, the method comprising:
defining a plurality of sub-sections within the memory space of the cache memory, each of which has an associated identifier, the sizes of the sub-sections being selectable from a range of sizes during the operation of the cache memory;
associating a parameter with each said corresponding register, each said parameter corresponding to a selected one of the identifiers; and
transferring items of data corresponding to said command between the main memory and the sub-section of the memory space of the cache memory for which the associated identifier corresponds to the parameter of the register corresponding to said command.
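As a sketch of the second aspect, the partition may be looked up from the register that a command uses rather than from the instruction itself. The names `RegisterFile`, `partition_of` and `resolve_access`, and the modulo line selection, are assumptions made for this illustration only.

```python
class RegisterFile:
    """Hypothetical register file in which each register carries a partition parameter."""

    def __init__(self, num_registers):
        self.values = [0] * num_registers        # address held in each register
        self.partition_of = [0] * num_registers  # partition parameter per register


def resolve_access(regs, register, offset, partitions):
    """Return (address, cache line) for an access made through `register`.

    `partitions` maps an identifier to (base line, size in lines).
    """
    address = regs.values[register] + offset
    base, size = partitions[regs.partition_of[register]]
    return address, base + address % size        # assumed mapping: plain modulo


partitions = {0: (0, 4), 1: (4, 4)}              # two partitions of four lines each
regs = RegisterFile(num_registers=4)
regs.values[1] = 16
regs.partition_of[1] = 1                         # accesses via register 1 use partition 1

address, line = resolve_access(regs, register=1, offset=2, partitions=partitions)
```

Here the command itself carries no partition parameter; changing `partition_of[1]` redirects every access made through register 1 to another partition.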
Another possibility for allocating data objects to particular partitions of the cache arises when a DMA controller is being used. Such a DMA controller generates specific commands for the memory access controlled by the DMA controller. Since the DMA controller generates those commands, it may also control the partition to or from which data items associated with those commands are transferred. Thus, in this case, the parameter which identifies the appropriate partition is not derived from the instruction, or a register associated with the instruction, but instead the command and its associated parameter are generated by a common trigger from the DMA controller.
Thus, in a third aspect, the present invention may provide a method of operating a cache memory under control of a DMA controller, the DMA controller being arranged to generate predetermined commands, the method comprising:
defining a plurality of sub-sections within the memory space of the cache memory, each of which has an associated identifier;
generating, at said DMA controller, one of said predetermined commands and a parameter associated with said one of said predetermined commands and with a selected one of the identifiers; and
transferring items of data corresponding to said one of said predetermined commands between a main memory and the sub-section of the memory space of the cache memory for which the associated identifier corresponds to the parameter of said one of said predetermined commands.
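The third aspect can be sketched as a DMA controller that emits each command together with its partition parameter. The types `DmaCommand` and `DmaController`, the `block_transfer` method and the `"load"` operation name are hypothetical and are used here only to make the idea concrete.

```python
from typing import Iterator, NamedTuple


class DmaCommand(NamedTuple):
    """A command paired with the partition parameter generated alongside it."""
    op: str
    address: int
    partition_id: int


class DmaController:
    """Hypothetical DMA controller holding one partition parameter."""

    def __init__(self, partition_id: int):
        self.partition_id = partition_id

    def block_transfer(self, start: int, count: int, step: int = 1) -> Iterator[DmaCommand]:
        # The command and its partition parameter come from the same source,
        # so neither an instruction field nor a processor register is consulted.
        for i in range(count):
            yield DmaCommand("load", start + i * step, self.partition_id)


commands = list(DmaController(partition_id=2).block_transfer(start=0x100, count=3))
```

Every command in `commands` carries partition identifier 2, so the resulting transfers are confined to that partition of the cache.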
Preferably, the programmer or compiler is able to control the size of each partition. That permits analysis of the pattern of access to the cache, and division of the cache into suitably sized partitions, along with the derivation of an appropriate mapping function to map memory addresses to addresses of lines in the partition. Once that has happened, the mapping function should be able to map items in a data structure which are accessed in sequence onto different lines within a partition which the compiler uses for that structure. The aim is then to minimize data collisions for a given partition size. To do this, it is preferable to derive from the program a quantity hereafter referred to as a “stride”, the value of which defines the separation of the addresses within the address space of the main memory of successive accesses to or from the memory. Based on the stride, a mapping function can be selected that generates addresses which cover all addresses within a cache partition in an efficient way.
Thus, in a fourth aspect, the present invention may provide a method of operating a cache memory, comprising:
defining a plurality of sub-sections within the memory space of the cache memory; and
transferring data items associated with each other only to a corresponding sub-section of the memory space of the cache memory;
wherein each sub-section has a stride associated therewith, the stride representing the separation within the memory addresses of a main memory of successive transfers of data between the corresponding sub-section and the main memory.
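The effect of associating a stride with a partition can be illustrated numerically. This is a minimal sketch assuming one particular mapping, namely dividing the address by the stride before taking the remainder modulo the partition size; the function names are hypothetical and the specification does not prescribe this exact function.

```python
def line_naive(address, size):
    """Line within a partition when the stride is ignored."""
    return address % size


def line_strided(address, stride, size):
    """Line within a partition when the address is first divided by the stride."""
    return (address // stride) % size


def distinct_lines(addresses, mapper):
    """Number of different lines touched by a sequence of addresses."""
    return len({mapper(a) for a in addresses})


stride, size = 8, 4
addresses = [i * stride for i in range(size)]    # 0, 8, 16, 24

naive_count = distinct_lines(addresses, lambda a: line_naive(a, size))
strided_count = distinct_lines(addresses, lambda a: line_strided(a, stride, size))
```

With a stride of 8 and a four-line partition, the plain modulo mapping sends all four accesses to the same line, whereas the stride-aware mapping spreads them over all four lines of the partition.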
In each of the above four aspects, the present invention may also provide a memory system arranged to operate as discussed above.
It should be noted that although such control of the partitioning is preferable, a general purpose function may be needed to perform mapping if, e.g., the pattern of access to the data is not known to the compiler. Preferably, the compiler controls the partitioning of the cache memory using a parameter added to a load or store instruction. That partition parameter may be derived from the instruction opcode or from one or more registers. Each of these registers may be a general purpose register, or may be one or more dedicated partition registers. Registers that are used to implicitly access memory, e.g. via stack pointer or program counter, normally will have a dedicated partition register associated with them.
It is usually desirable that there are functions which identify the line of cache memory from the memory address, and in this case it is preferable that each partition has its own function. The function may for example be a shift and modulo operation.
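A per-partition shift and modulo function might look as follows. This is a sketch only: the class `LineFunction` and its fields are hypothetical, and the shift amount is assumed to be the base-2 logarithm of a power-of-two stride.

```python
class LineFunction:
    """Hypothetical per-partition function mapping a memory address to a cache line."""

    def __init__(self, base, size, shift):
        self.base = base      # first cache line of the partition
        self.size = size      # number of lines in the partition
        self.shift = shift    # assumed: log2 of the partition's stride

    def __call__(self, address):
        # Shift discards the low-order bits that do not change between
        # successive accesses; modulo wraps the result into the partition.
        return self.base + ((address >> self.shift) % self.size)


# Two partitions of one eight-line cache, each with its own function.
unit_stride = LineFunction(base=0, size=4, shift=0)    # stride 1
stride_eight = LineFunction(base=4, size=4, shift=3)   # stride 8

lines_a = [unit_stride(a) for a in (0, 1, 2, 3, 4)]
lines_b = [stride_eight(a) for a in (0, 8, 16, 24, 32)]
```

Each partition wraps around only after all four of its lines have been used, and the two functions never produce a line belonging to the other partition.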
As has been mentioned above, the stride defines the separation of successive accesses to the memory, for each partition. It should be noted that multiple partitions may be used to cache accesses with different strides to the same data object.
With the present invention it is possible for multiple DMA controllers and a processor to use a common cache, by providing a dedicated partition register in each controller so that the different controllers and the processor all access different partitions.