Digital data processing systems are used in many applications, including consumer electronics, computers and cars. For example, personal computers (PCs) use complex digital processing functionality to provide a platform for a wide variety of user applications.
Digital data processing systems typically comprise input/output functionality, instruction and data memory and one or more data processors, such as a microcontroller, a microprocessor or a digital signal processor.
An important parameter of the performance of a processing system is the memory performance. For optimum performance, it is desired that the memory is large, fast and preferably cheap. Unfortunately, these characteristics tend to be conflicting requirements, and a suitable trade-off is required when designing a digital system.
In order to improve memory performance of processing systems, complex memory structures which seek to exploit the individual advantages of different types of memory have been developed. In particular, it has become common to use fast cache memory in association with larger, slower and cheaper main memory.
For example, in a PC the memory is organized in a memory hierarchy comprising memory of typically different size and speed. Thus a PC may typically comprise a large, low cost but slow main memory and in addition have one or more cache memory levels comprising relatively small and expensive but fast memory. During operation data from the main memory is dynamically copied into the cache memory to allow fast read cycles. Similarly, data may be written to the cache memory rather than the main memory thereby allowing for fast write cycles.
Thus, the cache memory is dynamically associated with different memory locations of the main memory, and it is clear that the interface and interaction between the main memory and the cache memory is critical for acceptable performance. Accordingly, significant research into cache operation has been carried out, and various methods and algorithms have been developed for controlling when data is written to or read from the cache memory rather than the main memory, as well as when data is transferred between the cache memory and the main memory.
Typically, whenever a processor performs a read operation, the cache memory system first checks if the corresponding main memory address is currently associated with the cache. If the cache memory contains a valid data value for the main memory address, this data value is put on the data bus of the system by the cache and the read cycle executes without any wait cycles. However, if the cache memory does not contain a valid data value for the main memory address, a main memory read cycle is executed and the data is retrieved from the main memory. Typically the main memory read cycle includes one or more wait states thereby slowing down the process.
A memory operation where the processor can receive the data from the cache memory is typically referred to as a cache hit, and a memory operation where the processor cannot receive the data from the cache memory is typically referred to as a cache miss. Typically, a cache miss not only results in the processor retrieving data from the main memory but also results in a number of data transfers between the main memory and the cache. For example, if a given address is accessed resulting in a cache miss, the subsequent memory locations may be transferred to the cache memory. As processors frequently access consecutive memory locations, the probability of the cache memory comprising the desired data thereby typically increases.
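The read behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any particular cache: the class name `SimpleCache`, the number of locations copied on a miss (`fill_count`) and the miss penalty (`miss_wait_states`) are all assumptions introduced for the example.

```python
class SimpleCache:
    """Toy cache in front of a slower main memory (illustrative only)."""

    def __init__(self, main_memory, fill_count=4, miss_wait_states=3):
        self.main_memory = main_memory            # list standing in for main memory
        self.lines = {}                           # address -> cached data value
        self.fill_count = fill_count              # locations copied on a miss (assumed)
        self.miss_wait_states = miss_wait_states  # wait states per miss (assumed)
        self.hits = 0
        self.misses = 0
        self.wait_states = 0

    def read(self, address):
        if address in self.lines:
            # Cache hit: the value is supplied without wait states.
            self.hits += 1
            return self.lines[address]
        # Cache miss: a main memory read cycle with wait states is executed.
        self.misses += 1
        self.wait_states += self.miss_wait_states
        # The accessed location and the subsequent ones are copied into the cache.
        end = min(address + self.fill_count, len(self.main_memory))
        for a in range(address, end):
            self.lines[a] = self.main_memory[a]
        return self.lines[address]


memory = list(range(100, 116))
cache = SimpleCache(memory)
values = [cache.read(a) for a in range(8)]   # eight consecutive reads
```

With these assumed parameters, the eight consecutive reads produce two misses (at addresses 0 and 4) and six hits, which illustrates why transferring subsequent locations on a miss tends to raise the hit rate for sequential accesses.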
To improve the hit rate of a cache, N-way caches are used, in which instructions and/or data are stored in one of N storage blocks (i.e. ‘ways’).
Cache memory systems are typically divided into cache lines which correspond to the resolution of a cache memory. In cache systems known as set-associative cache systems, a number of cache lines are grouped together in different sets wherein each set corresponds to a fixed mapping to the lower data bits of the main memory addresses. The extreme case of each cache line forming a set is known as a direct mapped cache and results in each main memory address being mapped to one specific cache line. The other extreme where all cache lines belong to a single set is known as a fully associative cache and this allows each cache line to be mapped to any main memory location.
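The relationship between direct mapped, set-associative and fully associative organisations can be illustrated with a short sketch. The function name `candidate_lines` and the layout in which set s occupies cache lines s*ways to s*ways + ways - 1 are assumptions made for the example; a real cache may arrange its lines differently.

```python
def candidate_lines(line_address, num_lines, ways):
    """Return the cache line numbers that may hold a given main memory line.

    num_lines is the total number of cache lines and ways is the number of
    lines per set (both hypothetical parameters for this sketch).
    """
    num_sets = num_lines // ways
    # When num_sets is a power of two, the modulo selects the lower address bits.
    set_index = line_address % num_sets
    return [set_index * ways + w for w in range(ways)]


direct_mapped = candidate_lines(13, num_lines=8, ways=1)      # one line per set
two_way = candidate_lines(13, num_lines=8, ways=2)            # two lines per set
fully_associative = candidate_lines(13, num_lines=8, ways=8)  # a single set
```

For an eight-line cache, main memory line 13 can be placed in exactly one line when the cache is direct mapped, in either of two lines when it is 2-way set-associative, and in any of the eight lines when it is fully associative.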
In order to keep track of which main memory address (if any) each cache line is associated with, the cache memory system typically comprises a data array which for each cache line holds data indicating the current mapping between that line and the main memory. In particular, the data array typically comprises higher data bits of the associated main memory address. This information is typically known as a tag and the data array is known as a tag-array. Additionally, for larger cache memories a subset of an address (i.e. an index) is used to designate a line position within the cache where the most significant bits of the address (i.e. the tag) are stored along with the data. In a cache in which indexing is used, an item with a particular address can be placed only within a set of lines designated by the relevant index.
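As a minimal sketch of how an address may be divided into a tag and an index, the following example assumes a line size of 32 bytes and 64 sets. These values, and the names `LINE_SIZE`, `NUM_SETS` and `split_address`, are chosen purely for illustration.

```python
LINE_SIZE = 32   # bytes per cache line (assumed)
NUM_SETS = 64    # number of sets selected by the index (assumed)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 5 bits address a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1     # 6 bits select the set


def split_address(address):
    """Split an address into (tag, index, offset) fields."""
    offset = address & (LINE_SIZE - 1)
    index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)   # most significant bits
    return tag, index, offset


tag, index, offset = split_address(0x12345)          # (36, 26, 5)
other_tag, other_index, _ = split_address(0x12345 + LINE_SIZE * NUM_SETS)
```

The second address differs from the first by exactly LINE_SIZE * NUM_SETS bytes, so it has the same index but a different tag: both items compete for the same set of lines, and the stored tag is what distinguishes them.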
To allow a processor to read and write data to memory, the processor will typically produce a virtual address. A physical address is an address of main (i.e. higher level) memory, associated with the virtual address that is generated by the processor. A multi-task environment is an environment in which the processor may serve different tasks at different times. Within a multi-task environment, the same virtual addresses, generated by different tasks, are not necessarily associated with the same physical address. Data that is shared between different tasks is stored in the same physical location for all the tasks sharing this data, while data not shared between different tasks (i.e. private data) will be stored in a physical location that is unique to its task.
This is more clearly illustrated in FIG. 1, where the y-axis defines virtual address space and the x-axis defines time. FIG. 1 illustrates four tasks 51-54. The execution of each task requires fetching code and data. The execution of the first task 51 involves fetching private code 11, shared code 12, shared data 13 and private data 14. The execution of the second task 52 involves fetching private code 21 and shared data 22. The execution of the third task 53 involves fetching private code 31, shared code 32, shared data 34 and private data 33. The execution of the fourth task 54 involves fetching private code 41, shared data 43 and private data 42. The shared code 12 and 32 are arranged to have the same virtual addresses and the same physical addresses. The shared data 13, 22, 34 and 43 are arranged to have the same virtual addresses; however, the associated data stored in external memory will be stored in different physical addresses. It is noted that each box in FIG. 1 represents multiple addresses.
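The dependence of the virtual-to-physical mapping on the active task can be sketched with one translation table per task. Everything in the example is hypothetical: the page size, the page numbers, the task names and the function `translate` are not taken from FIG. 1 and serve only to show shared code resolving to one physical location while private data resolves to a task-specific one.

```python
PAGE_SIZE = 0x1000  # assumed page size for the sketch

# Hypothetical per-task tables: virtual page number -> physical page number.
# Virtual page 0x10 holds shared code, virtual page 0x20 holds private data.
page_tables = {
    "task_1": {0x10: 0x80, 0x20: 0xA0},
    "task_3": {0x10: 0x80, 0x20: 0xB0},
}


def translate(task, virtual_address):
    """Translate a virtual address generated by the given task."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    return page_tables[task][page] * PAGE_SIZE + offset


shared_1 = translate("task_1", 0x10004)
shared_3 = translate("task_3", 0x10004)
private_1 = translate("task_1", 0x20008)
private_3 = translate("task_3", 0x20008)
```

Here the shared code address translates to the same physical address for both tasks, whereas the same virtual address in the private data region translates to a different physical address for each task.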
Consequently, a virtual address cache will store data with reference to a virtual address generated by a processor while data to be stored in external memory is stored in physical address space.
Further, a virtual address cache operating in a multi-tasking environment will have an address or tag field for storing an address/tag associated with stored data, and a task identifier field for identifying the task with which the address/tag and data are associated.
Consequently, within a multi-tasking environment a ‘hit’ requires that the address/tag for data stored in the cache matches the virtual address requested by the processor and the task identifier field associated with data stored in cache matches the current active task being executed by the processor.
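The hit condition described above can be expressed as a short sketch. The `Entry` record and the `lookup` function are illustrative names, and the linear search over entries is a simplification of what hardware would do with parallel comparators.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int       # virtual address/tag field
    task_id: int   # task identifier field
    data: int      # cached data


def lookup(entries, virtual_tag, active_task):
    """Return the cached data on a hit, or None on a miss."""
    for entry in entries:
        # A hit requires BOTH the tag and the task identifier to match.
        if entry.tag == virtual_tag and entry.task_id == active_task:
            return entry.data
    return None


entries = [Entry(tag=0x40, task_id=1, data=111),
           Entry(tag=0x40, task_id=2, data=222)]

hit_task_1 = lookup(entries, 0x40, active_task=1)
hit_task_2 = lookup(entries, 0x40, active_task=2)
miss_task_3 = lookup(entries, 0x40, active_task=3)
```

In this sketch the same virtual tag is cached for two tasks. Each task sees only its own entry, and a third task misses even though the tag is present, because the task identifier does not match.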
When a processor switches from one task to another task, the contents of a virtual address data cache associated with the first task will typically be flushed to a higher level memory, and new data associated with the new task is loaded into the virtual address cache. This enables the new task to use updated data that is shared between the two tasks. However, the need to change the memory contents when switching between tasks increases the bus traffic between the cache and the higher-level memory, and increases the complexity of the operating system in the handling of inter-process communication. This may also produce redundant, time-consuming ‘miss’ accesses to shared data after the flush. In the case of shared code, a flush is not needed after a task switch; however, the shared code must then be duplicated in the cache memory, which increases its footprint.
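The cost of flushing on a task switch can be illustrated with a toy model. The class `FlushingCache`, its write-back policy and the `translate` callable are assumptions made for the sketch; they are not a description of any specific cache controller.

```python
class FlushingCache:
    """Toy virtual address cache that is flushed on every task switch."""

    def __init__(self, memory, translate):
        self.memory = memory        # dict: physical address -> value
        self.translate = translate  # callable (task, virtual) -> physical (assumed)
        self.task = None
        self.lines = {}             # virtual address -> [value, dirty flag]
        self.misses = 0
        self.writebacks = 0

    def switch_task(self, task):
        # Flush modified data of the outgoing task to higher level memory.
        for vaddr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[self.translate(self.task, vaddr)] = value
                self.writebacks += 1
        self.lines.clear()
        self.task = task

    def read(self, vaddr):
        if vaddr not in self.lines:
            self.misses += 1
            paddr = self.translate(self.task, vaddr)
            self.lines[vaddr] = [self.memory[paddr], False]
        return self.lines[vaddr][0]

    def write(self, vaddr, value):
        self.lines[vaddr] = [value, True]


# Virtual address 0x100 is shared: both tasks map it to physical 0x900.
memory = {0x900: 0}
cache = FlushingCache(memory, lambda task, vaddr: 0x900)
cache.switch_task("A")
cache.write(0x100, 42)
cache.switch_task("B")
value = cache.read(0x100)
```

The second task does observe the updated shared value, but only after one write-back to higher level memory and one miss that reloads the same data, which is the extra bus traffic and redundant miss access referred to above.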
One solution has been to use a physical address cache where a translator translates the virtual address generated by a processor into a respective physical address that is used to store the data in the physical address cache, thereby ensuring that data shared between tasks is easily identified by its physical address.
However, the translation of the virtual address to its corresponding physical address can be difficult to implement in high-speed processors that have tight timing constraints.
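A physical address cache of the kind described above can be sketched as follows. The class `PhysicalCache` and the dictionary-based translation tables are hypothetical; the sketch only shows that the translation step precedes every cache lookup and that shared data is identified by its physical address.

```python
class PhysicalCache:
    """Toy cache indexed by physical address (illustrative only)."""

    def __init__(self, memory, tables):
        self.memory = memory   # dict: physical address -> value
        self.tables = tables   # dict: task -> {virtual: physical} (assumed)
        self.lines = {}        # physical address -> cached value
        self.hits = 0
        self.misses = 0
        self.translations = 0

    def read(self, task, vaddr):
        # The translation must complete before the cache can be searched.
        paddr = self.tables[task][vaddr]
        self.translations += 1
        if paddr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines[paddr] = self.memory[paddr]
        return self.lines[paddr]


memory = {0x900: 7}
tables = {"A": {0x100: 0x900}, "B": {0x100: 0x900}}
physical_cache = PhysicalCache(memory, tables)
first = physical_cache.read("A", 0x100)    # miss: loaded from main memory
second = physical_cache.read("B", 0x100)   # hit: no flush on the task change
```

The second task hits on the data cached by the first task without any flush, but every access incurs a translation, which is the step that can be hard to fit within the timing constraints of a high-speed processor.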
It is desirable to improve this situation.