1. Field of the Invention
The present invention relates to data processing systems. More particularly, the present invention relates to the management and control of the memory system within a data processing system.
2. Description of the Prior Art
It is desirable that data processing systems should operate as quickly as possible to meet the increasing demands for processing capability placed upon them. In this regard, there is continual progress in producing processing systems that operate at higher speeds and so are able to execute more instructions per second. As the processors increase in speed, it is important that other systems of the data processing system should also increase in speed if they are not to become a processing bottleneck holding back the overall performance of the system. An example of such an other system is the memory system associated with a data processing system.
A memory system of a high performance data processing system may comprise a hierarchy of levels of data storage, e.g. an internal on-chip cache, an external off-chip cache, a random access memory and a non-volatile memory, such as a hard drive or flash ROM. Schemes which can increase the overall performance of the memory system of a data processing system are highly advantageous.
Viewed from one aspect the present invention provides data processing apparatus comprising:
(i) a cache memory having a plurality of cache storage lines;
(ii) a plurality of main memory units operable to store data words to be cached within said cache memory; and
(iii) a cache victim select circuit for selecting a victim cache storage line into which one or more data words are to be transferred from one of said main memory units following a cache miss; wherein
(iv) said cache victim select circuit is responsive to an operational state of at least one of said main memory units when selecting said victim cache storage line.
A cache memory does not typically have enough storage capacity to store all of the data that may be required by the system. Accordingly, the cache memory stores a subset of the total data and when a memory access request is made to an item of data not stored within the cache, then that item of data must be fetched to the cache. In order to make room for the new item of data within the cache, an existing item of data has to be removed from the cache. The selection of which cache storage line (set of data items) should be replaced is performed by a cache victim select circuit. When there are a plurality of main memory units holding the data that is to be cached within the cache memory, then different victim selections will require accesses to be made to different ones of these main memory units. In this circumstance, it is strongly desirable that the cache victim select circuit should be responsive to the operational state of at least one of the main memory units. Arranging the cache victim select circuit to be responsive to an operational state of at least one of the main memory units allows the victim selection to be adjusted depending upon the detected operational state and accordingly higher performance to be achieved through the selection of a cache victim that will cause the least delay.
The present invention is particularly useful when the cache memory is configured as a write back cache memory. In such embodiments data words from the victim cache line have to be written back to the main memory from where they originally came and so the operational status of that main memory may be critical in determining the degree of delay that would be associated with selection of that particular cache line as the victim cache line.
One highly useful operational parameter to sense regarding a main memory unit is whether or not that main memory unit is already busy exchanging one or more data words with the cache memory. If the main memory unit is already busy, then its current operation will have to complete before it is able to service any requirements stemming from the selection of a victim cache storage line that requires that busy main memory unit to be accessed.
The advantages of the invention are particularly evident when there are many memory masters simultaneously requesting access and a plurality of main memory units that are able to operate independently and concurrently transfer data words to the cache memory. In such embodiments it is highly desirable to select as a cache victim a cache storage line whose associated main memory unit is not already busy performing a data exchange with the cache memory. The ability to perform data exchanges with the cache memory in parallel increases system performance and accordingly it is desirable that the memory access workload be split evenly between the main memory units to make better use of this parallel capability.
In preferred embodiments it is desirable that the cache victim select circuit should be responsive to a dirty flag (a flag indicating that a line contains one or more data words that have been changed since they were transferred to the cache memory from the main memory) associated with the cache storage lines so as to select in preference those cache storage lines that are marked as non-dirty. Non-dirty cache storage lines will not require writing back to the main memory and so the delay associated with refilling that cache storage line will be reduced.
In modern high performance data processing systems it is advantageously efficient to provide more than one data word requesting unit that may each request exchange of one or more data words with the cache memory. Sharing the memory structures between data word requesting units in this way provides an advantageous compromise between making the most efficient use of the circuit resources provided and meeting the requirements for maximum performance.
Typical examples of data word requesting units are a central processing unit and a video display driving circuit.
In a system having multiple data word requesting units as discussed above, it is desirable that one or more cache storage lines may be locked for preferential use by one of the data word requesting units. In this way it is possible to reduce the likelihood of the activity of one of the data word requesting units having an undue detrimental impact upon the performance of another of the data word requesting units.
A further way to make better use of the cache memory resources is to arrange the cache victim select circuit to be responsive to an indication of which cache storage lines were least recently used when selecting the victim cache storage line.
An overall scheme that has been found particularly advantageous is one in which said cache victim select circuit selects as said victim cache storage line that cache storage line having properties placing it highest in a list of N properties, where 1≤N≤6, said list of N properties being formed of the N highest properties in the list:
(i) least recently used line that is not locked and is not dirty;
(ii) least recently used line that is not locked, is dirty and can be written back to a main memory unit that is not busy;
(iii) least recently used line that is not locked, is dirty and has to be written back to a main memory unit that is busy;
(iv) least recently used line that is locked and is not dirty;
(v) least recently used line that is locked, is dirty and can be written back to a main memory unit that is not busy;
(vi) least recently used line that is locked, is dirty and has to be written back to a main memory unit that is busy.
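The six-level ordering above can be illustrated with a minimal software sketch. This is an assumption-laden model, not the circuit itself: the names `CacheLine`, `property_rank` and `select_victim_lru`, the `last_used` timestamp and the `busy_banks` set are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class CacheLine:
    index: int       # position of the line within the cache
    last_used: int   # hypothetical timestamp; smaller means less recently used
    locked: bool     # locked for preferential use by a requesting unit
    dirty: bool      # changed since it was loaded from main memory
    bank: int        # main memory unit the line would be written back to


def property_rank(line: CacheLine, busy_banks: Set[int]) -> int:
    """Return 1..6, corresponding to properties (i)..(vi) in the list above."""
    if not line.dirty:
        rank = 1                      # no write back needed
    elif line.bank not in busy_banks:
        rank = 2                      # write back to a non-busy unit
    else:
        rank = 3                      # write back to a busy unit
    return rank + (3 if line.locked else 0)


def select_victim_lru(lines: Sequence[CacheLine],
                      busy_banks: Set[int],
                      n: int = 6) -> Optional[CacheLine]:
    """Pick the least recently used line within the best available property.

    Only the n highest properties are considered; None means no line
    currently qualifies.
    """
    candidates = [l for l in lines if property_rank(l, busy_banks) <= n]
    if not candidates:
        return None
    return min(candidates,
               key=lambda l: (property_rank(l, busy_banks), l.last_used))
```

In this sketch a return value of `None` stands for the case in which no line matches any of the N properties in use.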
In some circumstances a partially random cache victim selection scheme may be preferred as a starting point and in such embodiments said cache victim select circuit selects as said victim cache storage line that cache storage line having properties placing it highest in a list of N properties, where 1≤N≤6, said list of N properties being formed of the N highest properties in the list:
(i) randomly selected from those cache storage lines that are not locked and are not dirty;
(ii) randomly selected from those cache storage lines that are not locked, are dirty and can be written back to a main memory unit that is not busy;
(iii) randomly selected from those cache storage lines that are not locked, are dirty and have to be written back to a main memory unit that is busy;
(iv) randomly selected from those cache storage lines that are locked and are not dirty;
(v) randomly selected from those cache storage lines that are locked, are dirty and can be written back to a main memory unit that is not busy;
(vi) randomly selected from those cache storage lines that are locked, are dirty and have to be written back to a main memory unit that is busy.
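A minimal sketch of the partially random variant follows. It assumes a software model in which each line carries `locked`, `dirty` and `bank` attributes; the names `Line`, `rank` and `select_victim_random`, and the optional `rng` argument, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class Line:
    index: int
    locked: bool
    dirty: bool
    bank: int   # main memory unit the line would be written back to


def rank(line: Line, busy_banks: Set[int]) -> int:
    """Return 1..6, corresponding to properties (i)..(vi) in the list above."""
    if not line.dirty:
        base = 1
    elif line.bank not in busy_banks:
        base = 2
    else:
        base = 3
    return base + (3 if line.locked else 0)


def select_victim_random(lines: Sequence[Line],
                         busy_banks: Set[int],
                         n: int = 6,
                         rng: Optional[random.Random] = None) -> Optional[Line]:
    """Choose at random within the highest non-empty property group."""
    rng = rng or random.Random()
    for wanted in range(1, n + 1):
        group = [l for l in lines if rank(l, busy_banks) == wanted]
        if group:
            return rng.choice(group)
    return None   # no line matches any of the n properties in use
```

The randomness applies only within a property group; the ordering between groups stays fixed, which is what makes the scheme partially random.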
In other circumstances a partially round robin cache victim selection scheme may be preferred as a starting point and in such embodiments said cache victim select circuit selects as said victim cache storage line that cache storage line having properties placing it highest in a list of N properties, where 1≤N≤6, said list of N properties being formed of the N highest properties in the list:
(i) selected in sequence from those cache storage lines that are not locked and are not dirty;
(ii) selected in sequence from those cache storage lines that are not locked, are dirty and can be written back to a main memory unit that is not busy;
(iii) selected in sequence from those cache storage lines that are not locked, are dirty and have to be written back to a main memory unit that is busy;
(iv) selected in sequence from those cache storage lines that are locked and are not dirty;
(v) selected in sequence from those cache storage lines that are locked, are dirty and can be written back to a main memory unit that is not busy;
(vi) selected in sequence from those cache storage lines that are locked, are dirty and have to be written back to a main memory unit that is busy.
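The partially round robin variant might be modelled as below. This is a sketch under stated assumptions: "selected in sequence" is interpreted here as scanning from a rotating pointer, and the names `Line`, `rank`, `RoundRobinSelector` and `next_index` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass
class Line:
    locked: bool
    dirty: bool
    bank: int   # main memory unit the line would be written back to


def rank(line: Line, busy_banks: Set[int]) -> int:
    """Return 1..6, corresponding to properties (i)..(vi) in the list above."""
    if not line.dirty:
        base = 1
    elif line.bank not in busy_banks:
        base = 2
    else:
        base = 3
    return base + (3 if line.locked else 0)


class RoundRobinSelector:
    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines
        self.next_index = 0   # where the next sequential scan starts

    def select(self, lines: Sequence[Line], busy_banks: Set[int],
               n: int = 6) -> Optional[int]:
        """Return the index of the victim line, or None if none qualifies."""
        best = None   # (rank, index) of the best line seen so far
        for offset in range(self.num_lines):
            idx = (self.next_index + offset) % self.num_lines
            r = rank(lines[idx], busy_banks)
            # Strict '<' keeps the first line in scan order within a rank.
            if r <= n and (best is None or r < best[0]):
                best = (r, idx)
        if best is None:
            return None
        self.next_index = (best[1] + 1) % self.num_lines
        return best[1]
```

The pointer advances past each chosen victim, so repeated selections within the same property group cycle through the qualifying lines in order.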
The system may be configured to operate with only some of the properties (e.g. be responsive to properties (i), (ii) and (iii) or (i) and (ii)) and then, if a matching cache storage line is not found, at least part of the system is placed into a wait state until a suitable victim becomes available.
The plurality of main memory units may be main memory units at the same level within the memory hierarchy. In this context banks of dynamic random access memory that may be concurrently and independently accessed are particularly well suited for use with the invention and are increasingly desirable for other reasons, such as reduced cost single-chip designs.
Viewed from another aspect the present invention provides a data processing method comprising the steps of:
(i) storing data words within a plurality of cache storage lines of a cache memory;
(ii) storing in a plurality of main memory units said data words to be cached within said cache memory; and
(iii) selecting a victim cache storage line into which one or more data words are to be transferred from one of said main memory units following a cache miss; wherein
(iv) said selection is responsive to an operational state of at least one of said main memory units when selecting said victim cache storage line.
Viewed from a further aspect the present invention provides a data processing apparatus comprising:
(i) a write back cache memory having a plurality of cache storage lines;
(ii) at least one main memory unit operable to store data words to be cached within said cache memory, a cache storage line being dirty if it contains any data words that have been changed since they were transferred from said at least one main memory unit to said cache storage line; and
(iii) a background operation control circuit for triggering writing back of data words from dirty cache storage lines to said at least one main memory unit as a background process, cache storage lines written back using said background process becoming not dirty and continuing to store said data words that were written back.
Write back caches have the advantage that data transfers between the cache and the main memory are reduced in number. More particularly, a data transfer with the main memory only occurs when the data words are loaded into the cache and then when the data words are flushed from the cache. Changes that occur to the data words whilst they are stored within the cache are not passed to the main memory but are left until the cache data is flushed from the cache, when the final state of the data words is written into the main memory. If data words that are cached have not been changed whilst they were stored in the cache then there is no need for them to be written back to the main memory. Accordingly, in order to differentiate between cache data words requiring writing back and those not requiring writing back a dirty flag may be provided.
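The role of the dirty flag can be illustrated with a small software model. This is a sketch only: it assumes a write-allocate policy and first-in first-out eviction purely for brevity, and the class `WriteBackCache` and its counters `mem_reads` and `mem_writes` are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict


class WriteBackCache:
    """Toy write back cache holding one data word per line."""

    def __init__(self, memory: Dict[int, int], capacity: int) -> None:
        self.memory = memory          # models the main memory
        self.capacity = capacity      # number of cache lines
        self.lines = OrderedDict()    # address -> [value, dirty flag]
        self.mem_reads = 0            # transfers from main memory
        self.mem_writes = 0           # transfers to main memory

    def _fill(self, addr: int) -> None:
        if addr in self.lines:
            return                    # cache hit, no transfer needed
        if len(self.lines) >= self.capacity:
            victim, (value, dirty) = self.lines.popitem(last=False)
            if dirty:                 # only dirty lines are written back
                self.memory[victim] = value
                self.mem_writes += 1
        self.lines[addr] = [self.memory[addr], False]
        self.mem_reads += 1

    def read(self, addr: int) -> int:
        self._fill(addr)
        return self.lines[addr][0]

    def write(self, addr: int, value: int) -> None:
        self._fill(addr)
        self.lines[addr][0] = value
        self.lines[addr][1] = True    # mark the line dirty
```

In this model repeated writes to a cached address cause no main memory traffic; only the final value is transferred, and only when the dirty line is eventually evicted.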
The invention recognizes that cache refills to dirty cache lines are slower than to non-dirty cache lines and so measures that can reduce the number of cache refills needed to dirty cache lines are advantageous. The invention further recognizes that there are periods of time in which the bandwidth between the main memory and the cache memory is not being fully utilized or may in fact be standing completely idle if all of the data requirements of the system can be met from cached data. The invention exploits this otherwise unused capacity to reduce the number of dirty cache lines within the cache memory as a background process. This in turn reduces the number of write backs of dirty cache data that have to be performed during the foreground processing operations which are accordingly speeded up.
In a normal system dirty data is written back to the main memory as it is flushed (i.e. removed) from the cache memory. In contrast, in the present invention the dirty data is written back to the main memory but is also retained within the cache memory, where it is now marked as non-dirty.
The present invention is particularly useful in systems having a plurality of main memory units that are able to operate independently and concurrently to transfer data words to the cache memory as such systems will often have unused bandwidth between the cache memory and the main memory that can be exploited by the background process of the present invention.
In preferred embodiments the background process is also responsive to how recently a cache data word has been used when determining whether or not it should be written back if dirty. If a cached data word is being used very frequently and so is likely to change very frequently, then it is advantageous that it should not be written back as part of the background process since this would consume a disadvantageous amount of electrical power and may also utilize some of the spare bandwidth to the main memory system that could be more effectively used by writing back cached data words that change infrequently and so would be likely to remain non-dirty once they had been written back.
A further refinement in the background process is that write backs should not be attempted to main memory units that are already busy servicing another memory access request.
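One step of such a background process might look like the following sketch. It is a software model under stated assumptions: the threshold `min_idle`, the timestamp `last_used`, the `busy_banks` set and the function name `background_clean_step` are hypothetical, and the actual write of data to main memory is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Line:
    index: int
    dirty: bool
    bank: int        # main memory unit holding the line's data
    last_used: int   # hypothetical timestamp of the most recent access


def background_clean_step(lines: List[Line], busy_banks: Set[int],
                          now: int, min_idle: int) -> Optional[int]:
    """Write back at most one dirty line; return its index, or None."""
    # Consider the least recently used lines first.
    for line in sorted(lines, key=lambda l: l.last_used):
        if not line.dirty:
            continue                     # nothing to write back
        if line.bank in busy_banks:
            continue                     # do not target a busy memory unit
        if now - line.last_used < min_idle:
            continue                     # recently used, likely to change again
        # (The data words would be copied to main memory here.)
        line.dirty = False               # line stays cached, now non-dirty
        return line.index
    return None
```

Calling this function whenever the link to main memory would otherwise be idle gradually reduces the number of dirty lines without evicting any data from the cache.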
The invention is particularly well suited to embodiments in which the main memory unit comprises a plurality of banks of dynamic random access memory or flash memory and is fabricated as a single-chip device, although commodity DRAM or flash memory could be used.
Viewed from a further aspect the present invention provides a data processing method comprising the steps of:
(i) storing data words within a plurality of cache storage lines of a write back cache memory;
(ii) storing in at least one main memory unit said data words to be cached within said cache memory, a cache storage line being dirty if it contains any data words that have been changed since they were transferred from said at least one main memory unit to said cache storage line; and
(iii) writing back data words from dirty cache storage lines to said at least one main memory unit as a background process, cache storage lines written back using said background process becoming not dirty and continuing to store said data words that were written back.
Viewed from a further aspect the present invention provides data processing apparatus comprising:
(i) a memory circuit;
(ii) a data bus coupled to said memory circuit;
(iii) a plurality of bus master circuits coupled to said data bus for issuing memory access requests to said memory circuit via said data bus;
(iv) a bus arbitration circuit for controlling in accordance with a hierarchy of bus master priorities which bus master is granted priority in gaining use of said data bus when two or more bus masters issue temporally overlapping memory access requests; wherein
(v) said bus arbitration circuit is responsive to a determination of latency of pending memory access requests to re-arbitrate priority in gaining use of said data bus between bus masters such that a first bus master circuit having a first pending memory access request and a lower position in said hierarchy than a second bus master circuit having a second pending memory access request may gain use of said data bus ahead of said second bus master circuit if said first memory access request has a lower latency than said second memory access request.
Bus arbitration between different bus masters is normally performed based upon a fixed hierarchy of priorities. However, the present invention recognizes that more efficient use of the bus bandwidth can be made when the bus arbitration circuit is responsive to a determination of the latency of different memory access requests and is able to re-arbitrate the priorities in dependence upon the determined latencies.
In particularly preferred embodiments it may be possible for a second memory access request to be started and completed entirely within the latency period of a first memory access request before that first memory access request in fact needs to use the data bus to complete.
A common situation in which the invention may be advantageously used is one in which the memory system comprises a cache memory and a main memory. In such systems if a high priority first memory access request results in a cache miss whereas a lower priority second memory access request results in a cache hit, then it is advantageous to re-arbitrate the priorities such that the second memory access request is serviced from the cache memory whilst the first memory access request continues to progress to perform the data fetch from main memory and cache line refill.
In an analogous manner in a system including a plurality of main memory units that can independently and concurrently operate, the invention may advantageously operate to re-arbitrate between memory access requests such that a request to a non-busy main memory unit may be moved ahead of an otherwise higher priority request to a busy main memory unit.
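The re-arbitration described above can be sketched as follows. This is a simplified model under stated assumptions: the cycle counts, the `Request` fields and the functions `estimated_latency` and `arbitrate` are hypothetical, and latency is estimated only from whether a request hits in the cache and whether its main memory unit is busy.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set

# Hypothetical latency estimates, in bus cycles.
HIT_CYCLES = 1
IDLE_BANK_MISS_CYCLES = 10
BUSY_BANK_MISS_CYCLES = 20


@dataclass
class Request:
    master: str     # name of the issuing bus master
    priority: int   # position in the fixed hierarchy; 0 is highest
    address: int
    bank: int       # main memory unit that holds the address


def estimated_latency(req: Request, cached: Set[int],
                      busy_banks: Set[int]) -> int:
    if req.address in cached:
        return HIT_CYCLES                 # served from the cache
    if req.bank in busy_banks:
        return BUSY_BANK_MISS_CYCLES      # must wait for the busy unit
    return IDLE_BANK_MISS_CYCLES


def arbitrate(pending: Sequence[Request], cached: Set[int],
              busy_banks: Set[int]) -> List[Request]:
    """Order requests by estimated latency, then by the fixed hierarchy."""
    return sorted(
        pending,
        key=lambda r: (estimated_latency(r, cached, busy_banks), r.priority))
```

With equal estimated latencies the fixed hierarchy decides, so the model falls back to conventional arbitration when latency offers no reason to reorder.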
The invention is particularly useful in embodiments in which the main memory comprises one or more banks of dynamic random access memory and the system is provided as a single-chip device.
Viewed from a further aspect the present invention provides a data processing method comprising the steps of:
(i) issuing memory access requests from a plurality of bus master circuits to a memory circuit via a data bus;
(ii) controlling, in accordance with a hierarchy of bus master priorities, which bus master is granted priority in gaining use of said data bus when two or more bus masters issue temporally overlapping memory access requests; wherein
(iii) in response to a determination of latency of pending memory access requests, priority in gaining use of said data bus between bus masters is re-arbitrated such that a first bus master circuit having a first pending memory access request and a lower position in said hierarchy than a second bus master circuit having a second pending memory access request may gain use of said data bus ahead of said second bus master circuit if said first memory access request has a lower latency than said second memory access request.