1. Field of the Invention
The present invention relates to a cache memory in a computing system, and more particularly to the configuration of a cache memory added to a processor and a cache memory provided external to a processor in a computing system.
2. Description of the Related Art
The performance of microprocessors has improved every year as a result of higher operating frequencies, made possible by finer fabrication processes for large-scale integrated circuits, and of higher processing efficiency resulting from improved processing schemes. With this improvement, the demand for higher access throughput and lower access latency in the memory system connected to the microprocessor is increasing.
A method using cache memory has become common as a method of improving the performance of the memory system.
A cache memory is a type of memory which has a high access throughput and a short access latency, although it has a small capacity as compared with a main memory. A cache memory may be disposed between a processor and a main memory to temporarily hold a portion of the contents of the main memory. When the processor accesses memory, data held in the cache memory is supplied from the cache memory, so that the data can be supplied with a higher throughput and a lower latency than data supplied from the main memory. As the capacity of the cache memory is increased, target data for a memory access issued by the processor is more likely to exist in the cache memory (called a “cache hit”), thus permitting an improvement in average access throughput and a reduction in average access latency.
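The effect of the hit rate on average latency can be illustrated with a simple two-level model. This is only a sketch under stated assumptions: the function name, the latency values, and the simplification that a miss costs exactly one main memory access are all hypothetical and do not come from the text.

```python
def average_access_latency(hit_rate, cache_latency, memory_latency):
    """Average latency when hits are served by the cache and misses by main memory.

    Simplified model (an assumption): a miss costs only the main memory
    latency, with no extra time spent checking the cache first.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * memory_latency


# Hypothetical latencies in arbitrary time units.
low = average_access_latency(0.80, cache_latency=10, memory_latency=100)
high = average_access_latency(0.95, cache_latency=10, memory_latency=100)
```

In this model, raising the hit rate always lowers the average latency as long as the cache is faster than the main memory.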
Recent processors often have a hierarchical cache configuration which has both an internal cache that exhibits a high performance but has a small capacity and an external cache that is inferior to the internal cache in performance but has a larger capacity. The internal cache is provided in the same integrated circuit as the processor. For this reason, it can operate at a high frequency and can have a plurality of access ports, so that it can offer a higher throughput and a lower latency than the external cache. However, because the amount of circuitry that can be accommodated in an integrated circuit is limited, an internal cache having a large capacity is difficult to implement. The external cache, on the other hand, is composed of dedicated or general-purpose memory devices, and is connected to an integrated circuit which serves as a processor or a cache controller. Thus, it is possible to implement a cache of larger capacity than the internal cache. However, signal lines leading outside the integrated circuit operate at a lower frequency than signal lines inside it and are limited in number, so that the throughput is lower than that of the internal cache. In addition, it takes more time to transmit and receive signals between integrated circuits, resulting in a longer latency. Thus, providing both the internal cache and the external cache allows each to compensate for the shortcomings of the other. In recent years, schemes having a larger number of hierarchical levels have also come into practical use.
For obtaining a computing performance which cannot be realized by a single processor, a multi-processor system in which a plurality of the aforementioned processors are connected through a bus or network may be built.
In the multi-processor system, the plurality of processors access a common memory (shared memory) as the processing proceeds. In such a system, when a certain processor issues a memory access, it is necessary to ensure the consistency of the caches in all the processors by checking whether the most recent data exists in the caches of any of the remaining processors. This processing is called “cache coherence control (snoop).”
In this way, in the multi-processor system, a cache tag, which holds information on data stored in the cache memory, is accessed both for the memory access from the processor and for the cache coherence control from the remaining processors.
In JP-A-5-257809 (prior art 1), there is described a method of connecting a processor having a hierarchical cache to a system which couples a plurality of processors. In this method, the external cache is configured in accordance with a direct map scheme, and a cache tag MTAG for the external cache and a differential tag PTAG are provided outside the processor. The differential tag PTAG stores the differentials obtained by excluding the information included in the cache tag MTAG for the external cache from the cache tag information for the internal cache. In this example, MTAG and PTAG are simultaneously checked during the cache coherence control from the remaining processors to simultaneously determine the need for the cache coherence control for the external cache and the internal cache.
In the fifth embodiment of JP-A-4-230549 (prior art 2), there is described a method of coupling each of the processors to a system bus which interconnects a plurality of processors. In this method, a cache tag DL2 which is substantially identical to a cache tag for an external cache of the processor (called a “directory”) is provided. DL2 is first checked during the cache coherence control from the remaining processors, and the cache tag for the external cache is then accessed only when it is determined that the cache coherence control for the external cache is required. In this example, the cache tag for the external cache and the separately provided cache tag DL2 have substantially the same capacity.
In the multi-processor system, the cache tag is accessed both for the memory access from the processor and for the cache coherence control from the remaining processors. In particular, as the number of processors in the multi-processor system grows, the number of cache coherence control requests from the remaining processors increases. For this reason, a method of improving the access performance of the cache tag is required. On the other hand, since the external cache memory has a large capacity, it is difficult to implement it in the same integrated circuit as the processor or the cache controller.
While it may be possible to implement only the external cache tag in the same integrated circuit as the processor or the cache controller to improve the access performance of the cache tag, the limited capacity of such an external cache tag would limit the capacity of the external cache memory. Thus, this approach is not suitable for a multi-processor system which runs a large-scale program.
With the cache memory configuration described in the prior art 1, the external cache tag is accessed each time the processor issues the memory access and each time the remaining processors perform the cache coherence control.
When an external cache memory having a large capacity is implemented, an external cache tag is configured as a memory external to a cache controller, thus causing difficulties in realizing a higher throughput and a lower latency. With the cache memory configuration described in the prior art 2, the external cache tag is accessed when the processor issues the memory access, while the cache tag DL2 having substantially the same contents as the external cache tag is accessed when the remaining processors perform the cache coherence control. This does double the processing throughput of the cache tag. However, when the external cache memory having a large capacity is implemented, the external cache tag and the cache tag DL2 are each configured as memories external to the cache controller, which fails to reduce the latency and approximately doubles the number of signal lines between the cache controller and the cache tags.
It is therefore an object of the present invention to improve the throughput of a system by reducing the frequency of accesses to a cache tag memory, so that more cache tag access requests can be handled than in the prior art.
To achieve the above object, the present invention is a cache memory control circuit for controlling a cache memory comprising a cache tag portion and a cache data portion, wherein:
the cache memory control circuit includes a circuit for controlling the cache memory, a summarized cache tag memory for storing information summarizing contents of cache tag information, and a cache tag summarized information control circuit for controlling the summarized cache tag memory; the cache memory has one or more sets of ways, each set including a plurality of ways for storing data on the memory in blocks; the summarized cache tag memory has a cache tag summarized information entry in correspondence to cache tag information for each of the sets stored in the cache tag portion; and the cache tag summarized information entry requires a smaller number of bits for storage than the cache tag information for each of the sets stored in the cache tag portion.
Further, the cache memory control circuit includes an interface to a higher hierarchical level and an interface to a lower hierarchical level;
reads the cache tag portion of the cache memory in response to an access request from the interface to the higher hierarchical level to determine whether or not a block stored in the cache memory is available; designates the cache data portion as a target for the access request from the interface to the higher hierarchical level when the result of the determination on the cache tag portion indicates that a target block stored in the cache memory is available; and issues an access request to the interface to the lower hierarchical level and receives an access result from the interface to the lower hierarchical level to store it in the cache memory and to designate the result as a target for the access request from the interface to the higher hierarchical level, when the result of the determination on the cache tag portion indicates that the target block stored in the cache memory is not available or the target block is not stored in the cache memory.
Furthermore, the cache memory control circuit includes an interface to a higher hierarchical level and an interface to a lower hierarchical level;
reads the cache tag portion of the cache memory in response to an access request from the interface to the lower hierarchical level to determine whether or not a target block is stored in the cache memory; and performs a change of the cache tag portion of the cache memory, an invalidation of the target block, or an output of most recent data in accordance with a type of the access request from the interface to the lower hierarchical level and a status of the cache tag portion.
Furthermore, the cache memory control circuit includes an interface to a higher hierarchical level and an interface to a lower hierarchical level;
reads a cache tag summarized information entry in the summarized cache tag memory through the cache tag summarized information control circuit in response to an access request from the interface to the higher hierarchical level to determine whether or not a target block is likely to be stored in the cache tag portion;
reads the cache tag portion of the cache memory to determine whether or not the block stored in the cache memory is available, when it is determined as a result of the determination on the cache tag summarized information entry that the target block is likely to be stored in the cache tag portion;
designates the cache data portion as a target for the access request from the interface to the higher hierarchical level, when it is determined as a result of the determination on the cache tag portion that the target block stored in the cache memory is available;
issues an access request to the interface to the lower hierarchical level and receives the access result from the interface to the lower hierarchical level to store it in the cache memory and to designate this result as a target for the access request from the interface to the higher hierarchical level, when it is determined as the result of the determination on the cache tag portion that the target block stored in the cache memory is not available or that the target block is not stored in the cache memory; and
issues an access request to the interface to the lower hierarchical level and receives the access result from the interface to the lower hierarchical level to store it in the cache memory and to designate this result as a target for the access request from the interface to the higher hierarchical level, when it is determined as the result of the determination on the cache tag summarized information entry that the target block is not likely to be stored in the cache tag portion.
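The processor-side flow above can be sketched as follows. This is a minimal model under stated assumptions, not the circuit itself: the summary check, the tag read, the lower-level fetch, and the cache fill are passed in as hypothetical callables (`summary_may_hold`, `read_tag`, `fetch_lower`, `store`), and the returned flag only reports whether the cache tag portion was read.

```python
def processor_access(addr, summary_may_hold, read_tag, fetch_lower, store):
    """Serve a request arriving from the higher-level interface.

    Returns (data, tag_was_read). All four callables are hypothetical
    stand-ins for the hardware paths described in the text.
    """
    if summary_may_hold(addr):
        # The summary says the block may be present: read the cache tag portion.
        data = read_tag(addr)  # None when the block is absent or not available
        if data is not None:
            return data, True  # hit: the cache data portion serves the request
        tag_was_read = True
    else:
        # The summary rules the block out: skip the tag read entirely.
        tag_was_read = False
    data = fetch_lower(addr)   # access request to the lower-level interface
    store(addr, data)          # keep the access result in the cache memory
    return data, tag_was_read


# Toy setup: dictionaries stand in for the cache and the lower level.
cache = {0x100: "A"}
memory = {0x100: "A", 0x200: "B", 0x300: "C"}
may_hold = lambda a: a in (0x100, 0x300)  # 0x300 models a summary false positive
```

In this sketch, a request for `0x200` goes to the lower level without any tag read, while `0x300` still costs a tag read because the summary cannot rule it out.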
Furthermore, the cache memory control circuit includes an interface to a higher hierarchical level and an interface to a lower hierarchical level;
reads a cache tag summarized information entry in the summarized cache tag memory through the cache tag summarized information control circuit in response to an access request from the interface to the lower hierarchical level to determine whether or not a target block is likely to be stored in the cache tag portion;
reads the cache tag portion of the cache memory to determine whether or not the target block is stored in the cache memory, when it is determined as a result of the determination on the cache tag summarized information entry that the target block is likely to be stored in the cache tag portion;
performs a change of the cache tag portion of the cache memory, an invalidation of the target block, or an output of most recent data in accordance with a type of the access request from the interface to the lower hierarchical level and a status of the cache tag portion, when it is determined as a result of the determination that the target block is stored in the cache memory; and
does not read the cache tag portion of the cache memory, when it is determined as the result of the determination on the cache tag summarized information entry that the target block is not likely to be stored in the cache tag portion.
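The coherence-side flow above can be sketched in the same way. The request types (`"read"`, `"invalidate"`), the status strings, and the action policy in `choose_action` are assumptions made for illustration; the text only states that the action depends on the request type and the status of the cache tag portion.

```python
def choose_action(request_type, status):
    """Hypothetical policy mapping (request type, tag status) to an action."""
    if request_type == "invalidate":
        return "invalidate"
    if status == "modified":
        return "output_data"  # supply the most recent data
    return "change_tag"       # e.g. downgrade the status kept in the tag


def snoop_access(addr, request_type, summary_may_hold, read_tag_status):
    """Handle a request arriving from the lower-level interface.

    Returns (action, tag_was_read). The two callables are hypothetical
    stand-ins for the summarized cache tag memory and the cache tag portion.
    """
    if not summary_may_hold(addr):
        return "none", False        # the cache tag portion is not read
    status = read_tag_status(addr)  # None when the block is not stored
    if status is None:
        return "none", True         # summary false positive: tag read, no action
    return choose_action(request_type, status), True


# Toy setup: a dictionary stands in for the cache tag portion.
tags = {0x100: "modified", 0x140: "exclusive"}
may_hold = lambda a: a in (0x100, 0x140, 0x180)  # 0x180 is a false positive
```

Every request for which the summary returns false avoids a tag read, which is where the reduction in tag accesses comes from in this model.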
Also, each cache tag summarized information entry possessed by the summarized cache tag memory has a bit length of N bits; and
addresses of blocks likely to be stored in the sets of the cache tag portions corresponding to the cache tag summarized information entries are classified into N groups, wherein
the cache tag summarized information entry is constituted as follows:
a first bit of the cache tag summarized information entry is registered as true when an effective address of a first group is stored in any way in the cache tag portion;
a second bit of the cache tag summarized information entry is registered as true when an effective address of a second group is stored in any way in the cache tag portion;
each of subsequent bits is registered in a similar manner; and
an Nth bit of the cache tag summarized information entry is registered as true when an effective address of an Nth group is stored in any way in the cache tag portion.
Also, when the cache tag summarized information control circuit reads a cache tag summarized information entry in the summarized cache tag memory in response to an access request from the interface to the higher hierarchical level or from the interface to the lower hierarchical level to determine whether or not a target block is likely to be stored in the cache tag portion,
it determines that the target block is likely to be stored in the cache tag portion when a bit in the cache tag summarized information entry which corresponds to a group number of an address of the target block for the access request is true.
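The N-bit entry and its lookup can be sketched as a small bitmap. The value of N, the use of the tag modulo N as the grouping function, and all names are assumptions for illustration; the text does not specify how addresses are classified into groups.

```python
N = 8  # bits per summarized entry (assumed value)


def group_of(tag, n_groups=N):
    """Hypothetical grouping function: classify a tag into one of N groups."""
    return tag % n_groups


def build_bitmap_entry(valid_tags, n_groups=N):
    """Set bit g when any way of the set holds a valid address of group g."""
    entry = 0
    for tag in valid_tags:
        entry |= 1 << group_of(tag, n_groups)
    return entry


def may_be_stored(entry, tag, n_groups=N):
    """True when the bit for the tag's group is set (block is likely stored)."""
    return bool((entry >> group_of(tag, n_groups)) & 1)


# One set holding valid tags 0x13 (group 3) and 0x25 (group 5).
entry = build_bitmap_entry([0x13, 0x25])
```

A true result is only a likelihood: tag `0x1B` also falls in group 3, so it is reported as likely stored even though it is absent, whereas a false result is definite.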
Further, each of the sets in the cache tag portion is composed of N ways;
addresses of blocks likely to be stored in the sets of the cache tag portions are classified into M groups;
each cache tag summarized information entry possessed by the summarized cache tag memory is composed of N fields; and
each of the fields includes a portion for storing a group number and a bit indicative of validity of the group number, wherein:
the cache tag summarized information entry is constituted as follows:
when an address stored in a first way of the cache tag portion is valid, a first field in the cache tag summarized information entry sets the bit indicative of validity to “1,” and stores a group number of the address in a portion for storing the group number to register it;
when the address stored in the first way of the cache tag portion is not valid, the first field in the cache tag summarized information entry sets the bit indicative of validity to “0;”
when an address stored in a second way of the cache tag portion is valid, a second field in the cache tag summarized information entry sets the bit indicative of validity to “1,” and stores a group number of the address in a portion for storing the group number to register it;
when the address stored in the second way of the cache tag portion is not valid, the second field in the cache tag summarized information entry sets the bit indicative of validity to “0;”
each of subsequent fields is registered in a similar manner;
when an address stored in an Nth way of the cache tag portion is valid, an Nth field in the cache tag summarized information entry sets the bit indicative of validity to “1,” and stores a group number of the address in a portion for storing the group number to register it; and
when the address stored in the Nth way of the cache tag portion is not valid, the Nth field in the cache tag summarized information entry sets the bit indicative of validity to “0.”
Furthermore, when the cache tag summarized information control circuit reads a cache tag summarized information entry in the summarized cache tag memory in response to an access request from the interface to the higher hierarchical level or from the interface to the lower hierarchical level to determine whether or not a target block is likely to be stored in the cache tag portion,
it determines that it is likely that the target block is stored in the cache tag portion, when a group number of the address of the target block for the access request is stored in any field within the cache tag summarized information entry and a valid bit in the field is set to “1.”
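The per-way field format with a validity bit can be sketched as follows. The value of M, the modulo grouping function, and the use of `None` to mark an invalid way are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

M = 16  # number of address groups (assumed value)


def group_of(tag: int, m: int = M) -> int:
    """Hypothetical grouping function: classify a tag into one of M groups."""
    return tag % m


def build_field_entry(way_tags: List[Optional[int]], m: int = M) -> List[Tuple[int, int]]:
    """Build one (valid_bit, group_number) field per way; None marks an invalid way."""
    return [(1, group_of(t, m)) if t is not None else (0, 0) for t in way_tags]


def may_be_stored(entry: List[Tuple[int, int]], tag: int, m: int = M) -> bool:
    """True when some field is valid and holds the group number of the tag."""
    g = group_of(tag, m)
    return any(valid == 1 and group == g for valid, group in entry)


# A 4-way set whose second way is invalid.
entry = build_field_entry([0x21, None, 0x35, 0x40])
```

With these assumed values, each field needs 4 bits for the group number plus 1 validity bit, so a 4-way entry takes 20 bits. The validity bit matters in the lookup: the invalid second field stores group 0 as a placeholder but must never match a group-0 address.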
Furthermore, each of the sets in the cache tag portion is composed of N ways;
addresses of blocks likely to be stored in the sets of the cache tag portion are classified into M groups; and
each cache tag summarized information entry possessed by the summarized cache tag memory is composed of N fields for storing group numbers, wherein:
the cache tag summarized information entry is constituted as follows:
when an address stored in a first way of the cache tag portion is valid, a first field of the cache tag summarized information entry stores a group number of the address for registration;
when an address stored in a second way of the cache tag portion is valid, a second field of the cache tag summarized information entry stores a group number of the address for registration;
each of subsequent fields is registered in a similar manner; and
when an address stored in an Nth way of the cache tag portion is valid, an Nth field of the cache tag summarized information entry stores a group number of the address for registration.
Also, when the cache tag summarized information control circuit reads a cache tag summarized information entry in the summarized cache tag memory in response to an access request from the interface to the higher hierarchical level or from the interface to the lower hierarchical level to determine whether or not a target block is likely to be stored in the cache tag portion,
the cache tag summarized information control circuit determines that the target block is likely to be stored in the cache tag portion, when a group number of the address of the target block directed by the access request is stored in any field within the cache tag summarized information entry.
Further, each block stored in the cache memory is managed in one of four statuses including an invalid status (“Invalid”), an exclusive status (“Exclusive”), a shared status (“Shared”) and a modified status (“Modified”) in the cache tag portion, wherein:
the invalid status indicates that the block is not stored in the cache memory;
the exclusive status indicates that the block is stored in the cache memory, the block stored in the cache data portion is identical to corresponding data on a memory, and the block is not stored in any other cache memories connected on the lower hierarchical side;
the shared status indicates that the block is stored in the cache memory, the block stored in the cache data portion is identical to the corresponding data on the memory, and the block is likely to be stored in another cache memory connected to the lower hierarchical side;
the modified status indicates that the block is stored in the cache memory, the block stored in the cache data portion is different from the corresponding data on the memory and modified data is stored, and the block is not stored in any other cache memories connected on the lower hierarchical side; and
entry contents are registered in the cache tag summarized information entry of the summarized cache tag memory, only when the cache memory holds a block in the modified status or in the shared status.
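The registration rule above can be sketched by combining the four statuses with an N-bit summarized entry. The bitmap layout follows the N-bit entry described earlier in this section; the value of N, the modulo grouping, and all names are assumptions for illustration.

```python
from enum import Enum


class Status(Enum):
    INVALID = "Invalid"
    EXCLUSIVE = "Exclusive"
    SHARED = "Shared"
    MODIFIED = "Modified"


# Statuses that cause registration in the summary, as stated in the text.
REGISTERED = {Status.MODIFIED, Status.SHARED}


def summary_bits(ways, n_groups=8):
    """Build an N-bit summarized entry from the (tag, status) pairs of one set.

    Only ways whose status is in REGISTERED contribute a bit.
    """
    entry = 0
    for tag, status in ways:
        if status in REGISTERED:
            entry |= 1 << (tag % n_groups)  # hypothetical grouping: tag modulo N
    return entry


# A 4-way set: only the Modified and Shared ways are registered.
ways = [(3, Status.MODIFIED), (5, Status.EXCLUSIVE),
        (9, Status.SHARED), (6, Status.INVALID)]
entry = summary_bits(ways)
```

Here tags 3 and 9 set bits 3 and 1 respectively, while the Exclusive and Invalid ways leave the entry unchanged.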
The cache tag summarized information control circuit is configured to modify a corresponding cache tag summarized information entry in the summarized cache tag memory in synchronism with a change to the cache tag portion of the cache memory.
Furthermore, when no access has been issued to the cache tag portion of the cache memory, the cache tag portion is accessed together with the summarized cache tag memory; when an access has already been issued to the cache tag portion of the cache memory, only the access to the summarized cache tag memory is started.
Furthermore, the interface to the higher hierarchical level is configured for connection to one or more processors or I/O devices.
The interface to the lower hierarchical level is configured for connection to one or more other cache memory control circuits or memories or I/O devices through a bus or a network.
The interface to the higher hierarchical level is configured for connection to one or more processor cores.
The interface to the lower hierarchical level is configured for connection to one or more processor buses.
Also, the present invention is a processor having a processor core section, a cache memory control circuit connected to the processor core section, and an interface section connected to the cache memory control circuit for interfacing to an external device, wherein:
the cache memory control circuit has a circuit for controlling a cache memory connected to the processor, a summarized cache tag memory for storing information summarizing contents of cache tag information, and a cache tag summarized information control circuit for controlling the summarized cache tag memory; the summarized cache tag memory has a cache tag summarized information entry in correspondence to the cache tag information stored in the cache tag portion; and the cache tag summarized information entry requires a smaller number of bits for storage than the cache tag information stored in the cache tag portion.
Further, the present invention is a parallel processor system having a plurality of processor nodes and a memory interconnected through a network, wherein:
each of the processor nodes has one or more processors, a cache memory control circuit, a CPU bus interface for interfacing the one or more processors to the cache memory control circuit, and a network interface for interfacing the cache memory control circuit to the network; the cache memory control circuit has a circuit for controlling a connected cache memory, a summarized cache tag memory for storing information summarizing contents of cache tag information, and a cache tag summarized information control circuit for controlling the summarized cache tag memory; the summarized cache tag memory has a cache tag summarized information entry in correspondence to the cache tag information stored in the cache tag portion; and the cache tag summarized information entry requires a smaller number of bits for storage than the cache tag information stored in the cache tag portion.
Furthermore, the present invention is a processor system having a processor node and a memory, wherein:
the processor node has one or more processors, a cache memory control circuit, a CPU bus interface for interfacing the one or more processors to the cache memory control circuit, and a network interface for interfacing the cache memory control circuit to the memory; the cache memory control circuit has a circuit for controlling a connected cache memory, a summarized cache tag memory for storing information summarizing contents of cache tag information, and a cache tag summarized information control circuit for controlling the summarized cache tag memory; the summarized cache tag memory has a cache tag summarized information entry in correspondence to the cache tag information stored in the cache tag portion; and the cache tag summarized information entry requires a smaller number of bits for storage than the cache tag information stored in the cache tag portion.