1. Field of the Invention
This invention relates generally to data processing systems and, more particularly, to a data cache memory unit forming part of a central processing unit. The data cache memory unit stores data signal groups that are required by an associated execution unit of the central processing unit. Because a data signal group stored in the data cache memory unit can be changed in some other part of the data processing system, apparatus must be provided to ensure that the execution unit does not process an invalid data signal group.
2. Description of the Related Art
The ability of data processing systems to execute instructions has increased to such an extent that one figure of merit of a data processing unit is the number of instructions executed per second. In this processing environment, the challenge has been to provide data and instruction signal groups to the execution units at a rate commensurate with the rate at which the instructions are being executed. Ideally, a single large memory unit containing all of the data and instruction signal groups required for the execution of a program would be available to the execution unit. However, such a memory unit would not only have to be large enough to hold the enormous number of data and instruction signal groups, but would also have to be implemented in a technology sufficiently advanced to keep pace with the demand for data and instruction signal groups by the apparatus executing the instructions. In addition, modern data processing systems operate at such a rapid clock rate that the physical distance over which the information must travel can itself limit the execution of instructions. For these and other practical reasons, a cache memory unit is typically provided in a data processing system.
A cache memory unit is a comparatively small memory unit that is physically positioned close to the execution unit, i.e., the unit in the data processing system that executes the instructions that process data signal groups according to a sequence of instructions generally referred to as a program. The cache memory unit is designed to store, and make accessible, the data signal groups and instruction signal groups for which the execution unit has the most immediate requirement, in an attempt to minimize the time during which the execution unit is idle, e.g., as a result of the temporary unavailability of the data or instruction signal groups. In a data processing system with a cache memory unit, the data and instruction signal groups that are not in the cache memory unit are stored in the main memory unit. Consequently, a continuing exchange of data and instruction signal groups is maintained between the cache memory unit and the main memory unit in order to provide the required data and instruction signal groups to the execution unit.
However, as the speed with which instructions can be executed has increased, the cache memory unit associated with the execution unit has grown to a size at which the data and instruction groups can no longer be provided to the execution unit in a timely fashion. As a consequence, a memory hierarchy is typically implemented. In this implementation, the data and instruction groups are stored in a plurality of memory units: the cache memory unit coupled to the execution unit (i.e., the L1 cache memory unit), a sequence of intermediate cache memory units L2 through LN, and, finally, the main memory unit, which is the highest level of the hierarchy. The hierarchy is organized such that the probability that a data signal group will be required by the execution unit is highest for the lowest order memory unit, the L1 (instruction/data) cache memory unit, and lowest for the highest order memory unit, the main memory unit. Correspondingly, for the intermediate memory units of the hierarchy, the probability that a data or instruction signal group will be required by the execution unit increases as the order of the cache memory unit LM in which it is stored decreases.
Referring, for example, to FIG. 1, a data processing system 10 having a hierarchical memory system is shown. The data processing system 10 includes a central processing unit 11, a main memory unit 15 and peripheral unit #1 171 through peripheral unit #N 17N. These components are coupled together by at least one bus 19. In general, the central processing unit 11 processes data and instruction signal groups that are stored in the main memory unit 15 and in the intervening cache memory units. The peripheral units 171–17N permit interaction between users and the data processing system 10 and can provide storage for programs, and for the data associated with programs, that are not currently being executed. The central processing unit 11 typically includes an execution unit 111, a data cache memory unit 115 and an instruction cache memory unit 113. The execution unit 111 processes data signal groups from the data cache memory unit 115 under the control of instruction signal groups from the instruction cache memory unit 113. The data cache memory unit 115 and the instruction cache memory unit 113 contain the data and instruction signal groups that have the highest probability of being required by the execution unit 111, and together they form the L1 cache memory unit 113, 115. Within the central processing unit 11, an L2 cache memory unit 117 is coupled to the data cache memory unit 115 and the instruction cache memory unit 113. When the required data and/or instruction signal groups are not present in the L1 cache memory unit 113, 115, the execution unit 111 attempts to retrieve them from the L2 cache memory unit 117. The L2 cache memory unit 117 stores signal groups that, according to an operating algorithm in the central processing unit 11, have a lower probability of being required by the execution unit than the signal groups in the L1 cache memory unit 113, 115.
External to the central processing unit 11, an L3 cache memory unit 13 can be present; when the required signal groups are in neither the L1 cache memory unit 113, 115 nor the L2 cache memory unit 117, the execution unit attempts to retrieve the required signal group(s) from the L3 cache memory unit 13. When the required signal group is not in any of the cache memory units, the execution unit 111 retrieves it from the main memory unit 15. As will be clear to those skilled in the art, the foregoing description is not complete, but it illustrates the relationship of various components of the data processing system 10 to the data cache memory unit 115, the unit to which the present invention is most directly addressed.
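The search order just described (L1, then L2, then L3, then main memory) can be sketched in a few lines. This is a minimal illustration under stated assumptions: each level is modeled as a hypothetical dictionary from address to data, main memory is assumed to hold every address, and the refilling of lower levels after a miss, which a real hierarchy would perform, is deliberately omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each cache level is (name, {address: data}).
Level = Tuple[str, Dict[int, int]]


def lookup(levels: List[Level], main_memory: Dict[int, int], address: int) -> Tuple[str, int]:
    """Search the hierarchy from the lowest order level (L1) upward.

    Returns the name of the level that supplied the data and the data itself.
    Main memory, the highest level, is assumed to hold every address.
    Refill of the lower levels on a miss is not modeled.
    """
    for name, contents in levels:
        if address in contents:
            return name, contents[address]
    return "main memory", main_memory[address]


levels = [
    ("L1", {0x10: 111}),
    ("L2", {0x10: 111, 0x20: 222}),
    ("L3", {0x30: 333}),
]
main_memory = {0x10: 111, 0x20: 222, 0x30: 333, 0x40: 444}

print(lookup(levels, main_memory, 0x20))  # ('L2', 222)
print(lookup(levels, main_memory, 0x40))  # ('main memory', 444)
```

The first level that holds the address terminates the search, which is why the levels most likely to hold a required signal group are placed first.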
Referring next to FIG. 2, the principal components of a data cache unit 20 according to the prior art are shown. For a WRITE or a READ operation, the data cache unit 20 receives, from the execution unit, an address signal group identifying the required data signal group. A data signal group in the data cache memory unit is typically identified by a virtual address signal group, i.e., the address signal group used by a central processing unit to identify a data signal group. The virtual address is translated into a physical address, the physical address identifying the (physical) location of a group of storage cells in the main memory unit. The data cache unit 20 stores the data signal groups at locations in a storage cell array unit 213. Thus, when the execution unit applies a (virtual) address signal group to the address input port of the data cache unit 20, this address signal group is applied to an address decoder unit 211. The address decoder unit 211 decodes the applied address and accesses the specific group of storage cells in the storage cell array unit 213 identified by the virtual address. For a WRITE operation, a data-in signal group is applied, through a data-in input port, to the accessed group of storage cells in the storage cell array unit 213 and is stored therein. For a READ operation, the data signal group in the accessed group of storage cells in the storage cell array unit 213 is retrieved and applied to the data-out output port, from which it is transmitted to the execution unit.
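The role of the address decoder unit 211 can be sketched as a one-hot decoder that asserts exactly one row-select line of the storage cell array unit 213. This is a hypothetical software model, not circuitry from the source: the class and function names are illustrative, and the address is assumed to be already reduced to a row number smaller than the array size.

```python
from typing import List


def decode(address: int, num_rows: int) -> List[int]:
    """One-hot decoder: assert exactly one row-select line for the address."""
    if not 0 <= address < num_rows:
        raise ValueError("address outside the storage cell array")
    return [1 if row == address else 0 for row in range(num_rows)]


class StorageCellArray:
    """Hypothetical model of address decoder 211 driving storage cell array 213."""

    def __init__(self, num_rows: int) -> None:
        self.num_rows = num_rows
        self.rows: List[int] = [0] * num_rows

    def write(self, address: int, data_in: int) -> None:
        # WRITE: the data-in group is stored only in the selected row.
        for row, selected in enumerate(decode(address, self.num_rows)):
            if selected:
                self.rows[row] = data_in

    def read(self, address: int) -> int:
        # READ: only the selected row drives the data-out port.
        select = decode(address, self.num_rows)
        return next(value for value, sel in zip(self.rows, select) if sel)


array = StorageCellArray(num_rows=8)
array.write(5, 0xAB)
print(hex(array.read(5)))  # 0xab
print(decode(5, 8))        # [0, 0, 0, 0, 0, 1, 0, 0]
```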
The number of storage cell locations in the storage cell array unit 213 is not sufficient to provide a location for every possible address. Consequently, each address signal group is divided into two portions. The first portion, typically referred to as the index portion, identifies the location of a group of storage cells in the storage cell array unit 213. The second portion, typically referred to as the tag portion, completes the identification of the particular data signal group. The index portion of the address signal group from the execution unit is applied to the tag unit 25, to a valid bit unit 29 and to the address decoder unit 211. As with the storage cell array unit 213, the index portion accesses specific locations in both the tag unit 25 and the valid bit unit 29. The tag portion of the address signal group from the execution unit is applied to the data-in terminals of the tag unit 25 and to the comparator unit 27. In a READ operation, the application of an address signal group results in a data signal group from the storage cell array unit 213 being applied to the gate unit 23. Simultaneously, the application of the index portion of the address to the tag unit 25 results in the stored tag portion signal group being applied to the comparator unit 27. Also applied to the comparator unit 27 is the tag portion of the address signal group from the execution unit. When the tag portion stored in the tag unit 25 and the tag portion of the applied address signal group are the same, the comparator unit 27 sends a control signal to the gate unit 23. This signal indicates that the data signal group applied to the output terminals of the data storage array unit 21 is, in fact, the data signal group identified by the address signal group from the execution unit. As indicated above, the index portion of the address signal group is also applied to the valid bit unit 29.
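The split into index and tag portions, and the comparison performed by the comparator unit 27, can be sketched as follows. The source does not say which address bits form each portion; this sketch assumes the low-order bits form the index and the remaining high-order bits form the tag, with no byte-offset field. The function names, the 4-bit index width and the stored tag value are hypothetical.

```python
from typing import List, Tuple


def split_address(address: int, index_bits: int) -> Tuple[int, int]:
    """Return (tag, index): the low-order index_bits select the location,
    and the remaining high-order bits form the tag (assumed layout)."""
    index = address & ((1 << index_bits) - 1)
    tag = address >> index_bits
    return tag, index


def comparator(tag_unit: List[int], address: int, index_bits: int) -> bool:
    """Model of comparator 27: stored tag at the index vs. the applied tag."""
    tag, index = split_address(address, index_bits)
    return tag_unit[index] == tag


INDEX_BITS = 4                      # 16 locations (assumed size)
tag_unit = [0] * (1 << INDEX_BITS)
tag_unit[0x3] = 0x1A                # a tag previously written at index 3

print(split_address(0x1A3, INDEX_BITS))         # (26, 3)
print(comparator(tag_unit, 0x1A3, INDEX_BITS))  # True
print(comparator(tag_unit, 0x2B3, INDEX_BITS))  # False: same index, other tag
```

Addresses 0x1A3 and 0x2B3 share index 3, so they compete for the same storage cell location; only the tag comparison tells them apart.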
At each addressed location in the valid bit unit 29, at least one valid bit signal is stored. The valid bit identifies whether the data stored in the associated storage cell location is valid. When a data signal group associated with an address signal group is changed or, for some other reason, becomes questionable, an indication must be made that the (now invalid) copy of the data signal group in the storage cell array unit 213 should not be used in further processing. Thus, if the valid bit associated with an address signal group is no longer set, the associated data signal group should not be processed by the execution unit. This prohibition is accomplished by applying an appropriate signal from the valid bit unit 29 to the gate unit 23 when the valid bit is not set. Thus, either the absence of a match signal from the comparator unit 27 or a not-valid signal from the valid bit unit 29 prevents the data signal group applied to the gate unit 23 by the data storage array unit 21 from being transmitted therethrough. When, for whatever reason, the data signal group is not transmitted by the gate unit 23, the gate unit 23 transmits a MISS signal to the execution unit. This signal permits the execution unit to perform the functions required to retrieve the appropriate valid data signal group.
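The gating rule of the gate unit 23 reduces to a logical AND of the tag-match and valid-bit conditions. The sketch below is a hypothetical behavioral model (the function name and the (data_out, miss) return convention are illustrative), not a description of the actual gate circuit.

```python
from typing import Optional, Tuple


def gate_unit(tag_match: bool, valid: bool, data: int) -> Tuple[Optional[int], bool]:
    """Model of gate unit 23; returns (data_out, miss)."""
    if tag_match and valid:
        return data, False   # data signal group transmitted to execution unit
    return None, True        # gate blocks the data and asserts MISS


for tag_match in (True, False):
    for valid in (True, False):
        print(tag_match, valid, gate_unit(tag_match, valid, 0x55))
```

Of the four input combinations, only a tag match with the valid bit set passes the data signal group; the other three assert MISS.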
For a WRITE operation, the index portion of the address signal group from the execution unit is applied to the data storage array unit 21, thereby accessing a specific group of locations in the storage cell array unit 213. The data-in signal group associated with the applied address signal group is stored in the accessed storage cell locations of the storage cell array unit 213. In addition, the index portion of the applied address signal group is applied to the address terminals of the tag unit 25 while the tag portion is applied to the data-in terminals of the tag unit 25, so that the tag portion is stored in the location in the tag unit 25 associated with the applied address signal group. The location in the valid bit unit 29 associated with the applied address signal group is likewise accessed with the index portion, and the signal in the corresponding valid bit location is updated.
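The READ and WRITE paths of the prior-art data cache unit 20 can be combined in one small behavioral model. This is a sketch under assumptions: a direct-mapped organization with one data word per location, low-order index bits, and parallel Python lists standing in for the storage cell array unit 213, the tag unit 25 and the valid bit unit 29. The class name `DirectMappedDataCache` is hypothetical.

```python
from typing import List, Optional, Tuple


class DirectMappedDataCache:
    """Hypothetical model of the prior-art data cache unit 20.

    Parallel lists stand in for storage cell array 213, tag unit 25 and
    valid bit unit 29; all three are addressed by the index portion.
    """

    def __init__(self, index_bits: int) -> None:
        self.index_bits = index_bits
        size = 1 << index_bits
        self.data: List[int] = [0] * size
        self.tags: List[int] = [0] * size
        self.valid: List[bool] = [False] * size

    def _split(self, address: int) -> Tuple[int, int]:
        index = address & ((1 << self.index_bits) - 1)
        return address >> self.index_bits, index

    def write(self, address: int, data_in: int) -> None:
        tag, index = self._split(address)
        self.data[index] = data_in   # data-in group into the storage cell array
        self.tags[index] = tag       # tag portion into the tag unit
        self.valid[index] = True     # valid bit location updated

    def read(self, address: int) -> Tuple[Optional[int], bool]:
        """Return (data, miss) as seen at the output of the gate unit."""
        tag, index = self._split(address)
        if self.valid[index] and self.tags[index] == tag:
            return self.data[index], False
        return None, True


cache = DirectMappedDataCache(index_bits=4)
cache.write(0x1A3, 42)
print(cache.read(0x1A3))  # (42, False)
print(cache.read(0x2B3))  # (None, True): index 3 holds tag 0x1A, not 0x2B
print(cache.read(0x1A4))  # (None, True): valid bit at index 4 never set
```

A later WRITE to 0x2B3 would overwrite index 3, after which a READ of 0x1A3 misses; this is the conflict that the tag comparison exists to detect.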
As the need for increased processing power has grown, one of the techniques for meeting this need has been to implement the execution unit using pipeline techniques. In a pipeline implementation, the processing apparatus implements an operation, for example an addition operation, as a series of sub-operations, each sub-operation requiring an equal time period to perform. Each sub-operation is executed in one system clock cycle, the system clock cycle being (typically appreciably) shorter than the time required to execute the complete operation in an unpipelined manner. While the total time to complete a single pipelined operation can be longer than the time to complete the same operation unpipelined, once every stage of the pipeline is filled, operations are completed at the rate at which each sub-operation is implemented, i.e., at the system clock rate. Notwithstanding the need for increased data access speed, provision must be made for data coherency. Data coherency is the requirement that only one “correct” copy of a data group can exist in a data processing system at any one time. When the execution unit modifies a data signal group, the copy of the data signal group in the main memory unit must be updated. Similarly, when a plurality of data cache units are present in a multiprocessing system, each might contain a copy of a particular data signal group at the same time. When one of the execution units changes one of the copies of the particular data signal group, then not only must the main memory unit version be updated, but any other copies of the data signal group in the data processing system must be invalidated or updated.
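The trade-off between latency and throughput described above can be checked with simple arithmetic. In this sketch the figures are assumptions chosen for illustration: a 10 ns operation is split into 5 equal stages, and each 2.5 ns clock cycle includes 0.5 ns of per-stage overhead, so a single pipelined operation (12.5 ns) takes longer than the unpipelined one. The formula assumes one new operation enters the pipeline every cycle with no stalls.

```python
def unpipelined_time(n_ops: int, op_time: float) -> float:
    """Each operation runs to completion before the next one starts."""
    return n_ops * op_time


def pipelined_time(n_ops: int, stages: int, cycle_time: float) -> float:
    """The first result emerges after `stages` cycles (filling the pipeline);
    thereafter one result completes every cycle (no stalls assumed)."""
    if n_ops == 0:
        return 0.0
    return (stages + n_ops - 1) * cycle_time


# Assumed figures: 10 ns operation, 5 stages, 2.5 ns clock (incl. overhead).
print(unpipelined_time(1, 10.0), pipelined_time(1, 5, 2.5))      # 10.0 12.5
print(unpipelined_time(100, 10.0), pipelined_time(100, 5, 2.5))  # 1000.0 260.0
```

For 100 operations the pipeline finishes in 104 cycles, approaching one result per 2.5 ns clock even though each individual operation is slower.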
A need has therefore been felt for a data cache memory unit having the feature that the integrity of the data signal groups stored in the data cache memory unit can be verified without unnecessary impact on the exchange of data signal groups between the data cache memory unit and the execution unit. It is a further feature of the present invention that data signal groups stored in the pipelined data cache memory unit be provided with an associated valid bit without impact on the rate at which data signal groups are stored in the data cache memory unit. It is still a further feature of the present invention to provide, in a single memory access, the valid bits for a plurality of data signal groups stored in the data cache memory unit. It is yet another feature of the present invention to provide apparatus and methods that permit a snoop operation to be pipelined at the system clock rate and to be performed while the execution unit is accessing the data.
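For readers unfamiliar with snooping, the coherency requirement discussed above can be illustrated with a conventional write-invalidate scheme: when one processor writes an address, every other data cache checks its tag unit and clears the valid bit of any matching copy. This is a generic, sequential sketch assumed for illustration only; it is not the pipelined snoop apparatus of the present invention, it does not model the main memory update, and the names `SnoopingCache` and `broadcast_write` are hypothetical.

```python
from typing import List, Tuple


class SnoopingCache:
    """Hypothetical minimal cache state: tag and valid-bit arrays only."""

    def __init__(self, index_bits: int) -> None:
        self.index_bits = index_bits
        self.tags: List[int] = [0] * (1 << index_bits)
        self.valid: List[bool] = [False] * (1 << index_bits)

    def _split(self, address: int) -> Tuple[int, int]:
        return address >> self.index_bits, address & ((1 << self.index_bits) - 1)

    def fill(self, address: int) -> None:
        tag, index = self._split(address)
        self.tags[index] = tag
        self.valid[index] = True

    def holds(self, address: int) -> bool:
        tag, index = self._split(address)
        return self.valid[index] and self.tags[index] == tag

    def snoop_invalidate(self, address: int) -> bool:
        """Clear the valid bit if this cache holds a copy; True if it did."""
        tag, index = self._split(address)
        if self.valid[index] and self.tags[index] == tag:
            self.valid[index] = False
            return True
        return False


def broadcast_write(caches: List[SnoopingCache], writer: int, address: int) -> int:
    """Every cache except the writer snoops the write; returns copies invalidated."""
    return sum(c.snoop_invalidate(address) for i, c in enumerate(caches) if i != writer)


caches = [SnoopingCache(index_bits=4) for _ in range(3)]
for c in caches:
    c.fill(0x1A3)
print(broadcast_write(caches, writer=0, address=0x1A3))  # 2
print([c.holds(0x1A3) for c in caches])                  # [True, False, False]
```

In this sequential model each snoop occupies the tag and valid arrays, which is exactly the kind of contention with execution-unit accesses that the features listed above are directed at.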