In certain data processing systems, a cache or memory directory (a memory directory is used with the shared global memory in multiprocessor systems) contains an array of entries that correspond one-to-one to the data entries in the cache or memory data array. Each directory entry contains an address tag and status bits. The address tag of the directory entry is the tag portion of the address of the data entry in the lowest level of the memory hierarchy. The data entry and the directory entry are retrieved by using the index portion of the request address to index into the directory array and the data array. The address tag of the retrieved directory entry is then compared with the tag portion of the request address. If they match and the entry is valid, the entry in the data array is the addressed data. The status bits include a valid bit, whose value is 1 if the data entry is valid and 0 otherwise. The status bits may also include a modified (dirty) bit, which indicates whether the valid data entry contains a new value. When this bit has value 1, the entry holds a new value that will be written back to the next lower level of the memory hierarchy when the data entry is evicted to make room for a data entry of a new address. Depending on machine organization, the status bits may also include other bits, such as an "exclusive bit" or "inclusion bits," which are primarily used in multiple processor systems.
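The lookup described above can be sketched as follows. This is a minimal illustration only: the field widths (INDEX_BITS, OFFSET_BITS), the DirectoryEntry class, and the function names are assumptions made for the example and do not come from the system described here.

```python
from dataclasses import dataclass

INDEX_BITS = 4   # hypothetical: a directory of 16 entries
OFFSET_BITS = 6  # hypothetical: 64-byte data entries


@dataclass
class DirectoryEntry:
    tag: int = 0
    valid: int = 0     # 1 = the data entry is valid
    modified: int = 0  # 1 = the data entry holds a new (dirty) value


def split_address(addr):
    """Split a request address into (tag, index); the byte offset is dropped."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index


def lookup(directory, addr):
    """Return True (hit) if the indexed entry is valid and its tag matches."""
    tag, index = split_address(addr)
    entry = directory[index]
    return entry.valid == 1 and entry.tag == tag


# Install one valid entry, then probe the directory.
directory = [DirectoryEntry() for _ in range(1 << INDEX_BITS)]
tag, index = split_address(0x12345)
directory[index] = DirectoryEntry(tag=tag, valid=1)
print(lookup(directory, 0x12345))              # same tag, same index
print(lookup(directory, 0x12345 + (1 << 10)))  # same index, different tag
```

The second probe maps to the same directory entry but carries a different tag, so it is reported as a miss even though the entry is valid.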
When a request is received by the cache or memory, the index of the request is used to read the directory. Then, once the retrieved entry is determined to be the addressed entry by matching the address tags, the request is processed. Depending on the request type (as determined by the decode operation), new status bit values or new address tag bits may need to be written back into the directory entry. For example, if the request is a castout or a write from the next higher level of the memory hierarchy, the modified bit needs to be set to one. If the request is a castout from the current cache, the valid bit needs to be reset to zero. Another example is a request to load a new data entry from the lower level of the memory hierarchy. In this case, a new address tag needs to be written into the directory entry and the modified bit needs to be reset to zero. Also, in a multiprocessor system in which more than one next higher level cache is connected to the current level cache, requests from the next higher level caches will result in "exclusive bit" or "inclusion bit" changes. The index portion of the request address is used to locate the directory entry to which the write will be performed.
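The status updates listed above can be summarized in a short sketch. The request type names are hypothetical labels chosen for the example, and setting the valid bit on a load is an assumption; the text only states that the tag is replaced and the modified bit is reset.

```python
def update_entry(entry, request_type, request_tag):
    """Return the new (tag, valid, modified) tuple for a directory entry.

    The request type strings are hypothetical names for the cases in the text.
    """
    tag, valid, modified = entry
    if request_type in ("write_from_higher_level", "castout_from_higher_level"):
        modified = 1                 # the entry now holds a new value
    elif request_type == "castout_from_current_cache":
        valid = 0                    # the entry leaves this cache
    elif request_type == "load_from_lower_level":
        tag = request_tag            # a new address takes over the entry
        modified = 0
        valid = 1                    # assumption: a freshly loaded entry is valid
    return (tag, valid, modified)


entry = (0x48, 1, 0)
entry = update_entry(entry, "write_from_higher_level", 0x48)
print(entry)  # the modified bit is now 1
```

Exclusive and inclusion bits are left out because their encoding depends on the machine organization.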
New status bit values change the contents of the directory entry, so new ECC (error correction code) bit values need to be generated for it.
Error correction coding is a method of encoding information so that errors that occur during transmission or storage of data can be detected and also corrected. With respect to entries in memory devices, it is imperative that both the addresses for finding such entries and the status information (indicating, for example, whether the associated memory entry is the latest version of that data) be maintained exactly, so that data stored in memory is not lost and invalid data is not used by one of the devices within a data processing system. An error in a single bit of an address may result in the loss of the associated data. Error correction coding is employed to detect such errors and correct them.
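To make the idea concrete, the following sketch encodes four information bits with a Hamming(7,4) code and corrects a single flipped bit. The text does not say which code a directory uses; Hamming(7,4) is chosen here only because it is small enough to read, and the function names are illustrative.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data bits, syndrome); a nonzero syndrome is the corrected position."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the single bit the syndrome points at
    return [c[2], c[4], c[5], c[6]], syndrome


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                    # simulate a single-bit storage error
print(hamming74_correct(stored))  # recovers the data and reports position 5
```

A real directory entry carries n information bits and m check bits rather than 4 and 3, but the check bits are likewise exclusive-ORs over subsets of the information bits, so any change to the entry forces them to be recomputed.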
A problem with the generation of error correction code ("ECC") is that it often requires an extra step during a read or write operation. This extra step often requires one or more additional clock cycles to perform the ECC generation, which is a concern in today's high frequency designs of data processing systems, where the reduction of processing cycles is an ongoing endeavor.
This may be especially noted within the design of cache-related circuitry associated with processor(s) in a data processing system. If each read or write operation requires an additional processor cycle, it can be readily appreciated how the reduction in one processor cycle can result in a much faster design.
Thus, minimizing ECC circuitry is very important in high frequency cache designs, because the ECC logic is often located on a critical path during read/write operations. Therefore, while ECC has become a must in high-reliability systems, accommodating ECC without sacrificing performance is a critical issue.
The discussion herein focuses on pipelined directory controllers. Pipelined controllers are needed because all the operations involved in processing a request to the cache or memory take too long to complete. By using a pipelined controller, more requests can be processed in a fixed duration of time. The operations of each request are divided into small partitions, each processed by a different pipeline stage that takes a smaller amount of time (one pipeline cycle) to finish. The pipeline can start processing a new request every pipeline cycle. A pipelined controller is therefore very useful in providing high bandwidth to requests.
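The bandwidth benefit can be checked with simple arithmetic. The sketch below assumes an idealized pipeline with no stalls, in which every stage takes exactly one pipeline cycle; the function names and the request and stage counts are illustrative.

```python
def unpipelined_cycles(num_requests, num_stages):
    """Cycles needed when each request must finish before the next one starts."""
    return num_requests * num_stages


def pipelined_cycles(num_requests, num_stages):
    """Cycles needed when a new request can enter the pipeline every cycle."""
    if num_requests == 0:
        return 0
    # The first request takes num_stages cycles; each later one adds one cycle.
    return num_stages + num_requests - 1


print(unpipelined_cycles(100, 4))  # 400
print(pipelined_cycles(100, 4))    # 103
```

The latency of a single request is still one cycle per stage, which is why an extra pipeline stage matters even when throughput is high.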
The conventional approach to a pipelined directory controller is depicted in FIG. 3. A request is first decoded in the first stage while the directory is looked up using the index portion of the address. This is the directory access/request decode stage 31. A request is decoded into types, which produce different responses to the requester or perform different modifications on the directory entry. When the directory entry is read out (several entries may be read out simultaneously in a set associative cache), its address tag is compared to the address tag of the request, and the valid bit from the directory entry is checked. If the check confirms that the entry is valid and the tags match, a hit signal is sent back to the requester with the data entry read out from the data array; otherwise, a miss signal is sent to the requester. In the meantime, the ECC bits of the directory entry are checked for errors. These operations are performed in stage 32. Once the subsequent stages have completed, the new directory entry and the new ECC bits are ready to be written back to the directory. If writing a new entry is necessary, the logic in stage 34 will assert the write/read signal, which causes multiplexer 35 to select, from the address of the request that just finished stage 34, the index for updating the directory. The update occurs in the cycle after stage 34 and is denoted by the signals from stage 34 to directory array 30 and multiplexer 35. In the update cycle, a new request coming into the controller must be reissued, or the controller logic will stall any incoming request for that cycle to let the directory be written. When a request generates no entry to be written back to the directory, the logic in stage 34 will deassert the write/read signal so that multiplexer 35 selects the next incoming request.
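The role of multiplexer 35 in the update cycle can be modeled in a few lines. This is a behavioral sketch only; the function name, argument names, and the choice to report a stall flag are assumptions made for illustration, and the alternative of reissuing the request is not modeled.

```python
def mux35_select(write_asserted, update_index, incoming_index):
    """Return (index presented to the directory array, incoming request stalled?).

    write_asserted models the write/read signal driven by the stage 34 logic.
    """
    if write_asserted:
        # The directory is being written; the incoming request waits a cycle.
        return update_index, True
    # No write-back pending; the next incoming request reads the directory.
    return incoming_index, False


print(mux35_select(True, 13, 7))   # the update wins and the request is stalled
print(mux35_select(False, 13, 7))  # the incoming request proceeds
```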
The directory entry (Entry_slct) read out from directory array 30 by a request and inputted into stage 32 is illustrated with n (= k + j) bits of information (a k-bit tag and a j-bit status) and m ECC check bits. One request processing operation determines whether to construct a new entry for directory 30 or simply to modify an existing directory entry. For the former, the tag accompanying the request will be used as the tag for the new directory entry. For the latter, the tag read out and selected (e.g., in the set associative cache) will be used. If a request finds the addressed entry, and the purpose of the request is to access (read or write) the data entry, then the same address tag should remain in the directory entry. In this case, the tag select will enable the tag portion of Entry_slct to pass multiplexer 14. However, if the request does not find the addressed entry and results in replacing the existing data entry with the data entry from the next lower level of memory in the hierarchy, the tag from the request will be enabled to pass multiplexer 14.
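The tag selection and the n = k + j layout can be illustrated as follows. The bit ordering (tag in the upper k bits, status in the lower j bits), the widths k = 20 and j = 4, and the function names are assumptions made for the example.

```python
def select_tag(replace_entry, entry_tag, request_tag):
    """Model of the tag select: a replacement takes the tag from the request."""
    return request_tag if replace_entry else entry_tag


def pack_entry(tag, status, k, j):
    """Pack a k-bit tag and a j-bit status into n = k + j information bits."""
    if not (0 <= tag < (1 << k)) or not (0 <= status < (1 << j)):
        raise ValueError("tag or status does not fit its field")
    return (tag << j) | status  # assumed layout: tag above status


K, J = 20, 4  # hypothetical field widths
kept = select_tag(False, entry_tag=0xABCDE, request_tag=0x12345)
print(hex(pack_entry(kept, 0b1010, K, J)))  # 0xabcdea
```

The m ECC check bits are computed over these n information bits, so they cannot be final until both the tag and the status fields are known.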
Thereafter, in stage 33, in accordance with the results of the hit/miss detection and the processing of the request, new status bits are generated (logic 17) for the new entry. In stage 34, an ECC is generated (logic 20) for the entry information selected in stage 33 to update directory array 30. The ECC generation logic 20 operates on the n information bits of the entry to generate a new m-bit ECC. With a short cycle time, the ECC generation in the conventional approach often cannot be combined with the logic in the previous stages 31-33 and still meet timing requirements. In a conventional design, ECC generation does not start until the new directory entry is ready, and the last part of the new entry is not ready until the end of the cycle in stage 33. Generating the ECC bits for the entire directory entry takes a large portion of a cycle. If the ECC generation for the entire entry were moved to the previous stage, the time to generate the last part of the new entry plus the time to generate the ECC for the entire entry would exceed the cycle time. ECC generation therefore cannot be combined with the previous stages in a conventional approach. As a result, the extra pipeline stage 34 is required, and the request processing latency increases by one cycle.
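The timing argument above amounts to a single inequality, sketched below. The delays are made-up numbers in picoseconds chosen only to illustrate the case described; they are not measurements of any design, and the function name is illustrative.

```python
def ecc_fits_in_same_stage(entry_ready_ps, ecc_delay_ps, cycle_ps):
    """True if ECC generation can follow entry generation within one cycle.

    Models the conventional constraint that ECC generation cannot start
    until the whole new directory entry is ready.
    """
    return entry_ready_ps + ecc_delay_ps <= cycle_ps


CYCLE_PS = 1000        # hypothetical pipeline cycle time
ENTRY_READY_PS = 900   # hypothetical: last status bit arrives late in stage 33
ECC_DELAY_PS = 600     # hypothetical: ECC takes a large portion of a cycle

print(ecc_fits_in_same_stage(ENTRY_READY_PS, ECC_DELAY_PS, CYCLE_PS))  # False
print(ECC_DELAY_PS <= CYCLE_PS)  # True: ECC does fit in a stage of its own
```

With these numbers the combined path needs 1500 ps against a 1000 ps cycle, which is why the conventional design gives ECC generation a separate stage 34.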
Thus, there is a need in the art for an ECC generation technique that does not require an extra cycle to process.