1. Field of the Invention
This invention is related to the field of processors and, more particularly, to a data cache block zero instruction.
2. Description of the Related Art
Processors implement an instruction set architecture (ISA), which defines the instructions that the processor is designed to execute, the operation of the instructions, the operands for the instructions, etc. Software programmers/compilers use the ISA to create programs for execution on the processors.
Many ISAs include a data cache block zero (DCBZ) instruction. The DCBZ instruction stores zeros to all bytes of a cache block corresponding to a memory address generated during execution of the DCBZ instruction. The DCBZ instruction has a variety of uses. For example, when a page is allocated by the operating system for use by a program, a series of DCBZ instructions can be used to zero the page. In this manner, the data previously stored in the page (which may belong to a different program or user) is not available to the program. The DCBZ instruction is also often used in block copy (BCOPY) routines (note that a block, in the context of a block copy routine, may be larger than a cache block). The BCOPY routine zeroes the target of the copy using DCBZ instructions prior to copying the data to the target.
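The page-zeroing use described above can be illustrated with a small software model. The following is a minimal sketch, not an actual DCBZ implementation: the 32-byte cache block size, the 4096-byte page size, and the names dcbz and zero_page are assumptions chosen for illustration only.

```python
CACHE_BLOCK_SIZE = 32   # assumed cache block size in bytes
PAGE_SIZE = 4096        # assumed page size in bytes


def dcbz(memory: bytearray, address: int) -> None:
    """Model of DCBZ: zero every byte of the cache block containing `address`."""
    base = address - (address % CACHE_BLOCK_SIZE)  # align down to the block start
    memory[base:base + CACHE_BLOCK_SIZE] = bytes(CACHE_BLOCK_SIZE)


def zero_page(memory: bytearray, page_base: int) -> int:
    """Zero one page with a series of dcbz calls; return how many were issued."""
    issued = 0
    for address in range(page_base, page_base + PAGE_SIZE, CACHE_BLOCK_SIZE):
        dcbz(memory, address)
        issued += 1
    return issued


# Two pages of stale data; only the second page is handed to the new program.
memory = bytearray(b"\xAA" * (2 * PAGE_SIZE))
issued = zero_page(memory, PAGE_SIZE)
```

Under these assumed sizes, one page requires PAGE_SIZE / CACHE_BLOCK_SIZE = 128 DCBZ operations, and the first page (which was not allocated) keeps its previous contents.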
Since the DCBZ instruction is defined to write zeros to the entire cache block, there is no need to fetch the data that is currently stored in the cache block from memory (e.g. if the DCBZ misses in the data cache). Data bandwidth on the interconnect to the processor can be conserved by not transmitting the data. Typically, the processor transmits an invalidate transaction to invalidate any other copies of the data that may exist in the system, and then the cache block is allocated into the data cache and zeroed in the data cache.
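The bandwidth saving described above can be sketched by counting the data bytes that cross the interconnect on a cache miss. This is a simplified model under stated assumptions: the Bus and DataCache classes, the 32-byte block size, and the transaction names are hypothetical, and coherence states (e.g. shared versus exclusive) are not modeled.

```python
CACHE_BLOCK_SIZE = 32  # assumed cache block size in bytes


class Bus:
    """Hypothetical interconnect that records transactions and data bytes moved."""

    def __init__(self):
        self.data_bytes = 0
        self.transactions = []

    def read_block(self, memory, base):
        # A fill transfers the whole cache block over the interconnect.
        self.transactions.append(("read", base))
        self.data_bytes += CACHE_BLOCK_SIZE
        return bytearray(memory[base:base + CACHE_BLOCK_SIZE])

    def invalidate(self, base):
        # An invalidate carries an address only; no data is transferred.
        self.transactions.append(("invalidate", base))


class DataCache:
    """Hypothetical data cache holding blocks keyed by block base address."""

    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory
        self.lines = {}

    def store_byte(self, address, value):
        base = address - (address % CACHE_BLOCK_SIZE)
        if base not in self.lines:
            # Ordinary store miss: the rest of the block must be fetched.
            self.lines[base] = self.bus.read_block(self.memory, base)
        self.lines[base][address - base] = value

    def dcbz(self, address):
        base = address - (address % CACHE_BLOCK_SIZE)
        if base not in self.lines:
            # DCBZ miss: invalidate other copies, but fetch nothing.
            self.bus.invalidate(base)
        self.lines[base] = bytearray(CACHE_BLOCK_SIZE)  # allocate and zero


memory = bytearray(b"\x55" * 128)
bus = Bus()
cache = DataCache(bus, memory)
cache.dcbz(0)                       # miss handled without a data transfer
bytes_after_dcbz = bus.data_bytes
cache.store_byte(64, 0x01)          # miss handled with a block fill
bytes_after_store = bus.data_bytes
```

In this model the DCBZ miss produces only an invalidate transaction, while the ordinary store miss moves a full cache block of data.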
Most processors implement a sequential consistency model for access to memory in a multi-processor system. Formally, a system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the order of the operations from a given processor in this sequential order is the same as the order of the operations in the program executed by the given processor. A key component of sequential consistency is that a read of a location that occurs prior to a write to that location in the sequentially consistent order receives the data stored in that location prior to the write, and that a read of a location that occurs subsequent to the write in the sequentially consistent order receives the data written to the location by the write.
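The definition above can be checked on a small example by enumerating every sequential order that preserves each processor's program order. The following is an illustrative sketch only: the two two-operation programs, the locations x and y, and the registers r0 and r1 are hypothetical, and all locations are assumed to hold zero initially.

```python
from itertools import combinations

# Each operation is (kind, location, destination register or None).
# Hypothetical programs: each processor writes 1 to one location,
# then reads the location written by the other processor.
P0 = [("write", "x", None), ("read", "y", "r0")]
P1 = [("write", "y", None), ("read", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each program's own order."""
    total = len(a) + len(b)
    for positions in combinations(range(total), len(a)):
        slots = set(positions)          # positions taken by operations of a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]


def run(order):
    """Execute one sequential order; each read sees the latest earlier write."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, location, reg in order:
        if kind == "write":
            memory[location] = 1
        else:
            regs[reg] = memory[location]
    return regs["r0"], regs["r1"]


# The set of (r0, r1) results permitted by sequential consistency.
outcomes = {run(order) for order in interleavings(P0, P1)}
```

For these two programs there are six such orders, and none of them yields r0 = 0 and r1 = 0, because at least one of the writes precedes both reads in any order that respects program order.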
Implementing a DCBZ instruction using the invalidate and allocate (in the cache) scheme described above in a sequentially consistent model requires that the invalidate transaction be performed only when the DCBZ instruction is non-speculative. If the invalidate transaction is performed speculatively, it may invalidate the most recent copy of the data stored in the cache block (e.g. in another cache in the system). Then, if a read occurs before the DCBZ instruction in the sequentially consistent order (or global order) but after the speculative invalidate transaction, the most recent data is no longer available and the zeros from the DCBZ instruction cannot yet be used. Unfortunately, when a series of DCBZ instructions occurs in close proximity in a routine, the requirement that the DCBZ instructions be non-speculative slows the execution of the routine. The latency of transmitting each invalidate transaction and receiving the corresponding response impacts each DCBZ instruction in the series. That is, M DCBZ instructions, each experiencing L clock cycles of latency, require at least M*L clock cycles to execute because of the non-speculative requirement. The latency may be even worse in some cases. For example, if the processor is coupled to a bus that implements retries as part of its protocol, and an invalidate transaction is retried, then all subsequent DCBZ invalidate transactions are typically rescheduled to ensure sequential consistency between the DCBZ instructions.
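The M*L lower bound above can be made concrete with a short calculation. This sketch assumes that each invalidate transaction cannot begin until the previous one has received its response; the function name and the figures of 128 DCBZ instructions and 50 clock cycles of latency are assumed for illustration and do not come from any particular processor.

```python
def serialized_dcbz_cycles(num_dcbz: int, invalidate_latency: int) -> int:
    """Minimum clock cycles for a series of fully serialized DCBZ instructions."""
    clock = 0
    for _ in range(num_dcbz):
        # The next invalidate cannot start until this one has its response.
        clock += invalidate_latency
    return clock


M = 128   # assumed: DCBZ instructions needed to zero a 4096-byte page of 32-byte blocks
L = 50    # assumed: clock cycles from invalidate transaction to response
total_cycles = serialized_dcbz_cycles(M, L)
```

With these assumed values the series requires at least 128 * 50 = 6400 clock cycles, before accounting for any retried transactions.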