This invention relates generally to storing external device result data, and more particularly to utilizing a load store unit to process co-processor storage updates.
In microprocessor cores, it is sometimes desirable to allow an external logic source to update the contents of the memory hierarchy without having to manage all of the coherency protocols itself. An external logic source or device can be a separate design unit, such as a co-processor (COP), located either on the same chip as a processor core or on a different chip. An external device can also be shared by multiple processor cores, and it usually handles high speed data processing and manipulation functions based on some predefined algorithm. Having the external device manage coherency itself can be difficult due to tight timing requirements or complex coherency and protection rules. It would be advantageous to provide a way for an external device to write its data into the memory hierarchy or into a particular cache level, and to let mechanisms already in place handle the overhead of coherency management and memory protection rules.
Microprocessor architectures may not necessarily allow an external device, such as a COP, to have direct storage update capability. In particular, for a microprocessor that has millicode support (e.g., an IBM eServer z900 processor), the COP does not require, and is not likely to include, any store queue control to update storage directly. Omitting this control may reduce the overall complexity of the system by taking advantage of built-in millicode capability. In this type of design, the data transfer between the processor and the COP is, in general, under control of millicode, which fetches source operand data from memory and sends it to an input buffer in the COP; reads result data from an output buffer in the COP and stores it to the target operand in memory; and reads and interprets status information from the COP. The result data from the COP is first sent to a recovery unit (RU), where it is buffered, and is later read by millicode into a general purpose register (GPR) before being stored out to memory.
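The millicode-controlled flow described above can be pictured with a small software model. This is only an illustrative sketch: the class `ToyCop`, the function `millicode_run`, the byte-reversal stand-in for the COP algorithm, and the status strings are all hypothetical and are not taken from any actual processor design. The 8-byte transfer width is likewise an assumption made for the example.

```python
class ToyCop:
    """Hypothetical co-processor model with input/output buffers and a status field."""

    def __init__(self):
        self.input_buffer = b""
        self.output_buffer = b""
        self.status = "idle"

    def process(self):
        # Stand-in for the COP's predefined algorithm (assumption: byte reversal).
        self.output_buffer = self.input_buffer[::-1]
        self.status = "done"


def millicode_run(memory, src, length, dst, cop):
    """Model the three millicode duties: feed input, store results, read status."""
    # 1. Fetch the source operand from memory and send it to the COP input buffer.
    cop.input_buffer = bytes(memory[src:src + length])
    cop.process()

    # 2. Result data is buffered in the recovery unit (RU), then moved through
    #    a general purpose register (GPR), at most 8 bytes at a time, to memory.
    ru_buffer = cop.output_buffer
    for offset in range(0, len(ru_buffer), 8):
        gpr = ru_buffer[offset:offset + 8]  # read RU contents into a GPR
        memory[dst + offset:dst + offset + len(gpr)] = gpr  # store GPR to memory

    # 3. Read and interpret the COP status.
    return cop.status
```

In this model the COP never writes to `memory` itself; every update to the target operand passes through the millicode loop, which mirrors the indirect path through the RU and a GPR.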
This type of design has several disadvantages. The busing from the COP to the RU may be undesirable because of physical floorplan issues, and a special result buffer is required in the RU to hold the COP result data. From a performance point of view, an additional disadvantage is that the millicode always needs to read one result register from the RU into a GPR before it can store the result data out to memory. For a typical COP operation, this may require a stream of READ RUNIT, STORE (up to 8 bytes), and INCREMENT STORE ADDRESS instructions before the COP can continue its operation without overflowing the RU buffer. The length of the stream depends on the RU buffer size and on the actual size of the currently available COP result.
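The dependence of the stream length on the RU buffer size and the available result size can be illustrated with a short sketch. The function `drain_stream` and its parameters are hypothetical. The sketch assumes that each register-sized piece of result data costs exactly one READ RUNIT, one STORE, and one INCREMENT STORE ADDRESS, and that a single drain handles at most one RU buffer's worth of data. Real millicode sequences may differ.

```python
def drain_stream(result_bytes, ru_buffer_bytes, gpr_bytes=8):
    """List the instructions needed to drain the RU buffer once.

    result_bytes: size of the currently available COP result.
    ru_buffer_bytes: capacity of the RU result buffer.
    gpr_bytes: bytes moved per STORE (up to 8, per the text).
    """
    if result_bytes < 0 or ru_buffer_bytes <= 0 or gpr_bytes <= 0:
        raise ValueError("sizes must be positive (result may be zero)")

    # Assumption: no more than one buffer's worth of data is pending at a time.
    pending = min(result_bytes, ru_buffer_bytes)
    stream = []
    while pending > 0:
        chunk = min(pending, gpr_bytes)
        stream.append("READ RUNIT")               # RU result register -> GPR
        stream.append(f"STORE ({chunk} bytes)")   # GPR -> target operand
        stream.append("INCREMENT STORE ADDRESS")  # advance to next target
        pending -= chunk
    return stream


# A 20-byte result drains as chunks of 8, 8, and 4 bytes.
for instruction in drain_stream(20, 64):
    print(instruction)
```

Under these assumptions the stream grows by three instructions for every 8 bytes of result, all of which must complete before the COP can safely produce more output.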