1. Field of the Invention
The present invention relates to a main memory of a computer system, and more particularly, to a technique for atomically reading data from, and writing data to, the main memory.
2. Description of the Prior Art
Data is conventionally stored in a computer memory in a unit of data known as a word. A traditional computer system updates the memory with a quantity of data that is related to the natural width of a word of the memory. That is, the size of the update is related to the width of the word. For example, in a particular reduced instruction set computer (RISC), the general-purpose registers are 64 bits wide, and thus the RISC machine allows writing of 64 bits of data.
An atomic data transfer is one in which an entire block of data is read from a memory to a first processor, or written from the first processor to the memory, as a unit, without interference from a second processor. That is, all bytes of the data are transferred between the first processor and the memory without interference from the second processor. Traditional architectures allow a transfer of a quantity of data greater than that of the natural width, but such a transfer is not guaranteed to be atomic.
The prior art technique for attempting to ensure an atomic transfer of data is for a processor to acquire “a lock” on a memory. This is achieved by executing three transactions between the processor and a memory controller for the memory. The first transaction is a command from the processor that sets a lock indicator, i.e., a flag, and an address to which the data is to be written or from which the data is to be read. The quantity of data to be transferred is of a predetermined block size. The second transaction is the transmission of the data between the processor and the memory controller. The third transaction releases the lock to allow other processors to access the lock.
Even if a prior art memory system permits an atomic access thereof, it is not possible for an instruction stream to control the atomic transfer. This prior art is fully effective only if all programs that are executed by all processors that access the memory are written to honor the lock. That is, a program that fails to honor the lock can interfere with an in-progress transfer of data. Also, because the quantity of data to be written is of a predetermined block size, this technique offers no flexibility in the size of the transfer.
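The dependence of the lock-based technique on cooperating programs can be illustrated with a minimal sketch. The class AdvisoryLockMemory, its method names, and the data values are hypothetical and chosen only for illustration; the sketch assumes a lock flag that the memory itself does not enforce, which is one reading of the description above, and it serializes the interleaving by hand rather than using real processors.

```python
class AdvisoryLockMemory:
    """Toy model (hypothetical) of a memory guarded by a cooperative lock flag."""

    def __init__(self, size_words):
        self.words = [0] * size_words
        self.lock_owner = None  # None means the lock indicator is clear

    def acquire(self, cpu_id):
        # First transaction: set the lock indicator if it is currently clear.
        if self.lock_owner is None:
            self.lock_owner = cpu_id
            return True
        return False

    def write_word(self, addr, value):
        # Assumption: nothing here checks the lock, so honoring it
        # is left entirely to the software running on each processor.
        self.words[addr] = value

    def release(self, cpu_id):
        # Third transaction: clear the lock indicator.
        if self.lock_owner == cpu_id:
            self.lock_owner = None


def demo_torn_block():
    mem = AdvisoryLockMemory(4)
    got_lock = mem.acquire(cpu_id=1)             # processor 1 takes the lock
    cooperative_attempt = mem.acquire(cpu_id=2)  # a well-behaved processor 2 is refused

    # Second transaction: processor 1 transfers a four-word block.
    mem.write_word(0, 0xAAAA)
    mem.write_word(1, 0xAAAA)
    # A program on processor 2 that ignores the lock writes mid-transfer.
    mem.write_word(1, 0xBBBB)
    mem.write_word(2, 0xAAAA)
    mem.write_word(3, 0xAAAA)

    mem.release(cpu_id=1)
    return got_lock, cooperative_attempt, mem.words
```

In this sketch the block left in memory mixes words from both writers even though processor 1 held the lock for the whole transfer, which is the interference described above.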
A traditional system cannot perform an atomic transfer from the instruction stream because instruction sets historically did not provide atomic transfer instructions, nor were memory systems with cache subsystems capable of atomic transfers of more than one word. Processors have previously not provided unconstrained multi-word atomic update instructions because such instructions are costly in hardware and scale poorly. That is, as more processors are added to a system, processing efficiency is adversely impacted.
It is an object of the present invention to provide a technique for enabling an atomic transfer of data between a processor and a memory.
It is another object of the present invention to enable such a transfer while permitting a flexible data block size.
These and other objects of the present invention are achieved by a method for transferring data between a memory and a processor that includes a cache, comprising the steps of (A) executing, at the processor, an instruction that includes (i) a specifier of a location in a storage resource local to the processor, (ii) a specifier of an address in the memory, and (iii) a specifier of a size of a data block; (B) providing, from the processor to a controller, a set of control signals indicating (i) the address in the memory, and (ii) the size of the data block; and (C) transferring, by the controller, in response to receipt of the set of control signals, the data block atomically between the storage resource and the memory, without the processor having to first request a lock on the memory. The method is constrained to operations where the size of the data block is less than or equal to one cache-line size, the address in the memory is naturally aligned, and the memory is updated by a cache-line sized operation.
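The size and alignment constraints stated above can be expressed as a short check. This is a sketch under stated assumptions, not the claimed method: the 64-byte cache-line size, the restriction to power-of-two block sizes, the reading of "naturally aligned" as "address is a multiple of the block size", and all function names are assumptions introduced for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed value; the text does not specify a line size


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def qualifies_for_atomic_transfer(address, size, line=CACHE_LINE_BYTES):
    """Return True if (address, size) meets the constraints, as assumed here."""
    if not is_power_of_two(size):  # assumption: block sizes are powers of two
        return False
    if size > line:                # block must not exceed one cache line
        return False
    return address % size == 0     # natural alignment: address is a multiple of size


def spans_single_line(address, size, line=CACHE_LINE_BYTES):
    """True if the first and last bytes of the block lie in the same cache line."""
    return address // line == (address + size - 1) // line
```

Under these assumptions, every block that passes the check lies entirely within one cache line, which suggests why a single cache-line sized update can cover it. For example, a 16-byte block at address 0x1000 qualifies, while the same block at 0x1008 (misaligned) or a 128-byte block (larger than the assumed line) does not.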