The present invention relates generally to data transfer and storage technology, and more particularly to masked write operations in memory systems and devices that access memory systems.
Read and write accesses in modern memory systems are typically transacted through transfers of multi-byte blocks of data. When less than a full data block is to be read from the memory system, the address of a data block that encompasses the desired data is provided to the memory system and the full data block is read. Generally speaking, there is no penalty to reading more data than needed. By contrast, when writing a value smaller than a full data block, it is important that the stored data block remain unchanged except for the value written. This is typically accomplished through one of two types of specialized write operations: merged write operations or masked write operations.
In a merged write operation (sometimes called a read-merge-write operation or read-modify-write operation), a memory controller reads the data block to be updated, merges the write data value into the data block at the appropriate offset, then writes the updated data block back to storage. Because two memory accesses are required (read and write), merged write operations substantially reduce peak bandwidth of the memory system and therefore are typically not used in high performance systems.
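The read-merge-write sequence described above can be sketched in a few lines of Python. This is an illustrative model only, not an implementation taken from any particular memory controller; the `Storage` class, its access counter, and the function name `merged_write` are hypothetical names introduced for the example.

```python
class Storage:
    """Hypothetical block-addressed store that counts accesses."""

    def __init__(self, num_blocks, block_size=64):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.accesses = 0  # incremented on every block read or write

    def read_block(self, addr):
        self.accesses += 1
        return self.blocks[addr]

    def write_block(self, addr, data):
        self.accesses += 1
        self.blocks[addr] = bytes(data)


def merged_write(storage, addr, offset, value):
    """Update part of a block using a read, a merge, and a write."""
    if offset < 0 or offset + len(value) > storage.block_size:
        raise ValueError("value does not fit within the data block")
    block = bytearray(storage.read_block(addr))   # first access: read
    block[offset:offset + len(value)] = value     # merge at the offset
    storage.write_block(addr, block)              # second access: write


storage = Storage(num_blocks=2)
merged_write(storage, addr=1, offset=4, value=b"\xaa\xbb")
```

In this model a two-byte update costs two full-block accesses, which is the bandwidth penalty noted above.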
In a masked write operation, the memory controller issues mask signals to the storage subsystem to qualify each data value within the data block as being masked or unmasked. The storage subsystem responds by storing only unmasked data values. For legacy purposes, the granularity of data masking typically extends to byte (i.e., eight-bit) granularity. Data masking with eight bit or byte granularity is sometimes referred to as byte-masking. While byte-masking has the disadvantage of requiring additional hardware in the storage subsystem (i.e., to detect and respond to the mask signals), the double-access performance penalty associated with merged write operations is avoided.
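As a rough illustration of byte-masking, the following Python sketch models how a storage subsystem might apply one mask flag per byte. The function name `masked_write` and the polarity chosen here (a true flag means the byte is masked and therefore not stored) are assumptions made for the example.

```python
def masked_write(stored_block, write_block, mask_flags):
    """Return the block contents after a byte-masked write.

    mask_flags[i] is truthy when byte i is masked, that is, when the
    stored byte must be left unchanged (assumed polarity).
    """
    if not (len(stored_block) == len(write_block) == len(mask_flags)):
        raise ValueError("block and mask lengths must match")
    return bytes(
        old if masked else new
        for old, new, masked in zip(stored_block, write_block, mask_flags)
    )


stored = bytes(range(8))       # existing contents: 00 01 02 ... 07
incoming = bytes([0xFF] * 8)   # write data block
mask = [True] * 8
mask[2] = False                # only byte 2 is unmasked
result = masked_write(stored, incoming, mask)
```

Only a single write access is needed in this model, since the existing block is never read back by the controller.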
FIG. 1 illustrates a prior art byte-masking memory system 100 having a memory controller 101 and a storage subsystem 103. The memory controller 101 includes a host interface 105 to receive access requests (REQ), data blocks (DATA) and mask information (MASK) from an access requestor, and a memory interface 107 to issue corresponding requests, data blocks and mask information to the storage subsystem. In a masked write operation, a 64-byte write data block (512 bits) is received via the host interface 105 along with a 64-bit mask word and a masked-write request. Each bit of the mask word corresponds to a respective byte of the write data block and, if set, indicates that the byte is a masked byte not to be stored within the storage subsystem. The memory controller 101 responds to the masked-write request by issuing a masked-write instruction to the storage subsystem 103 via a request path 102, and by transferring the write data block and mask word to the storage subsystem via a data path 104. The data path includes 32 data lines 108 for parallel transfer of four data bytes and four mask lines 106 for transferring four corresponding mask bits. Consequently, the complete write data block and mask word are transferred to the storage subsystem in a sequence of sixteen data transfers, each transfer including four bytes of the write data block and four bits of the mask word.
The storage subsystem 103 is formed by a number of discrete memory devices, MEMR1, each having a request interface and a data interface. The request interface of each memory device is coupled to the request path 102 to receive the masked-write instruction (including an address value), and the data interface of each memory device is coupled to a respective 9-bit slice of the data path to receive a data byte and corresponding mask bit in each of the sixteen data transfers. For each data transfer, each of the memory devices stores the data byte at a location indicated by the address value (offset according to which of the sixteen data transfers is being acted on) only if the mask bit is not set.
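The sixteen-transfer sequence of FIG. 1 can be modeled as follows. This is a sketch under stated assumptions: the text does not specify how bytes are ordered across transfers and devices, so the example assumes that byte i of the write data block travels in transfer i // 4 on the slice of device i % 4, and that bit i of the mask word (least significant bit first) corresponds to byte i. The names `make_transfers` and `apply_transfers` are hypothetical.

```python
NUM_DEVICES = 4                              # four 9-bit slices of the data path
BLOCK_BYTES = 64
NUM_TRANSFERS = BLOCK_BYTES // NUM_DEVICES   # sixteen data transfers


def make_transfers(block, mask_word):
    """Split a 64-byte block and 64-bit mask word into sixteen transfers.

    Each transfer is a list of four (data_byte, mask_bit) pairs, one
    pair per memory device. The byte ordering is an assumption.
    """
    transfers = []
    for t in range(NUM_TRANSFERS):
        lanes = []
        for d in range(NUM_DEVICES):
            i = t * NUM_DEVICES + d
            lanes.append((block[i], (mask_word >> i) & 1))
        transfers.append(lanes)
    return transfers


def apply_transfers(devices, base_addr, transfers):
    """Each device stores its byte only if the mask bit is not set."""
    for t, lanes in enumerate(transfers):
        for d, (data_byte, mask_bit) in enumerate(lanes):
            if not mask_bit:
                devices[d][base_addr + t] = data_byte  # offset by transfer


devices = [bytearray(32) for _ in range(NUM_DEVICES)]
block = bytes(range(BLOCK_BYTES))
mask_word = ((1 << 64) - 1) & ~(0b11 << 5)   # all masked except bytes 5 and 6
transfers = make_transfers(block, mask_word)
apply_transfers(devices, 0, transfers)
```

Under these assumptions, bytes 5 and 6 are carried in the second transfer and land in the second and third devices, while every other location is left untouched.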
One drawback to the prior-art memory system 100 is that a substantial portion of the data path 104, one line out of every nine, is dedicated to mask signal transfer. Thus, more than 10% of the data path bandwidth is reserved to support byte masking. This bandwidth penalty becomes worse as the device width (i.e., the width of the memory device data interface excluding the mask input) is reduced. For example, if the device width is reduced from eight bits to four bits, then 20% of the data path bandwidth (one out of every five signal lines) is reserved for byte masking. Thus, in addition to imposing a substantial bandwidth penalty, the byte masking technique used in the prior-art memory system 100 effectively constrains the device widths of the memory devices within the storage subsystem 103 to be at least eight bits. This device width constraint translates directly into a memory size constraint for a given generation of memory devices and data path width. For example, assuming storage capacity of 512 megabits (Mb) for a given generation of memory devices and a data path width of 32 lines (excluding mask lines), the total size of memory that can be coupled in point-to-point fashion to the memory controller is 512 Mb * (32/8) = 2 gigabits (Gb). While an additional group of memory devices may be coupled to the data path 104, as shown in dashed outline in FIG. 1 by devices MEMR2, the additional signal line connections effectively transform the data path 104 into a multi-drop bus. Multi-drop bus arrangements have different, and sometimes significantly less desirable, signaling characteristics than point-to-point arrangements.
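The overhead and capacity figures above follow from simple arithmetic, reproduced here as a short Python check. The function names are hypothetical, and the model assumes exactly one mask line per memory device, as in FIG. 1.

```python
from fractions import Fraction


def mask_overhead(device_width):
    """Fraction of data path lines used for masking.

    Assumes one mask line accompanies each group of `device_width`
    data lines.
    """
    return Fraction(1, device_width + 1)


def point_to_point_capacity_mb(device_capacity_mb, data_lines, device_width):
    """Total capacity in megabits with one device per data path slice."""
    return device_capacity_mb * (data_lines // device_width)


overhead_x8 = mask_overhead(8)   # 1/9, about 11.1%
overhead_x4 = mask_overhead(4)   # 1/5, exactly 20%
capacity = point_to_point_capacity_mb(512, 32, 8)   # 2048 Mb, i.e., 2 Gb
```

The exact fractions make the trend explicit: halving the device width nearly doubles the share of signal lines devoted to mask signals.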