1. Field of the Invention
The present invention relates to microprocessors, and more particularly to an apparatus and method for sparse line write transactions that address the problem of slow writes to memory when sparse portions of a contiguous write-combined memory space have been modified.
2. Description of the Related Art
Write combines and non-temporal store operations are not retained in the microprocessor but are instead written out to the memory bus. On a present-day quad-pumped bus, such as is used by most x86-compatible microprocessors, data is transferred to memory either a full cache line at a time (i.e., eight quadwords for a 64-byte cache line) or one quadword at a time. A full cache line transfer takes two bus clock cycles: four quadwords are transferred during each cycle of the bus clock, which accounts for the descriptor “quad-pumped.” In this type of transfer all 64 bytes are written to the bus; there is no mechanism to write only part of a cache line to memory. If only part of a cache line is to be written to memory, a different type of transfer must be used. That transfer moves an individual quadword and, as part of the bus protocol, sets byte enable signals to indicate which bytes within the transferred quadword are to be written to memory. An individual quadword transfer takes one bus clock cycle. Thus the state of the art allows either 64 contiguous bytes to be written to memory in two clock cycles or a single quadword to be written in one clock cycle.
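The two conventional transfer types described above can be modeled with a short sketch. It assumes only what the text states: a full-line transfer moves eight quadwords in two bus clocks (four per clock), and a partial transfer moves one quadword in one bus clock with per-byte enables. It counts data-transfer clocks only and ignores request and other bus phases. The function names are illustrative and do not come from any bus specification.

```python
QUADWORDS_PER_LINE = 8   # 64-byte cache line = eight 8-byte quadwords
TRANSFERS_PER_CLOCK = 4  # "quad-pumped": four quadwords per bus clock


def full_line_clocks() -> int:
    """Bus clocks to write an entire cache line (all 64 bytes, no enables)."""
    return QUADWORDS_PER_LINE // TRANSFERS_PER_CLOCK


def partial_write_clocks(byte_enables: list[int]) -> int:
    """Bus clocks to write only the enabled bytes of a line.

    byte_enables[i] is the 8-bit byte-enable mask for quadword i; every
    quadword with at least one enabled byte needs its own one-clock transfer.
    """
    return sum(1 for mask in byte_enables if mask & 0xFF)


print(full_line_clocks())               # 2
print(partial_write_clocks([0x0F] * 8)) # 8
```

Writing something in all eight quadwords through partial transfers costs eight clocks, against two for a full-line transfer, which is the gap the remainder of the description addresses.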
In reviewing present-day microprocessor bus architectures and their associated protocols, together with observations of how application programs manipulate contiguous memory spaces, the present inventor has noted that the bus protocols for writing data to the memory bus, as described above, are disadvantageous when sparse data within a contiguous memory space has been modified and is to be written to the bus. For example, it is common to modify every other doubleword (four bytes) within a video buffer to change some display property. Conventional microprocessors, however, provide no mechanism for selecting the data to be written to memory other than byte-granular enables applied one quadword at a time. A sparse write of contiguous memory must therefore be issued to the bus as a series of individual quadword transfers.
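To make the video-buffer example concrete, the sketch below converts the modified doublewords of one 64-byte line into the per-quadword byte enables that the conventional protocol requires, then counts the resulting quadword transfers. The doubleword numbering (doubleword 2q is the low half of quadword q, doubleword 2q+1 the high half) and the helper name are assumptions made for illustration.

```python
def byte_enables_from_dwords(modified_dwords: set[int]) -> list[int]:
    """Return eight 8-bit byte-enable masks, one per quadword of a 64-byte line.

    Assumes (hypothetically) that doubleword 2*q holds bytes 0-3 of
    quadword q and doubleword 2*q + 1 holds bytes 4-7.
    """
    enables = [0] * 8
    for dw in modified_dwords:
        if not 0 <= dw < 16:
            raise ValueError(f"doubleword index out of range: {dw}")
        quadword, upper_half = divmod(dw, 2)
        enables[quadword] |= 0xF0 if upper_half else 0x0F
    return enables


# The video-buffer case: every other doubleword of the line is modified.
enables = byte_enables_from_dwords(set(range(0, 16, 2)))
transfers = sum(1 for mask in enables if mask)  # one-clock quadword transfers
print(transfers)  # 8
```

Only half of each quadword carries new data, yet each quadword still needs its own transfer: eight bus clocks for data that would fit in one two-clock line transfer.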
Because the amount of data associated with combined writes (e.g., write combines and non-temporal stores) is typically large, it is disadvantageous not to fully utilize the bandwidth of the data bus, whether that bus is quad-pumped or otherwise. Since data buses typically operate at clock speeds many times slower than the microprocessor core clock, it is crucial to execute combined writes to memory as efficiently as possible. It is therefore desirable to be able to write an entire cache line to memory in a transaction in which individual doublewords within that cache line can be selectively enabled.
A microprocessor according to an embodiment of the present invention includes processor logic and sparse write logic. The processor logic asserts address signals and request signals to provide an address and a request for a cache line memory write transaction. The sparse write logic causes the processor logic to specify a sparse write-combined memory write transaction on the request signals and to provide doubleword enable bits on the address signals. The processor logic asserts a first part on the address and request signals to provide the address and the request for the cache line memory write transaction, and asserts a second part on the address and request signals to specify the sparse write-combined memory write transaction and to provide the doubleword enable bits. The sparse write logic causes the processor logic to replace the attribute value and byte enable bits on the address signals of the second part with the doubleword enable bits.
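The two-part request can be modeled in software as a sketch. The command values, the `RequestPart` record, and the assumption that the attribute value and byte enables occupy two 8-bit fields (so that together they hold 16 doubleword enable bits) are all hypothetical; the text specifies only which fields are replaced, not their widths or positions.

```python
from dataclasses import dataclass

# Hypothetical request-signal encodings; the text does not give real values.
CMD_LINE_WRITE = 0b01
CMD_SPARSE_WRITE = 0b11


@dataclass(frozen=True)
class RequestPart:
    request: int        # value driven on the request signals
    address_field: int  # value driven on the address signals


def sparse_write_request(line_address: int,
                         dword_enables: int) -> tuple[RequestPart, RequestPart]:
    """Build the two request parts of a sparse write-combined line write.

    Part one: line address plus an ordinary cache line write request.
    Part two: sparse write command on the request signals; the 16
    doubleword enable bits fill the address-signal positions that would
    otherwise carry the attribute value and the byte enable bits
    (modeled here as two hypothetical 8-bit fields, concatenated).
    """
    if line_address % 64:
        raise ValueError("cache line address must be 64-byte aligned")
    if not 0 <= dword_enables <= 0xFFFF:
        raise ValueError("expected 16 doubleword enable bits")
    first = RequestPart(CMD_LINE_WRITE, line_address)
    second = RequestPart(CMD_SPARSE_WRITE, dword_enables)
    return first, second


# Enable every other doubleword (bits 0, 2, ..., 14) of the line at 0x1000.
first, second = sparse_write_request(0x1000, 0x5555)
```

The first part is unchanged from an ordinary line write, so existing address decoding still applies; only the second part's meaning changes when the sparse command is present.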
The sparse write logic may cause the processor logic to provide a sparse write command value on the request signals and to provide the doubleword enable bits on the address signals during the second part. The sparse write-combined memory write transaction may be a quad-pumped cache line write transaction for writing eight quadwords. Each doubleword enable bit may identify a corresponding doubleword of the eight quadwords.
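Because eight quadwords contain sixteen doublewords, one enable bit per doubleword gives a 16-bit mask. A sketch of that mapping follows; the assumption that bit i selects doubleword i, with doublewords 2q and 2q+1 forming the low and high halves of quadword q, is illustrative only, as are the function names.

```python
from collections.abc import Iterable

DWORDS_PER_LINE = 16  # eight quadwords x two doublewords each


def dword_enable_mask(dwords: Iterable[int]) -> int:
    """Set enable bit i for each doubleword index i (hypothetical numbering)."""
    mask = 0
    for dw in dwords:
        if not 0 <= dw < DWORDS_PER_LINE:
            raise ValueError(f"doubleword index out of range: {dw}")
        mask |= 1 << dw
    return mask


def enabled_locations(mask: int) -> list[tuple[int, str]]:
    """List the (quadword, half) pairs selected by a 16-bit enable mask."""
    return [
        (dw // 2, "high" if dw % 2 else "low")
        for dw in range(DWORDS_PER_LINE)
        if mask >> dw & 1
    ]


mask = dword_enable_mask(range(0, 16, 2))
print(hex(mask))                    # 0x5555
print(enabled_locations(mask)[:2])  # [(0, 'low'), (1, 'low')]
```

The whole line still moves in the two-clock quad-pumped transfer; the mask only tells the receiver which of the sixteen doublewords to keep.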
A processor bus system according to an embodiment of the present invention includes a processor bus coupled to a processor and to a bus agent. The processor bus includes address signals, data signals, and request signals. The processor controls the address signals and the request signals to request a sparse write-combined memory write transaction and to provide doubleword enable bits, and controls the data signals to provide data for the sparse write-combined memory write transaction. The bus agent writes the portions of the data selected by the doubleword enable bits to a memory location. The processor asserts a first part, including an address and a memory line write request, on the address signals and the request signals, respectively, and asserts a second part, including a sparse write-combined memory write transaction request and the doubleword enable bits, on the request signals and the address signals, respectively. The processor replaces the attribute value and byte enable bits on the address signals of the second part with the doubleword enable bits.
The sparse write-combined memory write transaction request may be a sparse write command value asserted on the request signals. The enable bits may be asserted on concatenated fields of the address signals. The sparse write-combined memory write transaction may be a quad-pumped cache line write transaction with eight quadwords. The bus agent may write selected doublewords of data to the memory location according to the doubleword enable bits.
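On the receiving side, a bus agent that recognizes the sparse write command can merge only the enabled doublewords of the eight received quadwords into memory. The sketch below models memory as a `bytearray`, uses a hypothetical command value, and assumes that enable bit i covers bytes 4i through 4i+3 of the line; none of these details come from the text.

```python
CMD_SPARSE_WRITE = 0b11  # hypothetical sparse write command value


def agent_write_line(memory: bytearray, address: int, second_part_request: int,
                     dword_enables: int, data: bytes) -> None:
    """Bus agent: store a received 64-byte line at `address`.

    If the second request part carries the sparse write command, only the
    doublewords whose enable bit is set are written; otherwise the whole
    line is written as in an ordinary cache line write.
    """
    if len(data) != 64:
        raise ValueError("a line write carries eight quadwords (64 bytes)")
    if second_part_request != CMD_SPARSE_WRITE:
        memory[address:address + 64] = data
        return
    for dw in range(16):
        if dword_enables >> dw & 1:  # bit dw selects bytes 4*dw .. 4*dw+3
            start = 4 * dw
            memory[address + start:address + start + 4] = data[start:start + 4]


memory = bytearray(128)
agent_write_line(memory, 64, CMD_SPARSE_WRITE, 0x5555, bytes(range(64)))
print(memory[64:72].hex())  # 0001020300000000
```

Doublewords whose enable bit is clear keep their previous contents, which is what lets a single line transfer stand in for several partial quadword writes.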
A method of performing a sparse write-combined write transaction according to an embodiment of the present invention includes: providing, by a processor, an address and a request for a memory write transaction, where the memory write transaction sends an entire cache line to memory and individual data elements within the cache line can be enabled for writing to the memory with doubleword granularity; indicating, by the processor, that the memory write transaction is a sparse write-combined write transaction; and providing, by the processor, data for the sparse write-combined write transaction. The indicating includes asserting, by the processor, a first transaction part including the address and the request for the memory write transaction; replacing an attribute value and byte enable bits on the address signals of a second transaction part with a plurality of doubleword enable bits; and asserting, by the processor, the second transaction part including a sparse line write command and the doubleword enable bits, where the doubleword enable bits determine which doublewords of the cache line are written to the memory.
Providing the data may include providing a cache line of eight quadwords. The method may further include receiving, by a bus agent, the address and the request for the memory write transaction; detecting that the memory write transaction is a sparse write-combined write transaction; receiving, by the bus agent, the doubleword enable bits; receiving, by the bus agent, the data; and writing the portions of the data selected by the doubleword enable bits to the memory location indicated by the address. The data may be eight quadwords of data.
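The method steps can be walked through end to end in a short simulation. Everything here is a hypothetical model: the command labels, the tuple form of each request part, and the per-byte valid mask of a write-combining buffer are assumptions. The text does not say how a partially written doubleword is handled, so this sketch simply rejects one.

```python
LINE_WRITE = "line-write"                # hypothetical command label
SPARSE_LINE_WRITE = "sparse-line-write"  # hypothetical command label


def dword_enables_from_valid_bytes(valid: int) -> int:
    """Collapse a 64-bit per-byte valid mask into 16 doubleword enable bits.

    Assumption (not from the text): each doubleword must be either fully
    written or untouched; a partially written doubleword is rejected.
    """
    enables = 0
    for dw in range(16):
        nibble = (valid >> (4 * dw)) & 0xF
        if nibble == 0xF:
            enables |= 1 << dw
        elif nibble:
            raise ValueError(f"doubleword {dw} is only partially written")
    return enables


def processor_issue(address: int, line: bytes, valid: int):
    """Processor side: first part, second part, then eight quadwords of data."""
    if address % 64 or len(line) != 64:
        raise ValueError("expected a 64-byte aligned, 64-byte line")
    enables = dword_enables_from_valid_bytes(valid)
    first_part = (LINE_WRITE, address)          # address + line write request
    second_part = (SPARSE_LINE_WRITE, enables)  # sparse command + enables
    quadwords = [line[8 * q:8 * q + 8] for q in range(8)]
    return first_part, second_part, quadwords


def agent_receive(memory: bytearray, first_part, second_part, quadwords) -> None:
    """Bus agent side: detect the sparse write, keep only enabled doublewords."""
    _, address = first_part
    command, enables = second_part
    data = b"".join(quadwords)
    for dw in range(16):
        if command != SPARSE_LINE_WRITE or enables >> dw & 1:
            start = 4 * dw
            memory[address + start:address + start + 4] = data[start:start + 4]


memory = bytearray(b"\xaa" * 128)
valid = 0x0F0F_0F0F_0F0F_0F0F  # bytes 0-3 of each quadword were written
first, second, qws = processor_issue(64, bytes(range(64)), valid)
agent_receive(memory, first, second, qws)
print(second)  # ('sparse-line-write', 21845)
```

Here 21845 is 0x5555: the even doublewords are stored and the odd ones keep their old 0xAA bytes, all from one line-sized transfer.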