This invention relates in general to the field of store data write-combining in a microprocessor, and more particularly to using tags on data to write-combine rather than memory addresses.
A typical computer system includes a microprocessor that writes, or stores, data to memory devices in the system, such as system memory or video frame buffers. The microprocessor is connected to the memory devices by a processor bus. Store instructions of software programs executing on the processor generate write transactions on the processor bus to write data to memory. In some circumstances it is desirable to delay writing the data of a store instruction on the processor bus to memory, and instead to buffer the data and combine it with data from one or more subsequent adjacent store instructions into a single write operation on the processor bus to memory. This operation is commonly referred to as write-combining.
An example of a circumstance in which write-combining is desirable is a video game program that updates the screen based on user input. The screen update portion of the program generates a large number of stores of video data to sequential addresses of a frame buffer in a video controller of the system. Because the stores of the video data are sequential, they may be buffered and combined, resulting in fewer write transactions on the processor bus. A major advantage of write-combining is that it enables more efficient use of the processor bus as follows.
A typical write transaction on a processor bus to memory comprises an arbitration phase, address phase, one or more data phases, error phase, and completion phase. Assume the microprocessor has two stores of data to be written to memory. If the microprocessor performs the two stores as two separate write transactions on the bus, then it must perform two sets of arbitration, address, error, and completion phases in addition to the data phases. In contrast, if the microprocessor buffers the data from the first store and combines the second store with the first into one write transaction on the processor bus, the microprocessor only performs one set of arbitration, address, error, and completion phases. Additionally, some of the data phases may also potentially be eliminated depending upon the relationship between the size of the data being written and the width of the processor data bus. The avoidance of the redundant phases by write-combining is a more efficient use of the processor bus and potentially improves the write-throughput to the memory.
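The phase accounting above can be illustrated with a minimal sketch. It assumes, purely for illustration, that every phase costs one unit and that each store needs exactly one data phase; the function name `bus_phases` is hypothetical and not part of the described apparatus.

```python
# Phases that occur once per write transaction, as listed in the text.
NON_DATA_PHASES = ("arbitration", "address", "error", "completion")


def bus_phases(num_transactions, data_phases_total):
    """Count bus phases: one set of non-data phases per transaction,
    plus every data phase (illustrative one-unit-per-phase model)."""
    return num_transactions * len(NON_DATA_PHASES) + data_phases_total


# Two stores issued as two separate write transactions.
separate = bus_phases(num_transactions=2, data_phases_total=2)

# The same two stores combined into a single write transaction.
combined = bus_phases(num_transactions=1, data_phases_total=2)

print(separate, combined)
```

Under these assumptions the combined case saves one full set of arbitration, address, error, and completion phases; real buses overlap and pipeline phases, so the counts are only a rough indication.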
A typical processor bus enables the microprocessor to perform bus transactions to write a block of data varying in size from a single byte to an entire cache line. A common cache line size is 32 bytes of data. However, a typical physical data bus width of the processor bus is eight bytes. Assume the microprocessor executes four adjacent eight-byte stores of data aligned on eight-byte address boundaries. If the microprocessor performs four separate write transactions on the bus, then four data phases will be performed on the bus, one for each of the transactions. However, if the microprocessor combines the stores into a single write transaction, then the four data phases will be performed within a single transaction, eliminating the redundant non-data phases, and thereby using the processor bus more efficiently.
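The four-store example can be sketched as a simple software model of one write buffer. The 32-byte line size and eight-byte bus width come from the text; the `WriteBuffer` class, its byte-valid flags, and the address 0x1000 are assumptions introduced only for illustration.

```python
LINE_SIZE = 32  # cache line size in bytes, from the text
BUS_WIDTH = 8   # physical data bus width in bytes, from the text


class WriteBuffer:
    """Hypothetical model of one write buffer holding a single cache line."""

    def __init__(self, line_address):
        assert line_address % LINE_SIZE == 0
        self.line_address = line_address
        self.data = bytearray(LINE_SIZE)
        self.valid = [False] * LINE_SIZE  # one valid flag per byte

    def merge(self, address, payload):
        """Merge store data into the buffer, overwriting any pending bytes."""
        offset = address - self.line_address
        if offset < 0 or offset + len(payload) > LINE_SIZE:
            raise ValueError("store falls outside this buffer's cache line")
        self.data[offset:offset + len(payload)] = payload
        for i in range(offset, offset + len(payload)):
            self.valid[i] = True

    def data_phases(self):
        """Data phases needed: one per bus-width chunk holding valid bytes."""
        return sum(
            any(self.valid[start:start + BUS_WIDTH])
            for start in range(0, LINE_SIZE, BUS_WIDTH)
        )


# Four adjacent eight-byte stores, aligned on eight-byte boundaries.
buf = WriteBuffer(0x1000)
for i in range(4):
    buf.merge(0x1000 + 8 * i, bytes([i]) * 8)

print(buf.data_phases(), all(buf.valid))
```

In this model the four stores fill the whole line, so a single transaction carries all four data phases.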
Furthermore, in the case of a screen update to a video frame buffer, hundreds of write-combined cache line write transactions may potentially be performed on the processor bus to sequential addresses in the frame buffer. In this case, the microprocessor may pipeline many such cache line write transactions on the processor bus thereby effectively eliminating even the non-data phases from an efficiency perspective in order to achieve close to the maximum write throughput possible on the processor bus, thereby updating the frame buffer very quickly.
In order to determine whether it is possible to write-combine a new write with pending writes, the microprocessor must compare the address of the new write with the addresses of pending writes to determine whether the new data may be merged with, or overwrite, the pending data. It is typically advantageous to the performance of the microprocessor to perform the comparison of the addresses and the loading of the new write data into a write buffer holding the pending data in the same microprocessor clock cycle.
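The address comparison described above might be modeled as follows. This is a sketch under the assumption that combining is done at cache-line granularity with 32-byte lines; the function names are hypothetical.

```python
LINE_SIZE = 32  # assumed combining granularity in bytes


def line_address(address):
    """Clear the low-order offset bits, leaving the cache line address."""
    return address & ~(LINE_SIZE - 1)


def find_combinable_buffer(pending_line_addresses, new_address):
    """Return the index of the pending write buffer whose line address
    matches the new write, or None if no pending write can absorb it."""
    target = line_address(new_address)
    for index, pending in enumerate(pending_line_addresses):
        if pending == target:
            return index
    return None


pending = [0x1000, 0x2000]
print(find_combinable_buffer(pending, 0x2010))  # falls in the second line
print(find_combinable_buffer(pending, 0x3000))  # no matching pending write
```

In hardware each loop iteration would correspond to a wide comparator operating in parallel, which is why the width of the compared value matters for timing.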
Typically, the number of address bits that must be compared is relatively large: on the order of 32 bits. Consequently, performing a comparison of the new write address with the pending write addresses, deciding whether it is possible to write-combine, and muxing the data into the write buffer to perform the write-combine may take a relatively long time and create processor cycle timing problems. That is, the logic to perform the write-combining may become the critical timing path of the microprocessor, thereby lengthening the processor cycle time. This is an undesirable consequence.
In addition, the sources of the addresses of the new write data and the pending write data are commonly located a relatively large distance from the write buffers where the data is to be combined. This distance may also contribute to the write-combine logic becoming the critical timing path.
Therefore, what is needed is a means of reducing the time required to determine whether a write-combine may be performed.
The present invention provides an apparatus and method for reducing the time required to determine whether store data may be combined in a write buffer with existing store data by making the determination based on a tag comparison rather than an address comparison. Accordingly, in attainment of the aforementioned object, it is a feature of the present invention to provide a write-combining apparatus in a microprocessor coupled to a memory by a bus, the microprocessor having write buffers for buffering data to be written to the memory on the bus. The apparatus includes tag allocation logic that allocates tags to store data associated with store instructions executing in the microprocessor. The tag allocation logic allocates the tags based on a comparison of bus addresses of the store data. The apparatus also includes a plurality of tag comparators, coupled to the tag allocation logic, which compare the tags allocated by the tag allocation logic. The apparatus also includes write buffer control logic, coupled to the plurality of tag comparators, that determines which of the write buffers to load the store data into based on the comparing of the tags by the plurality of tag comparators.
In another aspect, it is a feature of the present invention to provide an apparatus in a processor for determining whether first data of a first store operation may be combined with second data of a second store operation into a single write operation on a bus coupled to the processor. The first and second data have first and second addresses, respectively. The apparatus includes tag allocation logic that allocates a first tag to the first data based on a comparison of the first address with the second address. The apparatus also includes a tag comparator, coupled to the tag allocation logic, which compares the first tag with a second tag previously allocated by the tag allocation logic to the second data. The apparatus also includes write-combine logic, coupled to the tag comparator, which determines whether the first and second data may be combined based on the tag comparator comparing the first and second tags.
In another aspect, it is a feature of the present invention to provide a write-combining processor. The processor includes a plurality of write buffers that buffer an associated plurality of data for writing to memory on a bus coupled to the processor. The processor also includes a plurality of tag registers, coupled to the plurality of write buffers, which store a plurality of tags associated with the plurality of data. The processor also includes a store buffer, coupled to the plurality of write buffers, which buffers store data waiting to be written to a cache of the processor. The processor also includes a tag register, coupled to the store buffer, which stores a tag associated with the store data. The processor also includes a plurality of tag comparators, coupled to the plurality of tag registers, which compare the tag with the plurality of tags. The processor selectively combines the store data with one of the plurality of data buffered in the plurality of write buffers prior to writing to memory on the bus based on the comparing by the plurality of tag comparators.
In another aspect, it is a feature of the present invention to provide a method for performing write-combining in a microprocessor having a pipeline of stages. The method includes generating a first tag for first store data, comparing a first bus address of the first store data with a second bus address of second store data after the generating the first tag, and generating a second tag for second store data in response to the comparing the first and second bus addresses. The method also includes comparing the first and second tags, and selectively combining the first and second data based on the comparing the first and second tags.
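The method can be illustrated with a simplified software model. It is a sketch under several assumptions not stated in the text: tags are small integers drawn from a pool of eight, the address comparison is made only against the immediately preceding store, and combining is done per 32-byte line. The names `TagAllocator` and `select_write_buffer` are hypothetical, and the model does not handle a recycled tag colliding with a write buffer that is still pending.

```python
LINE_SIZE = 32  # assumed combining granularity in bytes


class TagAllocator:
    """Hypothetical tag allocation logic: the wide address comparison
    happens here, early, and produces a small tag for later stages."""

    def __init__(self, num_tags=8):
        self.num_tags = num_tags
        self.next_tag = 0
        self.last_line = None
        self.last_tag = None

    def allocate(self, address):
        line = address // LINE_SIZE
        if line == self.last_line:
            # Same line as the previous store: reuse its tag.
            return self.last_tag
        # Different line: hand out the next tag from the pool.
        tag = self.next_tag
        self.next_tag = (self.next_tag + 1) % self.num_tags
        self.last_line, self.last_tag = line, tag
        return tag


def select_write_buffer(buffer_tags, store_tag):
    """Narrow tag comparison performed at the write buffers: return the
    index of the buffer holding the same tag, or None if none matches."""
    for index, tag in enumerate(buffer_tags):
        if tag == store_tag:
            return index
    return None


alloc = TagAllocator()
t1 = alloc.allocate(0x1000)  # first store
t2 = alloc.allocate(0x1008)  # same line as the first store
t3 = alloc.allocate(0x2000)  # different line

buffer_tags = [t1]  # the first store's data is pending in write buffer 0
print(select_write_buffer(buffer_tags, t2))  # combinable with buffer 0
print(select_write_buffer(buffer_tags, t3))  # needs a new buffer
```

With a pool of eight tags, each comparison at the write buffers is three bits wide rather than on the order of 32 bits, which is the kind of reduction the tag scheme is intended to provide.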
An advantage of the present invention is that it potentially reduces the processor cycle time in a write-combining microprocessor by reducing the likelihood that the write-combining logic is the critical path. The invention reduces the likelihood that the write-combining logic is the critical path by comparing substantially an order of magnitude fewer bits to perform the write-combine and by potentially reducing the propagation delay time of the comparison result signal. The propagation delay time reduction is accomplished by locating the tag comparison logic spatially closer to the write-combine logic, thereby reducing the comparison result signal lengths. Further, the advantage is achieved by adding a relatively small amount of additional logic.
Other features and advantages of the present invention will become apparent upon study of the remaining portions of the specification and drawings.