Multi-processor computer systems typically use a directory-based cache. This type of cache stores cache tag information in memory along with the data information. Whenever a read or write operation is performed in the memory, the tag is examined to determine the current state of the data information: which processor owns the data, or which processor was the last to update it. Based on the type of requested memory operation, the tag determines what other events must happen. For example, a processor may have to be notified that another processor is now sharing the data, or that the copy in its local cache is now invalid because new data has been written. These types of events may have to occur before, or simultaneously with, the requested memory operation.
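The tag check described above can be sketched as a small state update. This is a minimal sketch that assumes a hypothetical tag recording one exclusive owner and a set of sharing processors. `DirectoryTag`, `handle_access` and the event strings are illustrative names only and are not part of the described system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class DirectoryTag:
    # Hypothetical tag layout: an exclusive owner (if any) plus a set of sharers.
    owner: Optional[int] = None
    sharers: Set[int] = field(default_factory=set)


def handle_access(tag: DirectoryTag, requester: int,
                  is_write: bool) -> Tuple[DirectoryTag, List[str]]:
    """Return the updated tag and the coherence events triggered by the access."""
    events: List[str] = []
    holders = set(tag.sharers)
    if tag.owner is not None:
        holders.add(tag.owner)
    if is_write:
        # New data makes every other cached copy stale, so those copies are invalidated.
        for cpu in sorted(holders - {requester}):
            events.append(f"invalidate cpu{cpu}")
        return DirectoryTag(owner=requester), events
    # On a read, a different owner is told that the line is now shared.
    if tag.owner is not None and tag.owner != requester:
        events.append(f"notify cpu{tag.owner}: line now shared")
    return DirectoryTag(owner=None, sharers=holders | {requester}), events
```

The function returns a new tag on every call, including a plain read. In this sketch, that is why each access ends with a tag write back to memory.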
When memory is read or written, the current tag always has to be read and then updated according to the requested memory operation, and the updated tag has to be written back into memory. These steps occur in addition to the reading or writing of the data information. A typical multi-processor system may use synchronous DRAM (SDRAM) memory. SDRAM is similar to traditional DRAM memory but includes synchronization registers that provide a simpler and more efficient interface to the memory controller.
Current-generation SDRAMs can be configured to work either in burst mode or in single byte mode, but not in both modes at the same time. In a single byte mode (byte-by-byte) write operation, each operation writes only one byte into each chip. In a single byte mode read operation, each operation reads only one byte from each chip. Reading a single RAM chip provides one byte, or 8 bits, of data. Each single mode operation therefore reads or writes one byte from each chip in a memory module, and the data transferred across all the chips in one such operation is called a memory line. A cache line may consist of multiple memory lines. In single byte mode, each read or write command carries the full overhead of the command set, and many single mode operations are required to operate on a full cache line.
In burst mode, one read or write command is issued and multiple memory lines are operated on, one line per successive clock cycle. The overall burst operation is longer than a single mode operation. However, it is more efficient when operating on the entire cache line, because a burst operation needs only a single set of overhead commands.
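The single byte mode versus burst comparison in the two paragraphs above can be worked through numerically. The following is a sketch under stated assumptions. The module geometry (eight x8 chips) and the 32-byte cache line are hypothetical values chosen for illustration. Each operation is counted as exactly three commands: activate, read/write and precharge.

```python
CHIPS_PER_MODULE = 8        # hypothetical: eight x8 SDRAM chips side by side
BITS_PER_CHIP = 8           # one byte per chip per transfer, as in single byte mode
CACHE_LINE_BYTES = 32       # hypothetical cache line size
COMMANDS_PER_OPERATION = 3  # activate + read/write + precharge


def memory_line_bytes() -> int:
    # One transfer moves one byte from every chip: that is one memory line.
    return CHIPS_PER_MODULE * BITS_PER_CHIP // 8


def memory_lines_per_cache_line() -> int:
    # Round up in case the cache line is not a whole number of memory lines.
    return -(-CACHE_LINE_BYTES // memory_line_bytes())


def commands_for_cache_line(burst: bool) -> int:
    # Single byte mode needs one full operation per memory line; burst mode needs one in total.
    operations = 1 if burst else memory_lines_per_cache_line()
    return operations * COMMANDS_PER_OPERATION
```

With these assumed numbers, a memory line is 8 bytes and a cache line spans 4 memory lines. Single byte mode then issues 12 commands for the cache line, compared with 3 for a single burst.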
The basic operation of the SDRAM involves three command phases: activate, then a read or write command, then a precharge command. The memory controller issues an activate command and provides a row address. It then issues a read or write command, along with a column address. After the read or write phase is completed, the memory controller issues a precharge command, which returns the RAM to an idle state, ready for the next memory operation. In a read operation, the activate command is issued after a required delay that separates this operation from the previous one, and then the read command is issued. An internal delay follows while the RAM fetches the data and drives it onto its outputs. This delay is a programmable feature called CAS latency. CAS stands for Column Address Strobe, and RAS stands for Row Address Strobe. These are traditional DRAM terms, but they are also used with SDRAM. Output of the first memory line begins a number of clock cycles after the read command, as set by the CAS latency. In a burst operation, the second, third and fourth memory lines are output on the following successive clock cycles. A wait period, called a recovery period, must then pass before another command can be issued. After it, a precharge command is issued to complete the operation.
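The read sequence above can be laid out as a clock-by-clock timeline. This is a minimal sketch. The default values are assumptions rather than device specifications: a two-clock activate-to-read delay, a CAS latency of two, a four-line burst and a one-clock recovery period. `burst_read_timeline` is a hypothetical helper.

```python
from typing import List, Tuple


def burst_read_timeline(activate_to_read: int = 2, cas_latency: int = 2,
                        burst_length: int = 4, recovery: int = 1) -> List[Tuple[int, str]]:
    """List (clock, event) pairs for one activate / read / precharge sequence."""
    events = [(0, "ACTIVATE (row address)")]
    read_clock = activate_to_read
    events.append((read_clock, "READ (column address)"))
    # The first memory line appears CAS-latency clocks after the read command.
    first_data = read_clock + cas_latency
    for i in range(burst_length):
        # Each further memory line of the burst follows on the next clock.
        events.append((first_data + i, f"data: memory line {i}"))
    # Wait out the recovery period, then precharge to return the RAM to idle.
    events.append((first_data + burst_length + recovery, "PRECHARGE"))
    return events
```

With the defaults, data appears on clocks 4 through 7 and precharge is issued on clock 9. Passing `burst_length=1` approximates a single mode read under the same assumptions.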
A write operation begins with the activate phase, after which the write command is issued with the column address. In a burst operation, the first memory line of data is presented, followed by the second, third and fourth memory lines on successive cycles.
When the tag is read and then written, the same memory location is accessed, so SDRAMs allow a read command to be followed by a write command without issuing another precharge command and another activate command. The sequence is: activate, read, wait for the read data, issue the write command with the write data, and then issue the precharge command to complete the sequence.
SDRAMs also have a control signal, called DQM, that provides data masking. On a read, DQM disables the data outputs; on a write, it tells the chip to disregard the data being written. On a read, DQM has a latency similar to the CAS (read) latency, so the output is disabled some clocks after DQM is asserted. On a write, if DQM is asserted, the write data that would have been clocked in on that clock cycle is disregarded.
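The write-side DQM behaviour above can be modelled in a few lines. This is a sketch that assumes DQM acts on the same clock as the write data it masks, as the paragraph states. It treats memory as a list of memory lines and does not model the read-side DQM latency. `masked_burst_write` is a hypothetical name.

```python
from typing import List


def masked_burst_write(memory: List[int], start: int, data: List[int],
                       dqm: List[bool]) -> List[int]:
    """Return memory after a burst write in which DQM-masked beats are ignored."""
    if len(data) != len(dqm):
        raise ValueError("need one DQM value per data beat")
    result = list(memory)
    for beat, (value, masked) in enumerate(zip(data, dqm)):
        # With DQM asserted on this clock, the chip disregards the write data.
        if not masked:
            result[start + beat] = value
    return result
```

For example, a four-beat burst with DQM asserted on the last three beats changes only the first memory line. The burst still occupies all four clocks.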
Access to a RAM is controlled by a sequence of command phases: first an activate phase, then a read/write command phase, and finally a precharge phase. Data transfer occurs only in the read/write command phase. The activate and precharge phases are overhead. Access delays, such as the RAS and CAS delays of the RAM, are also part of the overhead. Burst mode allows more data transfers for the same overhead.

For example, a single read operation proceeds as follows:
- two clocks for activate
- one clock for the read command
- one clock while the data is retrieved
- one clock in which the data is actually transferred
- one dead clock while waiting to issue the precharge command
- one precharge clock
- one more clock before the next activate command

Thus, out of eight clock cycles, only one cycle actually moves data. Similarly, a single write operation transfers data in only about one of six cycles. A burst operation performs multiple data transfers for the same overhead as one single mode operation. For example, the activate command is issued, then the read command. After the read access delay, the memory lines of the cache line are read out of the SDRAM on successive clocks. All of the cache line data is thus transferred for the same overhead, resulting in greater efficiency.
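The cycle counting above can be turned into a small efficiency calculation. This is a sketch under the assumption that the eight-clock single read carries seven clocks of overhead and that each extra memory line in a burst adds exactly one clock. The helper names are hypothetical.

```python
SINGLE_READ_CLOCKS = 8  # 2 activate + read + access + data + dead + precharge + pre-activate
OVERHEAD_CLOCKS = SINGLE_READ_CLOCKS - 1  # every clock except the one data clock


def read_clocks(data_beats: int) -> int:
    # Assumption: each extra memory line in a burst adds exactly one clock.
    return OVERHEAD_CLOCKS + data_beats


def read_efficiency(data_beats: int) -> float:
    # Fraction of clocks in which data is actually moving.
    return data_beats / read_clocks(data_beats)


def cache_line_read_clocks(memory_lines: int, burst: bool) -> int:
    # Single mode repeats the whole sequence per memory line; burst pays the overhead once.
    return read_clocks(memory_lines) if burst else memory_lines * read_clocks(1)
```

Under these assumptions, a single read moves data in 1 of 8 clocks (12.5%). A four-line burst moves data in 4 of 11 clocks. Reading a four-line cache line takes 32 clocks in single mode and 11 clocks as one burst.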
The main problem with a burst operation is that the entire burst cycle must occur even when only part of a cache line needs to be read or written. For a read operation, the entire operation is delayed while the rest of the burst is read out. For a write operation, all of the data in the burst has to be rewritten to complete the full burst. Consequently, when a single read or write is performed on a burst mode device, the overhead is actually worse than it would be for the same access performed in single byte mode.
There are three basic types of memory operation cycles in multi-processor systems: read, write and read-modify-write. A read operation involves reading the tag and the data, updating the tag, and writing the new tag information back to memory. A write operation involves reading the tag, updating the tag, and writing the new tag and the data back to memory. A read-modify-write operation involves reading the tag and the data, optionally updating the data, updating the tag, and then writing the tag and the data back to memory. In a multi-processor system whose in-memory tags must be kept up to date, every memory access is therefore effectively a read-modify-write. The only question is how much of the data is read and how much is written. The simplest approach is to make every cycle a read-modify-write.
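The three operation types above can be tabulated to show why each one behaves as a read-modify-write. This is a minimal sketch. The table only restates what each operation reads and writes back according to the paragraph, and `ACCESS_PLANS` and `is_effectively_rmw` are hypothetical names.

```python
from typing import Dict, FrozenSet, Tuple

# (items read, items written back) for each operation type, per the description above.
ACCESS_PLANS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "read": (frozenset({"tag", "data"}), frozenset({"tag"})),
    "write": (frozenset({"tag"}), frozenset({"tag", "data"})),
    "read-modify-write": (frozenset({"tag", "data"}), frozenset({"tag", "data"})),
}


def is_effectively_rmw(op: str) -> bool:
    # An operation is a read-modify-write in effect if it both reads and writes memory.
    reads, writes = ACCESS_PLANS[op]
    return bool(reads) and bool(writes)
```

Every entry reads something and writes something back, so `is_effectively_rmw` holds for all three. The operations differ only in which items each side covers.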
FIG. 5 shows the sequence of events for a read-modify-write cycle. There is an activate 510 and an idle 511, or dead clock period, while the memory prepares for the next command. Then there is the read command 520, another dead clock period 521, and then four cycles of read data 522 being transferred. A dead period 523 follows for recovery from the read, and then another period 524 while the data propagates through the data bus pipeline. Next, a write command 530 is issued with four cycles of data 531, followed by another dead clock period 532. Lastly, the precharge 540 is issued to prepare for the next cycle, followed by another dead period 541. The device is then ready for the next activate 550. The total time from one activate 510 to the next activate 550 is seventeen clock periods. Previous systems used traditional DRAMs and did not need a full burst read followed by a full burst write. They could effectively perform a burst read using column accesses and then write only the tag. Alternatively, they could read only the tag and then write the new tag and data. Synchronous DRAMs greatly simplify the interface with the memory controller, but they introduce the need to perform full burst read-modify-write cycles, a problem the previous systems did not have.
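The seventeen-clock total for FIG. 5 can be checked by summing the steps listed above. This sketch assumes that the write command 530 is issued on the first of the four data clocks 531, which is how the count comes to seventeen. It also assumes that only the data beats scale with burst length. `FIG5_STEPS` and `rmw_clocks` are hypothetical names.

```python
from typing import List, Tuple

# Clock counts per step of the FIG. 5 sequence, from activate 510 up to activate 550.
# Assumption: the write command 530 is issued on the first of the four data clocks 531.
FIG5_STEPS: List[Tuple[str, int]] = [
    ("activate 510", 1), ("idle 511", 1),
    ("read command 520", 1), ("dead clock 521", 1), ("read data 522", 4),
    ("read recovery 523", 1), ("bus pipeline 524", 1),
    ("write 530 with data 531", 4), ("dead clock 532", 1),
    ("precharge 540", 1), ("dead clock 541", 1),
]
DATA_STEPS = {"read data 522", "write 530 with data 531"}


def rmw_clocks(burst_length: int = 4) -> int:
    # Every step except the read and write data beats is fixed overhead (9 clocks here).
    fixed = sum(n for name, n in FIG5_STEPS if name not in DATA_STEPS)
    return fixed + 2 * burst_length
```

Summing the table gives 17 clocks, of which 8 carry data. The other 9 are fixed overhead.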
In summary, the problem to be solved is that when a synchronous DRAM is used in burst mode, a full burst read-modify-write of all the memory lines must be performed whenever a read or write is necessary. For example, suppose a cache line consists of four memory lines and a read is desired. Three clocks are then wasted writing back the three memory lines that do not hold the tag (the memory line with the tag must be updated). Similarly, for a write, three clock periods are wasted waiting for the rest of the read burst when that data is just going to be overwritten (the memory line with the tag must be read).
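The wasted clocks in the example above can be identified line by line. This is a sketch under a hypothetical layout in which the tag lives in one memory line, chosen by the illustrative `tag_line` parameter. `wasted_lines` is a hypothetical helper.

```python
from typing import List


def wasted_lines(op: str, memory_lines: int = 4, tag_line: int = 0) -> List[int]:
    """Memory lines whose burst clock does no useful work for this request.

    Assumption (hypothetical layout): the tag lives in memory line `tag_line`.
    For a read, every line is wanted, but only the tag line needs rewriting, so the
    other lines' write-back clocks are wasted. For a write, every line is rewritten,
    but only the tag line needed reading, so the other lines' read clocks are wasted.
    """
    if op not in ("read", "write"):
        raise ValueError(f"unknown operation: {op}")
    return [line for line in range(memory_lines) if line != tag_line]
```

With four memory lines, either request wastes three data clocks. Measured against the seventeen-clock cycle of FIG. 5, that is 3 of 17 clocks.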
It is therefore desirable to design a system that uses an SDRAM operating in burst mode but remains efficient when fewer than all of the memory lines in a block must be accessed during one memory operation.