1. Technical Field
The invention relates generally to computer systems, and more particularly relates to external bus protocols for transfers between a microprocessor and an external memory.
In an exemplary embodiment, the invention is used in a 586 computer system with an external level two cache that supports both burst and non-burst transfers between the 586-class microprocessor and the memory subsystem.
2. Related Art
Microprocessor-based computer systems include a microprocessor, memory subsystem, and system logic, intercoupled by a local (system) bus. The microprocessor includes an internal L1 (level one) cache that, together with the memory subsystem--system memory (DRAM) and, often, external L2 (level two) cache--forms a memory hierarchy.
The system logic includes a memory/bus controller that together with the microprocessor implements a bus protocol for transferring data between the microprocessor and the external memory subsystem. If a CPU access (read or write) misses in the L1 cache, the microprocessor runs an external bus cycle to access the memory subsystem. The external access will be serviced by the L2 cache or, if that access misses, by the system DRAM.
Without limiting the scope of the invention, this background information is provided in the context of a specific problem to which the invention has application: in x86 computer systems, improving performance on external accesses to an L2 cache.
A conventional 586 computer system uses 64-bit internal and external data buses able to transfer 8 bytes (two dwords or one qword) at a time. The internal L1 cache uses a 32-byte (4 qword) line size, such that cache line fills (reads) and replacements (writes) require four 64-bit (qword) transfers between the microprocessor and the memory subsystem (L2 cache or system DRAM).
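The transfer count stated above follows from the bus width and line size. A minimal sketch of that arithmetic, in which the constant and function names are illustrative assumptions and not part of any 586 specification:

```python
# Illustrative arithmetic only; names are hypothetical.
BUS_WIDTH_BITS = 64    # external data bus width described above
LINE_SIZE_BYTES = 32   # L1 cache line size described above


def transfers_per_line(line_size_bytes: int = LINE_SIZE_BYTES,
                       bus_width_bits: int = BUS_WIDTH_BITS) -> int:
    """Number of full-width bus transfers needed to move one cache line."""
    bytes_per_transfer = bus_width_bits // 8  # 8 bytes = one qword
    return line_size_bytes // bytes_per_transfer


print(transfers_per_line())  # 4 qword transfers per line fill or replacement
```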
According to the conventional 586 bus architecture and protocol, external bus cycle transfers between the microprocessor and the memory subsystem occur in either burst or non-burst mode. Burst mode bus cycles transfer in sequence the four qwords of an L1 cache line--line fills, replacements, or snoop write-backs in response to cache inquiries during DMA (direct memory access) operations. In addition, some microprocessors support write gathering, in which writes to the contiguous bytes of a cache line are gathered in internal write buffers and then written out to the memory subsystem in burst mode. Non-burst mode bus cycles are used to transfer (read/write) 1 to 8 bytes at a time in a single bus transfer.
The microprocessor initiates an external bus cycle with an address strobe ADS# ("#" indicating an active-low signal) accompanied by an address and bus cycle definition signals--conventional bus cycle definition signals include W/R# (write/read), D/C# (data/control), and M/IO# (memory/IO). In addition, the microprocessor will signal CACHE# (cache cycle indicator) if the current bus cycle is, for a read, potentially cacheable, or for a write, a cache line write-back or replacement.
The memory subsystem returns BRDY# when the current transfer is complete. For non-burst transfers, a single BRDY# is returned, completing the bus cycle. For burst transfers, each of the four qword transfers is completed by a BRDY#, with the last BRDY# completing the burst cycle.
The 586 bus architecture supports pipelined bus cycles. The bus cycle control signal NA# (next address) is driven by the system during a current bus cycle (before the last BRDY# has been returned) to request that the microprocessor drive out address/control for the next pending bus cycle request, designated a pipeline bus cycle. NA# is ignored if there is no pending bus cycle request, or if either the current or next bus cycle is a line replacement or snoop write-back cycle.
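The conditions under which NA# takes effect can be summarized as a single predicate. The following is a sketch under the rules stated above; the function and parameter names are hypothetical, and a real bus interface unit would evaluate these conditions in hardware on a clock-by-clock basis:

```python
# Hypothetical model of the NA# rule; names are illustrative assumptions.
def na_is_honored(pending_request: bool,
                  current_is_replacement_or_writeback: bool,
                  next_is_replacement_or_writeback: bool) -> bool:
    """Return True if an asserted NA# starts a pipeline bus cycle.

    NA# is ignored when no bus cycle request is pending, or when either
    the current or the next bus cycle is a line replacement or snoop
    write-back cycle.
    """
    if not pending_request:
        return False
    return not (current_is_replacement_or_writeback
                or next_is_replacement_or_writeback)
```

For example, `na_is_honored(True, False, False)` evaluates to `True`, while any call with `pending_request=False` evaluates to `False`.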
Whether an external bus cycle is a burst or non-burst transfer is determined by the microprocessor CACHE# and W/R# bus cycle definition signals, and the system KEN# (cache enable) signal. If CACHE# is asserted for a read cycle, and the system returns KEN#, then the read is converted to a burst fill cycle. Asserting CACHE# for a write cycle indicates a cache line replacement or snoop write-back (or, possibly, a gathered write).
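The burst/non-burst decision described above can be sketched as a small decision function. This is an illustrative model only: the active-low signals are represented as booleans in which `True` means "asserted", and all names are assumptions introduced for the example:

```python
# Hypothetical model of the burst decision; True means the signal is asserted.
def is_burst_cycle(is_write: bool,
                   cache_asserted: bool,
                   ken_returned: bool = False) -> bool:
    """Classify a bus cycle as burst (True) or non-burst (False)."""
    if not cache_asserted:
        # Without CACHE#, the cycle is a single non-burst transfer.
        return False
    if is_write:
        # CACHE# on a write: line replacement, snoop write-back,
        # or possibly a gathered write; all are burst transfers.
        return True
    # CACHE# on a read: converted to a burst fill only if the
    # system returns KEN#.
    return ken_returned
```

Under this model, a potentially cacheable read for which the system withholds KEN#, `is_burst_cycle(False, True, False)`, remains a non-burst transfer.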
The system returns KEN# in the same clock as BRDY#, or for pipeline bus cycles, in the same clock in which BRDY# or NA# is sampled active (whichever occurs first). Thus, for potentially cacheable reads, the microprocessor will sample KEN# contemporaneously with the first assertion of BRDY# or NA#.
After asserting ADS#, the microprocessor samples BRDY# (and, for reads, KEN#) in the second clock cycle after driving out ADS#, so that the first transfer takes at least two clock cycles. If the address hits in the L2 cache, the bus transfer will be completed and BRDY# returned in this clock cycle.
In current memory subsystems, L2 caches are able to complete a bus transfer within two clock cycles, while system DRAM typically takes 4 clock cycles to complete the first transfer. For burst transfers, the L2 cache is able to complete the remaining three transfers of the burst in 1 clock cycle each (2-1-1-1), while system DRAM typically takes 2 clock cycles for each of the remaining transfers (4-2-2-2).
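The burst timings quoted above can be totaled to compare the two cases. A minimal sketch, using only the per-transfer clock counts given in the text; the names are illustrative assumptions:

```python
# Clock cycles per qword transfer, as quoted in the text.
L2_BURST_PATTERN = (2, 1, 1, 1)
DRAM_BURST_PATTERN = (4, 2, 2, 2)


def burst_clocks(pattern: tuple) -> int:
    """Total clock cycles to complete a four-transfer burst."""
    return sum(pattern)


print(burst_clocks(L2_BURST_PATTERN))    # 5
print(burst_clocks(DRAM_BURST_PATTERN))  # 10
```

On these figures, the first transfer accounts for two of the five clocks in an L2 burst, which is why shortening the first transfer is the natural target for improvement.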
Computer system performance could be improved by reducing the number of clock cycles required to complete a non-burst cycle, or to complete the first transfer of a burst cycle. In particular, current L2 cache performance is such that, depending on the speed of the local bus, the L2 cache would be able to detect ADS#, perform a cache look-up, and signal a hit in the same clock cycle as the ADS# strobe.