1. Technical Field
The invention relates generally to computer systems, and more particularly relates to bus interface logic and protocols for interfacing a 64-bit microprocessor to a 32-bit bus architecture. In even greater particularity, the invention relates to bus interface logic and protocols for implementing replacement cycles.
In an exemplary embodiment, the invention permits a 64-bit x86 microprocessor to be used in a 486 computer system with a 32-bit x86 bus architecture.
2. Related Art
Microprocessor-based computer systems include a microprocessor, memory subsystem, and system logic, intercoupled by a local (system) bus. The microprocessor includes an internal L1 (level one) cache that, together with the memory subsystem--system memory (DRAM) and, often, external L2 (level two) cache--forms a memory hierarchy.
The system logic includes a memory/bus controller that together with the microprocessor implements a bus protocol for transferring data between the microprocessor and the external memory subsystem. In addition, the system logic typically supports access to the memory subsystem by external DMA (direct memory access) devices.
Support for DMA accesses to the memory subsystem may require that the bus protocol include a cache coherency (snooping) protocol--the microprocessor snoops the local address bus to detect DMA accesses to addresses that are in the L1 cache. In particular, if the L1 cache uses a write-back design (such that the L1 cache and the memory subsystem are not maintained coherent because CPU writes that hit in the cache are not automatically written through to the memory subsystem), then a snooped DMA address results in a cache inquiry (look-up) and, if the DMA address hits a line in the L1 cache that contains modified (dirty) data, the microprocessor runs a snoop write-back cycle to update the memory subsystem prior to the DMA access.
Without limiting the scope of the invention, this background information is provided in the context of a specific problem to which the invention has application: designing a 64-bit microprocessor that can be installed in a 32-bit 486-generation computer system using a 32-bit 486 bus architecture, including appropriate bus interface and protocol logic.
A 486 computer system is based on the 486 generation microprocessor which is a 32-bit architecture--its internal data paths are 32 bits or one dword (2 words, 4 bytes). A 486 computer system uses a corresponding 32-bit 486 bus architecture in which the local bus is 32 bits.
The 486 microprocessor's L1 cache is organized in a 4 dword (16 byte) cache line. Cacheable data transfers between the microprocessor and the memory subsystem--fill cycles and replacement/snoop write-back cycles--are run using burst mode bus cycles in which an entire 4 dword cache line is transferred in 4 successive dword transfers.
In accordance with the conventional burst mode bus protocol, for microprocessor-initiated burst mode bus cycles, the microprocessor outputs an address strobe ADS# ("#" indicating an active-low signal) accompanied by an address and bus cycle definition signals--conventional bus cycle definition signals include W/R# (write/read), D/C# (data/control), and M/IO# (memory/IO). In addition, the microprocessor signals BLAST# (burst last) to indicate that the current bus cycle is (a) noncacheable (BLAST# asserted), (b) a burst replacement or snoop write-back (BLAST# negated, W/R#=W), or (c) a potentially cacheable read (BLAST# negated, W/R#=R).
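By way of illustration only, the three-way decode of BLAST# and W/R# described above can be sketched as a small Python function. The function name and the boolean parameters are hypothetical, and the active-low signals are modeled as "asserted" booleans rather than as pin levels.

```python
def classify_486_cycle(blast_asserted: bool, write: bool) -> str:
    """Classify a microprocessor-initiated 486 bus cycle.

    blast_asserted: True when BLAST# is asserted (active low on the pin).
    write:          True when W/R# indicates a write, False for a read.
    """
    if blast_asserted:
        # Case (a): BLAST# asserted marks the cycle as noncacheable.
        return "noncacheable"
    if write:
        # Case (b): BLAST# negated with W/R#=W.
        return "burst replacement or snoop write-back"
    # Case (c): BLAST# negated with W/R#=R.
    return "potentially cacheable read"
```

For example, classify_486_cycle(False, False) returns "potentially cacheable read".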
For burst mode bus cycles (including potentially cacheable reads for which the microprocessor negates BLAST#), the memory subsystem returns BRDY# when each of the four bus operations transferring a dword is terminated, with the last BRDY# terminating the burst mode bus cycle. In the case of potentially cacheable reads, the system returns KEN# (cache enable) one clock prior to the first BRDY# to indicate a burst transfer--cacheability (cache line fill) is determined by the state of KEN# one clock prior to the final BRDY#.
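The two KEN# sampling points for a potentially cacheable read can be sketched as follows. This is a simplified model under stated assumptions: the function and key names are hypothetical, clock-level timing is reduced to two booleans, and a read that is not converted to a burst is assumed to complete in a single transfer.

```python
def read_outcome(ken_before_first_brdy: bool, ken_before_last_brdy: bool) -> dict:
    """Model the outcome of a potentially cacheable read (BLAST# negated, W/R#=R).

    ken_before_first_brdy: KEN# asserted one clock prior to the first BRDY#.
    ken_before_last_brdy:  KEN# asserted one clock prior to the final BRDY#.
    """
    if not ken_before_first_brdy:
        # No burst is signaled; assumed here to be a single non-cached transfer.
        return {"transfers": 1, "line_fill": False}
    # A burst of four dword transfers is run; whether the line is actually
    # cached depends on the second KEN# sample.
    return {"transfers": 4, "line_fill": ken_before_last_brdy}
```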
In addition to the bus cycle control and definition signals, 486 computer systems use bus arbitration signals to support DMA operations: BOFF#, HOLD, and HLDA.
BOFF# (back-off) is asserted by the system logic to force the microprocessor to abort a current bus cycle (burst or non-burst), and relinquish control of the local bus in the next clock cycle--once BOFF# is negated, the microprocessor restarts any aborted bus cycle in its entirety.
HOLD (bus hold request) is asserted by the system logic to indicate that a DMA device requests control of the local bus to run a DMA access to the memory subsystem--the microprocessor will complete a current bus cycle (burst or non-burst) and then acknowledge the request and relinquish control of the local bus.
HLDA (hold acknowledge) is asserted by the microprocessor in response to HOLD (after a current bus cycle is completed) indicating that it has relinquished control of the local bus for a DMA access--when the system logic negates HOLD, the microprocessor negates HLDA.
In addition to the bus cycle control and definition signals, and bus arbitration signals, 486 computer systems use cache coherency signals to support write-back caching on a 486 microprocessor: AHOLD, EADS#, HIT#, HITM#, and INV. A standard MESI (modified, exclusive, shared, invalid) protocol is used.
AHOLD (address hold request) is asserted by the system logic to cause the microprocessor to tristate the address lines of the local bus one clock after AHOLD while completing the current bus cycle (burst or non-burst). A DMA device performs a cache inquiry cycle by driving an address into the microprocessor at the same time it is presented to the memory subsystem--the microprocessor does not initiate another bus cycle except for a snoop write-back cycle resulting from the cache inquiry.
EADS# (external address strobe) is asserted by the system logic to indicate that a valid cache inquiry address is being driven on the address lines--the microprocessor snoops this inquiry address, and asserts HIT# if the inquiry address is in the L1 cache, and also HITM# if the inquiry address is dirty (modified state).
HIT# (hit on cache line) is asserted by the microprocessor in response to a cache inquiry cycle if the snooped inquiry address is in the L1 cache (modified, exclusive, or shared states)--HIT# is valid two clocks after EADS# is sampled active.
HITM# (hit on modified data) is asserted, along with HIT#, by the microprocessor in response to a cache inquiry cycle if the snooped inquiry address that is in the L1 cache is for a cache line that contains any dirty data (i.e., at least one of the 16 bytes is in the modified state)--a snoop write-back (burst mode) cycle is issued to write the cache line back, updating the external memory subsystem in preparation for the DMA access. HITM# is valid two clocks after EADS# is sampled active, and remains asserted until two clocks after the last BRDY# of the snoop write-back cycle is asserted--while HITM# is asserted, the DMA access is stalled.
INV (invalidate request) is asserted by the system logic to determine the final state of the cache line in the case of a cache inquiry hit in the L1 cache--INV is sampled with EADS#: (a) a logic one directs the microprocessor to change the state of the cache line to invalid, and (b) a logic zero directs the microprocessor to change the state of the cache line to shared.
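The response to a cache inquiry described above (HIT#, HITM#, and the INV-selected final state) can be sketched as follows. This is a minimal model, assuming a single MESI state per cache line instead of per-byte dirty tracking, and ignoring the two-clock signal timing. The names State and snoop are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def snoop(state: State, inv: bool):
    """Return (hit, hitm, write_back, final_state) for one cache inquiry.

    state: MESI state of the line addressed by the inquiry.
    inv:   level of INV sampled with EADS# (True = logic one).
    """
    hit = state is not State.INVALID   # HIT#: line is in M, E, or S
    hitm = state is State.MODIFIED     # HITM#: line holds dirty data
    if not hit:
        # Inquiry miss: no signals asserted, nothing to change.
        return (False, False, False, State.INVALID)
    # On a hit, INV selects the final state: invalid (1) or shared (0).
    final_state = State.INVALID if inv else State.SHARED
    # A hit on modified data triggers a snoop write-back cycle.
    return (hit, hitm, hitm, final_state)
```

In this sketch a hit on a modified line with INV at logic one reports HIT# and HITM#, requests a snoop write-back, and leaves the line invalid.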
The x86 microprocessor generations after the 486 (at least the 586 and 686) are 64-bit machines. That is, these 64-bit microprocessors use 64-bit (a qword or two dwords) internal buses and a cache organization with a 4 qword (32 byte) cache line.
Computer systems designed for these 64-bit microprocessors use a 64-bit local bus architecture, such as the 586 (or Pentium) bus architecture. Cache line fills, replacements, and snoop write-backs (in response to cache inquiries) are performed in a burst mode transfer of 4 qwords. The basic 586 bus protocols, including bus arbitration and cache coherency, are generally the same as the 486 bus architecture--two differences between the 586 and 486 bus protocols are (a) support for pipeline cycles, and (b) the use of CACHE# instead of BLAST# in signaling cacheability.
Bus pipelining is implemented using an additional bus cycle control signal NA#. NA# (next address) is driven by the system during a current bus cycle (BRDY#) to request that the microprocessor drive out address/control for the next pending bus cycle request, designated a pipeline bus cycle. NA# is ignored if there is no pending bus cycle request, or if either the current or next bus cycle is a line replacement or snoop write-back cycle.
Regarding cacheability, the microprocessor signals CACHE# (cache cycle indicator) to indicate that the current bus cycle is a potentially cacheable read, or a cache line replacement or snoop write-back. Specifically, if CACHE# is asserted with W/R#=R, then KEN# is sampled to determine if the bus cycle will be a cache line fill. Asserting CACHE# with W/R#=W indicates a replacement or snoop write-back (KEN# ignored). Negating CACHE# for either a read or write indicates a non-burst bus cycle (KEN# is ignored).
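The CACHE#, W/R#, and KEN# combinations described above can be summarized in a short decode sketch. The function name, parameter names, and returned labels are hypothetical, and active-low signals are again modeled as "asserted" booleans.

```python
def classify_586_cycle(cache_asserted: bool, write: bool,
                       ken_asserted: bool = False) -> str:
    """Classify a 586-style bus cycle from CACHE#, W/R#, and KEN#."""
    if not cache_asserted:
        # CACHE# negated: non-burst cycle for reads and writes; KEN# ignored.
        return "non-burst"
    if write:
        # CACHE# asserted with W/R#=W; KEN# ignored.
        return "replacement or snoop write-back"
    # CACHE# asserted with W/R#=R: KEN# decides whether a line fill occurs.
    return "cache line fill" if ken_asserted else "read without line fill"
```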
It would be advantageous if a newer generation 64-bit microprocessor could be used as an upgrade microprocessor for a conventional 486 computer system. However, a 64/32 bit computer system design presents problems in interfacing a 64-bit x86 microprocessor to a 32-bit x86 bus architecture.
For example, the 64-bit microprocessor's L1 cache with a 32 byte line size must be interfaced to a bus architecture in which a burst mode bus cycle only transfers 4 dwords, or 16 bytes (one-half the microprocessor's cache line). This problem can be alleviated by sectoring the cache--each 32 byte cache line can be logically divided into two 16 byte sectors (both corresponding to a 4 dword cache line). Cache line fills and replacements/snoop write-backs can be performed by two successive burst mode transfers of 16 bytes (one sector) each.
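The sectoring arithmetic can be illustrated with a short sketch that splits a 32 byte cache line into two 16 byte sectors and lists the dword addresses of each burst. The function name is hypothetical, and the ascending dword order within each burst is an assumption made for simplicity; the actual burst order on a given bus may depend on the starting address.

```python
LINE_BYTES = 32     # cache line size of the 64-bit microprocessor
SECTOR_BYTES = 16   # one 486 burst: 4 dwords
DWORD_BYTES = 4


def sector_bursts(address: int):
    """Return two lists of dword addresses, one per 16 byte sector,
    for the 32 byte cache line that contains `address`."""
    line_base = address & ~(LINE_BYTES - 1)
    bursts = []
    for sector in range(LINE_BYTES // SECTOR_BYTES):
        sector_base = line_base + sector * SECTOR_BYTES
        bursts.append([sector_base + i * DWORD_BYTES
                       for i in range(SECTOR_BYTES // DWORD_BYTES)])
    return bursts
```

For an address of 0x1234, the line base is 0x1220, so the first burst covers dwords 0x1220 through 0x122C and the second covers 0x1230 through 0x123C.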
Other problems include implementing (a) BOFF# write-back cycles that interrupt cache line replacements, (b) noncacheable misaligned reads and writes, and (c) potentially cacheable misaligned reads (i.e., reads for which CACHE# is asserted by the microprocessor, which then tests KEN# to determine cacheability).
Regarding combined replacement and snoop write-back cycles, because BOFF# aborts a current bus cycle but may require a snoop write-back cycle in response to a cache inquiry, if the current bus cycle is a cache line replacement, sectoring the cache line may require swapping the order in which sectors of the cache line being replaced are written back. Specifically, if a two sector cache line replacement S0/S1 is staged to be written back in two successive burst cycles (S0, then S1), the following scenario can arise: (a) BOFF# is asserted during the S0 burst cycle, aborting that burst transfer, (b) inquiry cycle hits on S1 such that HITM# is asserted and an S1 snoop write-back cycle is run, and (c) BOFF# is deasserted, and the S0 burst cycle is restarted. In this scenario, the microprocessor should recognize that the restarted S0 burst transfer completes the cache line (S0/S1) replacement because S1 was previously transferred in the snoop write-back cycle.
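The S0/S1 scenario can be traced with a small bookkeeping sketch. It is illustrative only and does not represent any particular bus interface implementation: the class LineReplacement, its methods, and the log format are hypothetical, and bus timing is not modeled.

```python
class LineReplacement:
    """Track which sectors of a two-sector line replacement remain to be written."""

    def __init__(self):
        self.pending = ["S0", "S1"]   # staged write-back order
        self.log = []

    def run_next_burst(self, boff: bool = False):
        """Attempt the next staged burst; BOFF# aborts it and leaves it pending."""
        if not self.pending:
            return None
        sector = self.pending[0]
        if boff:
            self.log.append(("aborted", sector))
            return None
        self.pending.pop(0)
        self.log.append(("replacement", sector))
        return sector

    def snoop_write_back(self, sector: str):
        """A snoop write-back of a pending sector also satisfies the replacement."""
        if sector in self.pending:
            self.pending.remove(sector)
        self.log.append(("snoop write-back", sector))

    @property
    def complete(self) -> bool:
        return not self.pending


# Scenario (a)-(c) from the text.
r = LineReplacement()
r.run_next_burst(boff=True)    # (a) BOFF# aborts the S0 burst
r.snoop_write_back("S1")       # (b) inquiry hits S1, snoop write-back runs
r.run_next_burst()             # (c) BOFF# deasserted, S0 burst restarted
```

After step (c) the replacement is complete without a separate S1 burst, because S1 was removed from the pending list by the snoop write-back.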
Regarding misalignments, the x86 architecture does not require data to be memory aligned--data can be addressed as bytes, words, dwords, or for 64-bit systems, qwords--even though the L1 cache is memory aligned, as are internal and external data transfers. Thus, in the case of the 486 generation microprocessor in which internal data transfers and cache accesses are on the basis of 32-bit aligned dwords, addresses to words (two bytes) are not required to be either dword aligned (i.e., the word may span two aligned dwords), or word aligned (i.e., the word may span the two aligned words of an aligned dword)--in the case of noncacheable transfers, one bus cycle is required to transfer a misaligned word that is dword aligned, while two bus cycles are required to transfer a word (or dword) that is dword misaligned.
In the case of a 64-bit microprocessor, cache accesses and internal and external transfers are qword aligned--a misaligned word or dword may be qword aligned, such that a single bus cycle request will access the data, or qword misaligned, such that two qword bus cycles are required. If the 64-bit microprocessor is interfaced to a 32-bit bus architecture, a qword bus cycle request must be converted into two 32-bit (dword) bus cycles--if a word or dword is qword aligned but dword misaligned, then both dwords of the qword must be transferred in successive transfers.
So, in the case of noncacheable misaligned reads and writes, a misaligned word or dword that is qword aligned (which in a 64-bit system would be transferred by transferring the qword) must be converted to two dword bus cycles.
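The alignment cases discussed above reduce to counting how many aligned bus-width units an access touches. The following sketch uses a hypothetical function name and lists only the aligned units that actually contain requested bytes; it does not model byte enables or cycle ordering.

```python
def aligned_bus_cycles(address: int, size: int, bus_bytes: int = 4):
    """Return the aligned bus-unit addresses touched by an access of `size`
    bytes at `address`, for a bus `bus_bytes` wide (4 = dword, 8 = qword)."""
    first_unit = address // bus_bytes
    last_unit = (address + size - 1) // bus_bytes
    return [unit * bus_bytes for unit in range(first_unit, last_unit + 1)]
```

For example, a dword at address 0x1002 is qword aligned but dword misaligned: with bus_bytes=8 it needs the single qword at 0x1000, while with bus_bytes=4 it needs the two dwords at 0x1000 and 0x1004. A word at 0x1001 stays within the dword at 0x1000 and needs one dword cycle, whereas a word at 0x1003 spans two dwords.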
And, in the case of potentially cacheable misaligned reads, the 64-bit microprocessor will initiate a potentially cacheable read cycle by providing a word or dword address that, due to dword misalignment, will be staged as two dword bus cycles. In the first bus cycle, the bus interface unit will drive out the first dword address (along with the appropriate bus cycle definition signals), and will assert CACHE#, indicating that the read is potentially cacheable. If KEN# is returned, signaling a cacheable burst read cycle, the bus interface unit must recognize that the second dword will be transferred as part of the burst, and invalidate the staged bus cycle for the second dword.
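The handling of the staged second dword can be sketched as follows. This is a simplified model, not a description of any actual bus interface unit: the function name and return format are hypothetical, the burst is assumed to cover the aligned 16 bytes containing the first dword, and the treatment of a staged dword that falls outside that burst (it is kept as a separate cycle) is an assumption that goes beyond the text.

```python
BURST_BYTES = 16  # one 486 burst read: 4 dwords


def resolve_staged_read(staged, ken_asserted: bool):
    """Resolve staged dword read cycles once KEN# is known.

    staged: dword addresses staged for one misaligned read, in issue order.
    Returns (cycles_run, invalidated), where cycles_run is a list of
    (kind, address) tuples and invalidated lists staged addresses dropped
    because the burst already transfers them.
    """
    if not ken_asserted:
        # Not cacheable: each staged dword is run as its own bus cycle.
        return [("dword read", addr) for addr in staged], []

    burst_base = staged[0] & ~(BURST_BYTES - 1)
    cycles = [("burst read", burst_base)]
    invalidated = []
    for addr in staged[1:]:
        if burst_base <= addr < burst_base + BURST_BYTES:
            invalidated.append(addr)             # covered by the burst
        else:
            cycles.append(("dword read", addr))  # assumption: still needed
    return cycles, invalidated
```

With staged dwords at 0x1000 and 0x1004 and KEN# returned, a single burst read from 0x1000 is run and the staged cycle for 0x1004 is invalidated.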