FIGS. 2-11 show point-to-point cache-coherent switch solutions for multiprocessor systems that are the subject of copending and coassigned applications.
Depending on the implementation specifics, these designs may be problematic in two respects:
1. Tag SRAM size is expensive
2. Latency is greater than desired
First, SRAM Size Issue:
To support an L2 size of 4 MB, 64 GB of total memory, and a 64-byte line size:
the TAG array will have 4 MB/64 bytes = 64K entries
the TAG size will be 14 bits
the total TAG array size = 14 bits * 64K = 917,504 bits per CPU
To support an 8-way system, the duplicated TAG array size will be 8 * 14 bits * 64K, about 8 Mbit of SRAM.
An 8 Mbit SRAM is too large for single-silicon integration, even with a 0.25 micron CMOS process.
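The sizing arithmetic above can be reproduced with a short sketch. It assumes a direct-mapped L2 and counts tag bits only (no state or parity bits), which is the reading under which a 64 GB address space and a 4 MB cache give a 14-bit tag. The function name and parameters are illustrative and do not come from the described design.

```python
def tag_array_bits(l2_bytes, mem_bytes, line_bytes, cpus=1):
    """Return (entries, tag_bits, total_bits) for a duplicated tag array.

    Assumptions (not stated in the text): direct-mapped cache, all sizes
    are powers of two, and only tag bits are stored per entry.
    """
    entries = l2_bytes // line_bytes            # one tag entry per cache line
    addr_bits = (mem_bytes - 1).bit_length()    # bits needed to address memory
    cache_bits = (l2_bytes - 1).bit_length()    # index bits plus line-offset bits
    tag_bits = addr_bits - cache_bits           # remaining high-order bits
    return entries, tag_bits, entries * tag_bits * cpus


MB = 2**20
GB = 2**30

# Per-CPU figures: 64K entries, 14-bit tag, 917,504 bits.
per_cpu = tag_array_bits(4 * MB, 64 * GB, 64)

# 8-way system: 8 * 917,504 = 7,340,032 bits.
eight_way = tag_array_bits(4 * MB, 64 * GB, 64, cpus=8)
```

Under these assumptions the 8-way total is 7,340,032 bits, on the order of the "about 8 Mbit" figure cited above. Adding per-line state bits would increase it further.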
Second, Latency Issue:
Although the switch fabric solutions of FIGS. 2-11 provide scalability in memory throughput, maximum transaction parallelism, and easy PCB board routing, the latency for memory read transactions is greater than desired.
Example for Memory Read Transactions:
A CPU read transaction is first latched by the CCU. The CCU formats the transaction into a channel command and sends it through the channel. The FCU's IIF unit de-serializes the channel command or data and performs the cache coherency operation, and the FCU then sends the memory read transaction to the MCU. The MCU de-serializes the channel command, sends the read command to the DRAM address bus, reads from the DRAM data bus, and sends the data to the FCU via the channel. The FCU then sends the data to the CCU via the channel. Finally, the data is presented at the CPU bus. A read transaction thus crosses the channel four times, and each crossing introduces additional latency. What is needed is an SMP architecture with the benefits of the present FCU architecture, but with reduced Tag SRAM size requirements per chip and with reduced latencies.
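The read path described above can be tallied with a minimal sketch that lists each hop and marks whether it goes over the channel. The hop list follows the sequence in the text. The per-crossing latency is a hypothetical parameter chosen by the caller, since the text gives no figures, and the names used here are illustrative only.

```python
# Each hop is (source, destination, crosses_channel).
READ_PATH = [
    ("CPU", "CCU", False),   # read latched by the CCU from the CPU bus
    ("CCU", "FCU", True),    # channel command sent to the FCU
    ("FCU", "MCU", True),    # memory read forwarded after the coherency check
    ("MCU", "DRAM", False),  # read command on the DRAM address bus
    ("DRAM", "MCU", False),  # data returned on the DRAM data bus
    ("MCU", "FCU", True),    # data sent back to the FCU
    ("FCU", "CCU", True),    # data sent back to the CCU
    ("CCU", "CPU", False),   # data presented at the CPU bus
]


def channel_crossings(path):
    """Count the hops that traverse the point-to-point channel."""
    return sum(1 for _src, _dst, via_channel in path if via_channel)


def added_channel_latency(path, per_crossing):
    """Latency attributable to channel crossings alone, in the caller's units.

    per_crossing is a hypothetical cost per crossing; no value is given
    in the text.
    """
    return channel_crossings(path) * per_crossing
```

With this model, any architecture change that removes a channel hop from the path lowers the total by one `per_crossing` unit, which is the kind of reduction the text calls for.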
Fully connected, multiple-FCU-based architectures reduce both the Tag SRAM size requirement and memory read latency. A preferred embodiment of a symmetric multiprocessor system includes a switched fabric (switch matrix) for data transfers that provides multiple concurrent buses that enable greatly increased bandwidth between processors and shared memory. A high-speed point-to-point Channel couples command initiators and memory with the switch matrix and with I/O subsystems.