This invention relates to computer-network switches, and more particularly to store-and-forward network switches using embedded memory.
Computer networks have been a key technology to unlock the power of low-cost personal computers (PCs). Networks allow PCs and work stations to share data and resources and facilitate communication among co-workers. While individual PCs may have limited resources, by linking a PC to a network, vast additional resources become easily available.
Computers, printers, and other network elements are often connected together at a lowest level by a local-area network (LAN) such as an Ethernet. FIG. 1A illustrates a prior-art LAN using a single collision domain. Network elements 12 include PCs and peripherals such as a workgroup printer. Each network element 12 contains a network-interface card (NIC) or equivalent that connects to the physical media, usually twisted-pair cable. Although LAN 10 is often depicted as a single cable, a repeater or hub is typically used. The cable from each network element 12 is plugged into the repeater box (not shown). The repeater then replicates a signal transmitted from one network element 12 to all other network elements 12 so that all network elements 12 receive the same transmission.
Sometimes two network elements 12 transmit at the same time. A collision results where the two transmissions interfere with each other and the correct data cannot be reliably read. Transmission must stop and be re-started at different times. These collisions become more frequent as more network elements 12 are added to LAN 10, and as network traffic increases.
Since each transmission is repeated to all other network elements 12 on LAN 10, LAN 10 contains a single collision domain. Performance of LAN 10 is limited by collisions. Often a corporate network must be divided into many small LANs to limit collisions. Bridge 14 or a router is used to connect LAN 10 to other LANs or to a backbone network or wide-area network (WAN).
Network Switch - FIG. 1B
More recently, network switches have been used to connect LANs to other LANs, and even to connect network elements within a single LAN. FIG. 1B shows a LAN using a network switch to avoid large collision domains. Network elements 12 connect to switch 16 rather than to a repeater or hub. Each network element 12 is bi-directional (full-duplex) and is shown twice, once as an input to switch 16 and again as an output from switch 16, but in reality each network element is a single element.
Switch 16 monitors the inputs from each network element 12, looking for a transmission. When a transmission to switch 16 is detected, the packet is parsed to determine the destination port. Switch 16 then makes a temporary connection from the input port to the output port and sends the packet to the destination port. The packet is sent only to the destination port and not to the other output ports.
Switch 16 contains a switch fabric that allows multiple connections to be made at the same time. For example, input port (network element) A can send a packet to output port C, while input port B sends a packet to output port D. A collision does not occur even though both network elements A and B are transmitting at the same time. Switch 16 effectively creates separate collision domains for each pair of network elements during a packet transmission.
Switch 16 can eliminate collisions, or break a network into smaller collision domains when entire LANs rather than single network elements are connected to the ports of switch 16. Thus switches have become immensely popular as a way to easily improve network performance.
The original 10 Mbps Ethernet (10 Base-T) is being replaced with 100 Mbps Ethernet, and 1 Gbps Ethernet is expected. The higher speed networks require that the switches also transfer data at the higher rate to avoid congestion. The switches must also set up and tear down connections more rapidly.
Often the switch fabric in network switches consists of a crossbar switch, which can simply be a matrix of transistor switches that connect each input port to every output port. These crossbar switches are fast, but as more ports are added, the switch matrix grows roughly as the square of the number of ports, since each new port must connect to all other ports. Thus switches with larger numbers of ports are expensive.
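The growth of a crossbar matrix can be illustrated with a short sketch. It assumes a full crossbar with one transistor switch (crosspoint) for every input/output pair; the function name `crosspoints` is hypothetical and is not taken from any described switch.

```python
def crosspoints(ports: int) -> int:
    """Crosspoints in a full crossbar, assuming one per (input, output) pair."""
    return ports * ports


# Doubling the port count quadruples the number of crosspoints.
for n in (8, 16, 32):
    print(f"{n} ports -> {crosspoints(n)} crosspoints")
```

Under this assumption the matrix size depends on the square of the port count, which is why adding ports to a crossbar design becomes costly.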
S/F Switch Limited by Memory Bandwidth
Switches can also use a store-and-forward architecture. Incoming packets are stored in a memory and later read out to the output port. No switch matrix or crossbar switch is required. These store-and-forward switches can more easily add ports. However, memory bandwidth limits the speed and number of ports of these switches.
For example, the maximum throughput for a store-and-forward switch occurs when one half of the ports (n/2) are sending to the other half of the ports at full speed. Let V be the network speed (in Mbps) per direction, D the number of directions (1=half duplex, 2=full duplex), and A the number of memory access cycles per packet, which is 2: one cycle to write and one cycle to read. The required memory bandwidth S (Mbps) is:
S=(n/2)*D*V*A Mbps
For example, when n=8 ports, V=100 Mbps, D=2 (full duplex), and A=2 cycles, then:

\[
\begin{aligned}
S &= (8/2) * 2 * 100 * 2 \\
  &= 1{,}600\ \text{Mbps} \\
  &= 1.6\ \text{Gbps} \\
  &= 200\ \text{Mbytes/sec}
\end{aligned}
\]
The memory must have a bandwidth of 200 Mbytes/sec for a non-blocking switch with just 8 ports. In another example, a switch has twenty-four 100 Mbps ports and two 1 Gbps ports. Then n=24 ports with V=100 Mbps, and n=2 ports with V=1 Gbps. With D=2 for full duplex and A=2, then:

\[
\begin{aligned}
S &= (24/2) * 2 * 100 * 2 + (2/2) * 2 * 1000 * 2 \\
  &= 4800 + 4000\ \text{Mbps} \\
  &= 8.8\ \text{Gbps} \\
  &= 1.1\ \text{Gbytes/sec}
\end{aligned}
\]
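The two worked examples can be checked with a minimal sketch of the formula S=(n/2)*D*V*A, summed over each group of equal-speed ports. The function names and the `port_groups` argument are illustrative assumptions and are not part of the switch described here.

```python
def required_bandwidth_mbps(port_groups, duplex=2, accesses=2):
    """Sum (n/2) * D * V * A over groups of ports, giving S in Mbps.

    port_groups: iterable of (n, V) pairs, where n is the number of ports
                 and V is the per-direction speed in Mbps.
    duplex:      D, 1 for half duplex or 2 for full duplex.
    accesses:    A, memory accesses per packet (one write plus one read).
    """
    return sum((n / 2) * duplex * v * accesses for n, v in port_groups)


def mbps_to_mbytes_per_sec(mbps):
    """Convert megabits per second to megabytes per second."""
    return mbps / 8


s_8_ports = required_bandwidth_mbps([(8, 100)])
s_26_ports = required_bandwidth_mbps([(24, 100), (2, 1000)])

print(s_8_ports, mbps_to_mbytes_per_sec(s_8_ports))      # 1600.0 200.0
print(s_26_ports, mbps_to_mbytes_per_sec(s_26_ports))    # 8800.0 1100.0
```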
A very fast memory with a 10-nanosecond access time must have a data-bus width of nearly 100 bits to meet such a bandwidth requirement, since 8.8 Gbps multiplied by 10 nanoseconds is 88 bits per access. Including ground and control pins, a switch-controller integrated circuit (IC) or chip would require 150 pins just for the memory interface. The 26 ports would require another 150 pins. The switch-controller chip could easily surpass 400 pins, making it very expensive. Using multiple chips further increases cost, power dissipation, and complexity of the switch system.
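The bus-width estimate can be sketched as follows. The sketch assumes one memory access completes every access time with no refresh, turnaround, or arbitration overhead, so it gives a lower bound only; the function name is hypothetical.

```python
import math


def min_bus_width_bits(bandwidth_mbps, access_time_ns):
    """Smallest data-bus width, in bits, that sustains the given bandwidth.

    Assumes back-to-back accesses with no overhead. Mbps times ns is
    1e6 bits/s times 1e-9 s, or 1e-3 bits, hence the division by 1000.
    """
    return math.ceil(bandwidth_mbps * access_time_ns / 1000)


# 8.8 Gbps requirement from the 26-port example, 10 ns access time.
print(min_bus_width_bits(8800, 10))  # 88
```

Any real memory interface would need additional width or speed to cover the overheads this sketch ignores.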
As network speeds increase to 100 Mbps and especially 1 Gbps and beyond, store-and-forward network switches face severe technical limitations. Expensive static random-access memory (SRAM) has fast access, but larger packets require larger memory sizes. Slower dynamic-random-access memory (DRAM) may be used to store larger packets, but it may not offer fast access times and sufficient bandwidth. Increasing bandwidth requirements for faster network speeds and additional switch ports increase the pin count and cost of switch-controller chips.
Embedded-DRAM Graphics Display Systems
The assignee has recognized the problem of bottlenecks to external dynamic-random-access memory (DRAM) in graphics display systems, and has pioneered embedded DRAM for graphics controllers. See for example: Puar et al., "Graphics Controller Integrated Circuit Without Memory Interface", U.S. Pat. Nos. 5,650,955 and 5,703,806. These embedded-DRAM graphics controllers have been used predominantly for portable PCs such as laptop and notebook PCs.
Although graphics controllers are in a different technical field than network switches, the inventor has realized that such embedded DRAM technology could solve performance and cost problems for network switches. While many view embedded DRAM technology as useful only for portable systems, the inventor realizes that computer-network switches and routers could benefit from the performance and cost improvement of embedded DRAM.
What is desired is a network switch for higher network speeds. It is desired to add ports without significantly increasing the cost of the switch. A network switch with a large number of high-speed ports is desired. It is desired to avoid increasingly larger numbers of pins on a network-switch chip as higher network speeds are used and higher memory bandwidth is required. A high memory-bandwidth network switch is desired. A store-and-forward network switch with sufficient memory bandwidth and high port count for Gigabit networks is desired.
A network switch chip has a plurality of input ports connected to external network nodes to receive packets. A plurality of output ports is connected to external network nodes. The output ports transmit the packets to the external network nodes.
An embedded packet memory temporarily stores packets received from the input ports for transfer to the output ports for transmission. The embedded packet memory is a dynamic-random-access memory (DRAM) organized into rows and columns.
An internal data bus is coupled to the embedded packet memory. It writes packets from the input ports to the embedded packet memory and reads the packets from the embedded packet memory to the output ports. The internal data bus is an internal bus without connection to external pins of the network switch chip during normal operating modes when packets are transmitted from input ports to output ports.
Port controllers are coupled to the input ports and the output ports. They detect packets received at an input port and write the packet into the embedded packet memory over the internal data bus. They also read the packet from the embedded packet memory to an output port.
A message bus is coupled to the port controllers. It sends a message from a port controller for an input port that receives a packet to a port controller for an output port that is a destination indicated by the packet. The message causes the port controller for the output port to read the packet from the embedded packet memory. Thus packets are switched from an input port to an output port by being written to and read from the embedded packet memory using the internal data bus in response to the message sent over the message bus.
In further aspects of the invention the message bus is further for sending an acknowledgement message from the port controller for the output port to the port controller for the input port when a packet has been read from the embedded packet memory. The port controller for the input port releases a memory space occupied by the packet in response to the acknowledgement message. Thus memory space in the embedded packet memory is released when the acknowledgement message is sent over the message bus.
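The message and acknowledgement exchange can be illustrated with a toy software model. This is only a sketch under simplifying assumptions: each memory row holds one whole packet, the message bus is a simple first-in first-out queue, and the port controllers are folded into a single `Switch` class. All class, method, and message names are hypothetical.

```python
from collections import deque


class EmbeddedPacketMemory:
    """Toy packet memory in which each row holds one whole packet."""

    def __init__(self, rows):
        self.rows = [None] * rows
        self.free = deque(range(rows))

    def write(self, payload):
        row = self.free.popleft()  # raises IndexError when memory is full
        self.rows[row] = payload
        return row

    def read(self, row):
        return self.rows[row]

    def release(self, row):
        self.rows[row] = None
        self.free.append(row)


class Switch:
    """Store-and-forward switch with a message bus between port controllers."""

    def __init__(self, num_ports, rows=8):
        self.memory = EmbeddedPacketMemory(rows)
        self.message_bus = deque()
        self.transmitted = {port: [] for port in range(num_ports)}

    def receive(self, in_port, dest_port, payload):
        # Input-port controller: store the packet, then notify the output port.
        row = self.memory.write(payload)
        self.message_bus.append(("PACKET", in_port, dest_port, row))

    def run(self):
        while self.message_bus:
            kind, sender, receiver, row = self.message_bus.popleft()
            if kind == "PACKET":
                # Output-port controller: read and transmit, then acknowledge.
                self.transmitted[receiver].append(self.memory.read(row))
                self.message_bus.append(("ACK", receiver, sender, row))
            elif kind == "ACK":
                # Input-port controller: release the packet's memory space.
                self.memory.release(row)


switch = Switch(num_ports=4, rows=2)
switch.receive(in_port=0, dest_port=2, payload=b"packet A")
switch.receive(in_port=1, dest_port=3, payload=b"packet B")
switch.run()
print(switch.transmitted[2], switch.transmitted[3], len(switch.memory.free))
```

In this model the memory space is returned to the free list only after the acknowledgement message is processed, mirroring the release step described above.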
In further aspects the port controller for the input port writes an entire packet to a row of the embedded packet memory without interruption by other port controllers. Page-mode accesses to a same row in the embedded packet memory are faster than page-miss accesses to a different row in the embedded packet memory. Thus the packet is written to the row in the embedded packet memory using mostly faster page-mode accesses.
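The benefit of uninterrupted row writes can be sketched with a simple cost model. The cycle counts used here (1 cycle for a page-mode hit, 4 cycles for a page miss) are assumed values chosen only for illustration and are not taken from any particular DRAM.

```python
def access_cycles(row_sequence, hit_cycles=1, miss_cycles=4):
    """Total cycles for a sequence of accesses, given the row of each access.

    An access to the currently open row is a page-mode hit; any other
    access is a page miss that opens the new row. Timings are assumptions.
    """
    open_row = None
    total = 0
    for row in row_sequence:
        total += hit_cycles if row == open_row else miss_cycles
        open_row = row
    return total


# Two packets of 8 accesses each, written to rows 0 and 1.
uninterrupted = [0] * 8 + [1] * 8   # each packet written without interruption
interleaved = [0, 1] * 8            # two port controllers alternating

print(access_cycles(uninterrupted), access_cycles(interleaved))
```

With these assumed timings, the uninterrupted pattern incurs only one page miss per packet, while the interleaved pattern misses on every access.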
In further aspects the internal data bus is a wide bus having at least 256 data bits. The internal data bus transfers at least 32 bytes of data for each memory-access cycle. Thus the internal data bus is a wide interface between the input ports and the embedded packet memory.
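The effect of the wide bus can be estimated with a short sketch that counts the memory-access cycles needed to move one packet over a 256-bit bus. The packet sizes used are the standard minimum (64-byte) and maximum (1518-byte) Ethernet frame sizes, which are assumptions brought in for illustration and are not stated above.

```python
BUS_WIDTH_BITS = 256
BYTES_PER_CYCLE = BUS_WIDTH_BITS // 8  # 32 bytes per memory-access cycle


def bus_cycles(packet_bytes):
    """Memory-access cycles to transfer a packet, rounding up to whole cycles."""
    return -(-packet_bytes // BYTES_PER_CYCLE)  # ceiling division


for size in (64, 1518):
    print(f"{size}-byte packet -> {bus_cycles(size)} cycles")
```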