This invention involves a scalable, modular approach to input/output management in a computer system. In particular, the approach integrates packet switched router architecture with high bandwidth bridges to low bandwidth peripheral buses. The former provides simultaneous point-to-point communications between multiple devices of the computer system, while the latter minimizes memory latencies for optimal system performance.
In the past, computers were primarily applied to processing rather mundane, repetitive numerical and/or textual tasks involving number-crunching, spreadsheets, and word processing. These simple tasks merely entailed entering data from a keyboard, processing the data according to some computer program, and then displaying the resulting text or numbers on a computer monitor and perhaps later storing these results in a magnetic disk drive. However, today's computer systems are much more advanced, versatile, and sophisticated. Especially since the advent of digital media applications and the Internet, computers are now commonly called upon to accept and process data in a wide variety of formats ranging from audio to video and even realistic computer-generated three-dimensional graphic images. A partial list of these digital media applications includes the generation of special effects for movies, computer animation, real-time simulations, video teleconferencing, Internet-related applications, computer games, telecommuting, virtual reality, high-speed databases, real-time interactive simulations, medical diagnostic imaging, etc.
Digital media applications have proliferated because much more information can be conveyed and readily comprehended with pictures and sounds than with text or numbers. Video, audio, and three-dimensional graphics render a computer system more user friendly, dynamic, and realistic. However, the added complexity of designing new generations of computer systems capable of processing these digital media applications is tremendous. Handling digitized audio, video, and graphics requires that vast amounts of data be processed at extremely fast speeds. An incredible amount of data must be processed every second in order to produce smooth, fluid, and realistic full-motion displays on a computer screen. Additional speed and processing power are needed to provide the computer system with high-fidelity stereo sound and real-time, interactive capabilities. Otherwise, if the computer system is too slow to handle the requisite amount of data, its rendered images would tend to be small, grainy, and otherwise blurry. Furthermore, movement in these images would likely be jerky and disjointed because the update rate is too slow. Sometimes, entire video frames might be dropped. Hence, speed is of the essence in designing modern, state-of-the-art computer systems.
One of the major bottlenecks in designing fast, high performance computer systems pertains to the current bus architecture. A "bus" is comprised of a set of wires that is used to electrically interconnect the various semiconductor chips and input/output devices of the computer system. Electric signals are conducted over the bus so that the various components can communicate with each other. Virtually all of today's computer systems use this same type of busing scheme. A single bus is used to electrically interconnect the central processing unit (CPU) with the memory (e.g., RAM) via a memory controller. Furthermore, other various devices are also coupled to the bus. The bus is comprised of a set of physical wires which are used to convey digital data, address information for specifying the destination of the data, control signals, and timing/clock signals. For instance, the CPU may generate a request to retrieve certain data stored in the memory. This read request is then sent over the bus to the memory controller. Upon receipt of this read request, the memory controller fetches the desired data from memory and sends it back over the bus to the CPU. Once the CPU is finished processing the data, it can be sent via the bus for output by a device (e.g., fax, modem, network controller, storage device, audio/video driver, etc.).
The major drawback to this prior art bus architecture is the fact that it is a "shared" arrangement. All of the components share the same bus. They all rely on a single bus to meet their individual communication needs. However, the bus can only establish communications between two of these devices at any given time. Hence, if the bus is currently busy transmitting signals between two of the devices (e.g., the CPU and another device), then all the other devices (e.g., memory) must wait their turn until that transaction is complete and the bus again becomes available. If a conflict arises, an arbitration circuit, usually residing in the memory controller, resolves which of the devices gets priority of access to the bus. Essentially, the bus is analogous to a telephone "party" line, whereby only one conversation can take place amongst a host of different handsets serviced by the party line. If the party line is currently busy, one must wait until the prior parties hang up before initiating one's own call.
In the past, this type of bus architecture offered a simple, efficient, and cost-effective method of transmitting data. For a time, it was also sufficient to handle the trickle of data flowing between the various devices residing within the computer system. However, as the demand for increased amounts of data skyrocketed, designers had to find ways to improve the speed at which bits of data can be conveyed (i.e., increased "bandwidth") over the bus. One temporary solution was to increase the width of the bus by adding more wires. The effect is analogous to replacing a two-lane road with a ten-lane super freeway. However, the increase in bus width consumes valuable space on an already densely packed and overcrowded printed circuit board. Furthermore, each of the semiconductor chips connected to the bus must have an equivalent number of pins to match the increased bus width for accepting and outputting its signals. These additional pins significantly increase the size of the chips. It becomes more difficult to fit these chips onto the printed circuit boards. Furthermore, the practical limitations of cost-effective chips and packages impose a physical restriction on a chip's overall size and its number of pins. Today's buses are typically limited to being 64 bits wide. In other words, 64 bits of data or address can be sent simultaneously in parallel over 64 separate wires. The next step of increasing the bus width to 128 bits has become impractical.
Another temporary solution to the bandwidth problem was to increase the rate (i.e., frequency) at which data is sent over the bus. However, the physics associated with implementing long sets of parallel wires with multiple loads produces a wide range of problems such as impedance mismatches, reflections, crosstalk, noise, non-linearities, attenuations, distortions, timing, etc. These problems become even more severe as the frequency increases. It has come to a point where the highest attainable frequency is approximately 33-50 MHz. Higher frequencies cannot be attained without fine tuning, extremely tight tolerances, exotic micro-strip layouts, and extensive testing. It is extremely difficult to reliably mass produce such high frequency computers.
Given a 64-bit bus running at 50 MHz, the highest attainable data rate for a typical computer system is 400 Mbytes per second. Although this data rate appears to be quite impressive, it is nevertheless fast becoming insufficient to meet the demands imposed by tomorrow's new applications. Thus, there is a great need for some type of bus scheme that provides increased throughput.
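The 400 Mbytes per second figure follows from multiplying the bus width in bytes by the clock rate. The short sketch below reproduces that arithmetic. It assumes one full-width transfer per clock cycle and no arbitration or protocol overhead, and the function name is illustrative only.

```python
def peak_bus_bandwidth_bytes_per_s(width_bits: int, clock_hz: int) -> int:
    """Peak rate of a parallel bus that moves one full-width word per clock.

    Assumes one transfer per cycle and ignores arbitration and protocol
    overhead, so the result is an upper bound rather than a sustained rate.
    """
    if width_bits % 8 != 0:
        raise ValueError("width_bits must be a whole number of bytes")
    return (width_bits // 8) * clock_hz


# 64-bit bus at 50 MHz: 8 bytes per cycle * 50,000,000 cycles per second
rate = peak_bus_bandwidth_bytes_per_s(64, 50_000_000)
print(rate // 1_000_000, "Mbytes per second")  # 400 Mbytes per second
```

At the lower end of the 33-50 MHz range mentioned above, the same calculation gives 264 Mbytes per second.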
A specific example of a bottleneck in attaining faster, greater bandwidth computer systems is the standard bus architecture found in most personal computers today, the Peripheral Component Interconnect (PCI) bus. This type of bus architecture offers a simple, efficient, and cost-effective method of transmitting data. For a time, it was also sufficient to handle the amount of data flowing between the various devices residing within the computer system. However, as the demand for increased amounts of data skyrockets, the PCI bus is rapidly becoming inadequate to handle the increase in data transmissions.
In light of the shortcomings inherent to the PCI bus architecture, designers have to find ways to improve the speed at which bits of data can be conveyed. For example, one such solution is to implement a switched router as described in International Patent Application Number PCT/US/14321 entitled "Packet Switched Router Architecture For Providing Multiple Simultaneous Communications," published Mar. 26, 1998 and assigned to the assignee of the present invention. Rather than having a shared bus arrangement, a central "switchboard" arrangement is used to select and establish temporary links between multiple devices. Packets of data are then sent over the links. By selecting and establishing multiple links, the central switchboard allows multiple packets to be simultaneously sent to various destinations. This results in significantly greater bandwidth. There exist many different, improved bus architectures to meet the high bandwidth requirements.
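The difference between a shared bus and a central switchboard can be illustrated with a toy model. The sketch below is not the router described in the referenced application. It assumes that every transfer occupies exactly one time slot, that a device can take part in only one link per slot, and that links are granted greedily in request order. The device names are hypothetical.

```python
from typing import List, Tuple

Transfer = Tuple[str, str]  # (source device, destination device)


def shared_bus_slots(transfers: List[Transfer]) -> int:
    """A shared bus carries one transfer at a time, so each needs its own slot."""
    return len(transfers)


def switched_slots(transfers: List[Transfer]) -> int:
    """Greedy slot count for a switch that links disjoint device pairs in parallel.

    In each time slot, pending transfers are scanned in order and granted
    if neither endpoint is already busy in that slot; the rest are deferred.
    """
    pending = list(transfers)
    slots = 0
    while pending:
        busy = set()
        deferred = []
        for src, dst in pending:
            if src in busy or dst in busy:
                deferred.append((src, dst))
            else:
                busy.update((src, dst))
        pending = deferred
        slots += 1
    return slots


transfers = [("cpu", "memory"), ("disk", "network"), ("video", "audio"), ("cpu", "disk")]
print(shared_bus_slots(transfers))  # 4
print(switched_slots(transfers))    # 2
```

In this example the first three transfers involve disjoint device pairs and proceed in the same slot; only the fourth, which reuses the CPU and the disk, has to wait.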
However, a common problem with any new bus architecture is that various peripheral devices designed specifically for connection to a PCI bus are now rendered incompatible. Existing PCI devices (e.g., modems, disk drives, network controllers, printers, etc.) are designed specifically for a PCI type bus scheme. As such, they are incompatible with and cannot be connected to any non-PCI based bus design. Of course, the computer industry could establish a new, faster bus standard. However, this is a lengthy, complicated, highly contentious, and extremely expensive process. The entire computer industry would have to make a wholesale switch over to the new bus standard. And until a new bus standard is adopted, computer manufacturers are hobbled by the outdated PCI bus architecture.
An alternative option is to implement a PCI bus in conjunction with a new, faster bus architecture (e.g., a packet switched router architecture). A bridge device is interposed between the two different bus schemes and acts as an interface. This approach works fine, except that an extra delay is incurred when data is routed through the bridge. In particular, the main memory and CPU are coupled to the new bus structure on one side of the bridge to take advantage of its higher bandwidth, whereas the PCI devices are coupled to the PCI bus on the other side of the bridge. Consequently, read/write operations involving PCI devices require that data be routed to/from a PCI device via the PCI bus, through the bridge, to the new bus, and to/from the main memory. These memory accesses through the bridge result in added memory latencies. The extra memory latencies associated with the bridge may exceed the tolerances of some PCI devices. Thus, there is a need for some mechanism to hide or minimize this memory latency so that high speed PCI devices may be serviced.
U.S. Pat. No. 5,915,104, assigned to the assignee of the present invention, provides a novel, effective solution for minimizing latencies in a way that allows standard PCI devices to operate and yet keeps up with higher data rates. It does so by implementing a combination of special write gathering/buffering, read prefetching/buffering, flushing, interrupt, and virtual device operations.
One embodiment of the invention is a bi-modal peripheral controller for a computer system, comprising a single ASIC combination of: at least two high-speed buses; at least one low-speed bus; a packet switched router providing a plurality of communication paths permitting simultaneous multiple communications between any of the high-speed or low-speed buses; a bridge coupled between the routing mechanism and each low-speed bus; a plurality of write buffers, coupled to each bridge, which couple a plurality of write transactions on the low-speed bus coupled to that bridge into a cache line sized transfer to the router; a plurality of read buffers, coupled to each bridge, in which each buffer stores fetched data according to a read request from a device connected to the low-speed bus coupled to that bridge so that the device can access the read buffers multiple times to retrieve the data; a prefetcher, coupled to each bridge, which reads sequential cache lines until the read buffers are full, a page boundary is encountered, or the read buffers are caused to be flushed, if the device connected to the low-speed device coupled to that bridge generates a read request and there is no corresponding data contained in the read buffers; and means for selecting between first and second modes. The ASIC permits transfer of data from a first of the high-speed or low-speed buses to a second of the high-speed or low-speed buses. The first mode of the ASIC permits the router to connect any high-speed bus to any other bus, and the second mode of the ASIC permits the router to connect high-speed buses only to low-speed buses.
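The prefetcher's stopping conditions can be sketched as a simple loop. The following is a minimal illustration, not the ASIC's logic: the 128-byte cache line, the 4096-byte page, the function name, and the `flush_requested` callback are all assumptions introduced for the example.

```python
from typing import Callable, List


def prefetch_addresses(
    start_addr: int,
    free_buffers: int,
    line_size: int = 128,    # assumed cache line size in bytes
    page_size: int = 4096,   # assumed page size in bytes
    flush_requested: Callable[[], bool] = lambda: False,  # hypothetical flush signal
) -> List[int]:
    """Return the cache line addresses fetched after a read miss.

    Sequential lines are fetched until the read buffers are full, the next
    line would cross a page boundary, or a flush is requested.
    """
    addr = start_addr - (start_addr % line_size)    # align down to a line boundary
    page_end = (addr // page_size + 1) * page_size  # first address of the next page
    fetched: List[int] = []
    while len(fetched) < free_buffers and addr < page_end and not flush_requested():
        fetched.append(addr)
        addr += line_size
    return fetched


# Four buffers are free, but only two lines remain before the page boundary.
print(prefetch_addresses(0x0F00, 4))  # [3840, 3968]
```

With these assumed sizes, a miss at address 0x0F00 fills only two of the four free buffers because the third line would start in the next page.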
In one preferred embodiment, the means for selecting between first and second modes comprises enabling or disabling a clock to the packet switched router, such that when the clock is disabled each bridge coupled to a low-speed bus is directly connected to a high-speed bus. In another preferred embodiment, at least one of the high-speed buses employs differential signaling and at least another of the high-speed buses does not.
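The two modes can be expressed as a small connection rule. The sketch below is an illustrative reading of the text, not the ASIC's implementation: it assumes that a disabled router clock corresponds to the second mode, it covers only links that originate from a high-speed bus, and all names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    FIRST = 1   # router may connect any high-speed bus to any other bus
    SECOND = 2  # router may connect high-speed buses only to low-speed buses


class BusKind(Enum):
    HIGH_SPEED = "high"
    LOW_SPEED = "low"


def mode_from_router_clock(clock_enabled: bool) -> Mode:
    """Assumed mapping: an enabled router clock selects the first mode."""
    return Mode.FIRST if clock_enabled else Mode.SECOND


def high_speed_link_allowed(mode: Mode, target: BusKind) -> bool:
    """Whether a high-speed bus may be linked to a bus of kind `target`."""
    if mode is Mode.FIRST:
        return True
    return target is BusKind.LOW_SPEED


mode = mode_from_router_clock(clock_enabled=False)
print(high_speed_link_allowed(mode, BusKind.HIGH_SPEED))  # False
print(high_speed_link_allowed(mode, BusKind.LOW_SPEED))   # True
```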
Another embodiment of the invention is a set of peripheral controllers, each of which comprises at least one bi-modal ASIC. The bi-modal ASIC comprises at least two high-speed buses; at least one low-speed bus; means for routing packet switched data from a first of the high-speed or low-speed buses to a second of the high-speed or low-speed buses; means for compensating for memory latency between the low-speed bus and either high-speed bus; and means for selecting between first and second modes. The first mode of the ASIC permits the routing means to connect any high-speed bus to any other bus, and the second mode of the ASIC permits the routing means to connect high-speed buses only to low-speed buses. In one preferred embodiment, at least one of the high-speed buses employs differential signaling and at least another of the high-speed buses does not.
Another embodiment of the invention is a peripheral controller for a computer system comprising a plurality of identical bi-modal ASICs, at least two high-speed buses, at least one low-speed bus, means for routing packet switched data from a first of the high-speed or low-speed buses to a second of the high-speed or low-speed buses, and means for compensating for memory latency between the low-speed bus and either high-speed bus. One ASIC is in a mode that permits the routing means to connect any high-speed bus to any other bus, while all other ASICs are in a mode that permits the routing means to connect high-speed buses only to low-speed buses. In one preferred embodiment, there are three identical bi-modal ASICs. In another preferred embodiment, at least one of the high-speed buses employs differential signaling and at least another of the high-speed buses does not.