A bus architecture of a computer system conveys much of the information and signals involved in the computer system's operation. In a typical computer system, one or more buses are used to connect a central processing unit (CPU) to a random access memory and to input/output elements so that data and control signals can be readily transmitted between these different components. When the CPU executes its programming, it is imperative that data and information flow as fast as possible in order to make the computer system as responsive as possible to the user.
In the past, computer systems were primarily applied to processing rather mundane, repetitive numerical and/or textual tasks such as number crunching, spreadsheet calculation, and word processing. These simple tasks merely entailed entering data from a keyboard, processing the data according to some computer program, and then displaying the resulting text or numbers on a computer monitor and perhaps later storing these results in a magnetic disk drive. With the advent of digital media applications, computer systems are now commonly required to accept and process large amounts of data in a wide variety of different formats. These formats range from audio and full motion video to highly realistic computer-generated three-dimensional graphic images.
A computer system's suitability for the above applications often depends upon the functionality of the computer system's peripheral devices. For example, the speed and responsiveness of the computer system's graphics adapter is a major factor in a computer system's usefulness as an entertainment device. Similarly, the speed with which video files can be retrieved from a hard disk drive and played by the graphics adapter determines the computer system's usefulness as a training aid. Hence, in addition to the speed of the computer system's CPU, the rate at which data can be transferred among the various peripheral devices often determines whether the computer system is suited for a particular purpose.
The electronics industry has, over time, developed several types of bus architectures. Recently, the PCI (peripheral component interconnect) bus architecture has become one of the most widely used, widely supported bus architectures in the industry. The PCI bus was developed to provide a high speed, low latency bus architecture from which a large variety of systems could be developed.
Prior Art FIG. 1 shows a typical PCI bus architecture 100. PCI bus architecture 100 is comprised of a CPU 102 and a main memory 104, coupled to a host PCI bridge containing arbiter 106 (hereafter arbiter 106) through a CPU local bus 108 and memory bus 110, respectively. A PCI bus 112 is coupled to each of PCI agents 114, 116, 118, 120, 122, 124 respectively, and is coupled to arbiter 106.
Referring still to Prior Art FIG. 1, each of PCI agents 114, 116, 118, 120, 122, 124 (hereafter, PCI agents 114-124) residing on PCI bus 112 uses PCI bus 112 to transmit and receive data. PCI bus 112 is comprised of functional signal lines, for example, interface control lines, address/data lines, error signal lines, and the like. Each of PCI agents 114-124 is coupled to the functional signal lines comprising PCI bus 112. When one of PCI agents 114-124 requires the use of PCI bus 112 to transmit data, it requests PCI bus ownership from arbiter 106. The PCI agent requesting ownership is referred to as an "initiator," or bus master. Upon being granted ownership of PCI bus 112 from arbiter 106, the initiator (e.g., PCI agent 116) carries out its respective data transfer.
Each of PCI agents 114-124 may independently request PCI bus ownership. Thus, at any given time, several of PCI agents 114-124 may be requesting PCI bus ownership simultaneously. Where there are simultaneous requests for PCI bus ownership, arbiter 106 arbitrates between requesting PCI agents to determine which requesting PCI agent is granted PCI bus ownership. When one of PCI agents 114-124 is granted PCI bus ownership, it initiates a transaction (e.g., data transfer) with a "target" or slave device (e.g., main memory 104). When the data transaction is complete, the PCI agent relinquishes ownership of the PCI bus, allowing arbiter 106 to reassign PCI bus 112 to another requesting PCI agent.
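The request, grant, and release sequence described above can be illustrated with a minimal sketch. The description does not state how arbiter 106 chooses among simultaneous requesters, so the round-robin policy used here is an assumption made purely for illustration, and the class and method names (RoundRobinArbiter, grant, release) are hypothetical.

```python
from collections import deque


class RoundRobinArbiter:
    """Toy model of a bus arbiter that grants ownership to one agent at a time.

    The round-robin policy is an illustrative assumption; the text does not
    say which arbitration algorithm arbiter 106 uses.
    """

    def __init__(self, agent_ids):
        self._order = deque(agent_ids)  # rotation order of the agents
        self.owner = None               # current bus owner, if any

    def grant(self, requesting):
        """Grant the bus to the next requesting agent in rotation.

        Returns the granted agent id, or None if the bus is already owned
        or no agent in the rotation is requesting.
        """
        if self.owner is not None:
            return None                 # only one transaction at a time
        for _ in range(len(self._order)):
            candidate = self._order[0]
            self._order.rotate(-1)      # move the candidate to the back
            if candidate in requesting:
                self.owner = candidate
                return candidate
        return None

    def release(self):
        """The initiator relinquishes the bus when its transaction completes."""
        self.owner = None


arbiter = RoundRobinArbiter([114, 116, 118, 120, 122, 124])
requests = {116, 122, 124}              # three agents request simultaneously

first = arbiter.grant(requests)         # agent 116 becomes the initiator
blocked = arbiter.grant(requests)       # None: the bus is still owned
arbiter.release()                       # transaction complete
second = arbiter.grant(requests - {first})
```

In this sketch the second grant is refused while agent 116 still owns the bus, which reflects the one-transaction-at-a-time behavior of the shared bus.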
Thus, in PCI bus architecture 100, as with most other types of bus architectures, only one data transaction can take place on PCI bus 112 at any given time. In order to maximize the efficiency and data transfer bandwidth of PCI bus 112, PCI agents 114-124 and arbiter 106 (which functions as an agent on behalf of CPU 102) follow a definite set of protocols and rules. These protocols are designed to standardize the method of accessing, utilizing, and relinquishing PCI bus 112, so as to maximize its data transfer bandwidth. The PCI bus protocols and specifications are set forth in an industry standard PCI specification (e.g., PCI Specification--Revision 2.1). Although the PCI specification provides for burst data transfer rates of up to 528 Mbytes per second (e.g., a 64-bit PCI bus 112 operating at 66 MHz), the major problem with PCI bus 112, as with most other types of bus architectures, is the fact that it is a "shared" bus.
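The 528 Mbytes per second figure follows from simple arithmetic: the bus width in bytes multiplied by the clock rate, assuming one full-width data transfer on every clock cycle. A short sketch of that calculation is given below. The function name is hypothetical, and the 32-bit, 33 MHz configuration is included only as a familiar comparison point that does not appear in the text above.

```python
def peak_bandwidth_mbytes_per_s(bus_width_bits, clock_mhz):
    """Theoretical peak rate, assuming one full-width transfer per clock."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz


# 64-bit bus at 66 MHz, as cited in the text: 8 bytes x 66 MHz
peak_64_66 = peak_bandwidth_mbytes_per_s(64, 66)   # 528

# 32-bit bus at 33 MHz (comparison point, not from the text): 4 bytes x 33 MHz
peak_32_33 = peak_bandwidth_mbytes_per_s(32, 33)   # 132
```

This is a burst-rate ceiling only; it ignores arbitration, address phases, and any waiting caused by sharing the bus.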
PCI bus 112 is a shared media bus architecture. All of PCI agents 114-124 share the same bus 112 and rely on it to meet their individual communication needs. However, as described above, PCI bus 112 can establish communications between only two of PCI agents 114-124 at any given time. Hence, if PCI bus 112 is currently busy transmitting signals between two of the devices (e.g., device 114 and device 124), then all the other devices (e.g., device 120, device 122, etc.) must wait their turn until that transaction is complete and PCI bus 112 again becomes available. As described above, arbiter 106 allocates ownership of PCI bus 112, resolving which of the PCI agents 114-124 uses PCI bus 112 at any given time.
In this manner, PCI bus 112 is analogous to a telephone "party" line, whereby only one conversation can take place amongst a host of different handsets serviced by the party line. If the party line is currently busy, one must wait until the prior parties hang up before one can initiate one's own call. Each of devices 114-124 needs to compete for bus bandwidth to perform input-output. Regardless of the speed of CPU 102, the limiting factor on the speed of a computer system built on PCI bus architecture 100 is very often the bandwidth of PCI bus 112.
In the past, this type of bus architecture offered a simple, efficient, and cost-effective method of transmitting data. For a time, it was also sufficient to handle the amount of data flowing between the various devices residing within the computer system. However, as the demand for low latency data transfer bandwidth has skyrocketed, designers have searched for ways to improve the PCI bus architecture (and other similar bus architectures) by increasing the speed at which data can be conveyed over the bus.
One solution to the bandwidth problem was to increase the width of a bus by adding more wires. The effect is analogous to replacing a two-lane road with a ten-lane super freeway. However, the increase in bus width consumes valuable space on an already densely packed and overcrowded printed circuit board. Furthermore, each of the semiconductor chips connected to the bus must have an equivalent number of pins to match the increased bus width for accepting and outputting its signals. These additional pins significantly increase the size of the chips, making it more difficult to fit these chips onto the printed circuit boards. Additionally, the practical limitations of cost-effective chips and packages impose a physical restriction on a chip's overall size and its number of pins. Typical high-end buses are limited to being 64 bits wide. In other words, 64 bits of data or address can be sent simultaneously in parallel over 64 separate wires. The next step of increasing the bus width to 128 bits has become impractical due to this added complexity.
Another solution to the bandwidth problem was to increase the rate (e.g., frequency) at which data is sent over the bus. However, the physics associated with implementing long sets of parallel wires with multiple loads produces a wide range of problems, such as impedance mismatches, reflections, cross-talk, noise, non-linearities, attenuation, distortions, timing, etc. These problems become more severe as the bus frequency increases. Higher bus frequencies cannot be attained without fine tuning, extremely tight tolerances, exotic micro-strip layouts, and extensive testing. It is extremely difficult to mass produce such high frequency computers economically and reliably.
Additionally, a new bus architecture requires a re-design of both the physical and logical interfaces of pre-existing peripheral devices for compatibility with the architecture. With well-known, widely-deployed bus architectures such as PCI, there are many hundreds, if not thousands, of "legacy" peripheral devices in the marketplace. A new bus specification essentially imposes a new interface standard on the manufacturers of these devices and forces the redesign of their products. Pre-existing products run the risk of being abandoned, or "orphaned." As such, it is exceedingly difficult for a new bus specification to gain acceptance and support.
Consequently, peripheral devices and applications used with existing bus architectures (e.g., PCI) are structured to function around the bandwidth limitations of the bus. The nature of the data that the applications transfer via the system bus is accordingly dictated by the bandwidth constraints of the bus. For example, given a 64-bit PCI bus running at 66 MHz, the highest attainable data rate for a typical computer system is 528 Mbytes per second. Although this data rate appears adequate, it is rapidly becoming insufficient in light of the demands imposed by tomorrow's new applications.
In addition, when a PCI bus in a computer system is very busy, the coupled peripheral devices have to wait longer to gain ownership of the PCI bus. To spread bus ownership more evenly, the arbiter of the PCI bus will prematurely terminate the various data transactions more often, causing an initiator to break up its data transaction into a series of smaller transactions. Each transaction carries a fixed amount of overhead regardless of the amount of data transferred. Consequently, even though the reason for forcing early termination is to ensure that all devices have fair access to the PCI bus, the numerous early terminations reduce the effective data transfer bandwidth of the PCI bus even further.
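The cost of early termination can be made concrete with a simple model in which every transaction pays a fixed number of overhead clock cycles before its data phases begin. The sketch below is an illustration under stated assumptions, not a measurement: the four-cycle overhead, the burst lengths, and the function name are all hypothetical, and one data phase is assumed to complete per clock cycle.

```python
def effective_bandwidth(peak_mbytes_per_s, total_data_phases,
                        burst_length, overhead_cycles):
    """Effective rate when a transfer is split into bursts of burst_length.

    Each burst (transaction) costs overhead_cycles in addition to one clock
    cycle per data phase. All parameter values used below are hypothetical.
    """
    full, rest = divmod(total_data_phases, burst_length)
    transactions = full + (1 if rest else 0)
    total_cycles = total_data_phases + transactions * overhead_cycles
    return peak_mbytes_per_s * total_data_phases / total_cycles


PEAK = 528      # Mbytes per second, the 64-bit / 66 MHz figure from the text
OVERHEAD = 4    # assumed fixed overhead cycles per transaction

# One uninterrupted burst of 64 data phases: 64 useful cycles out of 68.
unbroken = effective_bandwidth(PEAK, 64, 64, OVERHEAD)

# The same data forced into eight bursts of 8: 64 useful cycles out of 96.
split = effective_bandwidth(PEAK, 64, 8, OVERHEAD)   # 352.0
```

Under these assumed numbers, splitting one long burst into eight short ones lowers the effective rate from roughly 497 to 352 Mbytes per second, because the fixed overhead is paid eight times instead of once.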
Thus, what is required is a method and system which effectively provides for greatly-increased data transfer bandwidth between peripheral devices of a computer system. What is required is a method and system which accommodates high-bandwidth applications such as digital video, digital audio, 3D graphics, real-time compression and decompression, and the like. What is further desired is a method of servicing the bandwidth requirements of the above applications while retaining compatibility with existing bus standards. The required system should provide greatly-increased data transfer bandwidth to peripheral devices designed to an existing bus standard in a completely transparent manner. The method and system of the present invention provide a novel solution to the above requirements.