In the past, computers were primarily applied to rather mundane, repetitive numerical and textual tasks such as number crunching, spreadsheets, and word processing. These simple tasks merely entailed entering data from a keyboard, processing the data according to some computer program, displaying the resulting text or numbers on a computer monitor, and perhaps later storing the results on a magnetic disk drive. However, today's computer systems are much more advanced, versatile, and sophisticated. Especially since the advent of digital media applications and the Internet, computers are now commonly called upon to accept and process data in a wide variety of formats, ranging from audio to video and even realistic computer-generated three-dimensional graphic images. A partial list of these digital media applications includes the generation of special effects for movies, computer animation, real-time interactive simulations, video teleconferencing, Internet-related applications, computer games, telecommuting, virtual reality, high-speed databases, medical diagnostic imaging, etc.
Digital media applications have proliferated because much more information can be conveyed and readily comprehended with pictures and sounds than with text or numbers. Video, audio, and three-dimensional graphics render a computer system more user friendly, dynamic, and realistic. However, the added complexity of designing new generations of computer systems capable of processing these digital media applications is tremendous. Handling digitized audio, video, and graphics requires that vast amounts of data be processed at extremely fast speeds. An incredible amount of data must be processed every second in order to produce smooth, fluid, and realistic full-motion displays on a computer screen. Additional speed and processing power are needed in order to provide the computer system with high-fidelity stereo sound and real-time, interactive capabilities. If the computer system is too slow to handle the requisite amount of data, its rendered images tend to be small, grainy, and blurry. Furthermore, movement in these images is likely to be jerky and disjointed because the update rate is too slow. Sometimes, entire video frames might be dropped. Hence, speed is of the essence in designing modern, state-of-the-art computer systems.
One of the major bottlenecks in designing fast, high-performance computer systems pertains to the current bus architecture. A "bus" consists of a set of wires used to electrically interconnect the various semiconductor chips and input/output devices of the computer system. Electric signals are conducted over the bus so that the various components can communicate with each other.
FIG. 1 shows a typical prior art bus architecture 100. Virtually all of today's computer systems use this same type of busing scheme. A single bus 101 is used to electrically interconnect the central processing unit (CPU) 103 with the memory (e.g., RAM) 107 via memory controller 102. Various other devices 104-106 are also coupled to bus 101. Bus 101 consists of a set of physical wires which are used to convey digital data, address information for specifying the destination of the data, control signals, and timing/clock signals. For instance, CPU 103 may generate a request to retrieve certain data stored in memory 107. This read request is then sent over bus 101 to memory controller 102. Upon receipt of this read request, memory controller 102 fetches the desired data from memory 107 and sends it back over bus 101 to CPU 103. Once CPU 103 has finished processing the data, the result can be sent via bus 101 for output by one of the devices 104-106 (e.g., fax, modem, network controller, storage device, audio/video driver, and the like).
The major drawback to this prior art bus architecture is the fact that it is a "shared" arrangement. All of the components 102-106 share the same bus 101. They all rely on a single bus to meet their individual communication needs. However, bus 101 can only establish communications between two of these components 102-106 at any given time. Hence, if bus 101 is currently busy transmitting signals between two of the devices (e.g., device 105 and device 106), then all the other components (e.g., memory controller 102, device 104, and CPU 103) must wait their turn until that transaction is complete and bus 101 again becomes available. If a conflict arises, an arbitration circuit, usually residing in memory controller 102, resolves which of the components 102-106 gets priority of access to bus 101. Essentially, bus 101 is analogous to a telephone "party" line, whereby only one conversation can take place at a time amongst a host of different handsets serviced by the party line. If the party line is currently busy, a caller must wait until the prior parties hang up before initiating a new call.
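The waiting behavior of such a shared bus can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function name simulate_shared_bus, the request tuples, the time units, and the first-come, first-served arbitration policy are all hypothetical and are not taken from FIG. 1, which does not specify how the arbitration circuit assigns priority.

```python
def simulate_shared_bus(requests):
    """Serialize transfers over a single shared bus.

    requests: list of (arrival_time, requester, duration) tuples, in
    arbitrary (hypothetical) time units.
    Returns a list of (requester, start, finish, wait) in service order.

    Assumed arbitration policy: first come, first served. Requests that
    arrive at the same time are served in the order they were listed
    (sorted() is stable).
    """
    bus_free_at = 0
    schedule = []
    for arrival, requester, duration in sorted(requests, key=lambda r: r[0]):
        # A transfer cannot start until the bus is idle.
        start = max(arrival, bus_free_at)
        finish = start + duration
        schedule.append((requester, start, finish, start - arrival))
        bus_free_at = finish
    return schedule


# Hypothetical workload: one device-to-device transfer holds the bus
# while the CPU and another device are waiting to reach memory.
requests = [
    (0, "device 105 -> device 106", 4),
    (1, "CPU 103 -> memory controller 102", 2),
    (2, "device 104 -> memory controller 102", 3),
]

for requester, start, finish, wait in simulate_shared_bus(requests):
    print(f"{requester}: start={start} finish={finish} waited={wait}")
```

In this hypothetical workload the CPU request arrives at time 1 but cannot start until time 4, so every transfer queued behind a busy bus is delayed regardless of how fast the requesting component is.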
Thus, CPU 103 competes for bus 101 bandwidth to access program instructions and data stored in memory 107. Each of devices 104, 105, and 106 needs to compete for bus bandwidth to perform input/output. Regardless of the speed of CPU 103, the limiting factor on the speed of computer system 100 is very often the bandwidth of bus 101 and, more particularly, the contention for bandwidth to access memory 107. Historically, most bus traffic proceeded to and from memory 107 because most input/output was performed either to or from memory 107. This input/output competes with all other traffic (e.g., from CPU 103) for access to memory 107.
In the past, this type of bus architecture offered a simple, efficient, and cost-effective method of transmitting data. For a time, it was also sufficient to handle the trickle of data flowing between the various devices residing within the computer system. However, as the demand for data skyrocketed, designers had to find ways to improve the rate at which bits of data could be conveyed over the bus (i.e., to increase its "bandwidth").
One solution to the bandwidth problem was to increase the width of the bus by adding more wires. The effect is analogous to replacing a two-lane road with a ten-lane super freeway. However, the increase in bus width consumes valuable space on an already densely packed and overcrowded printed circuit board. Furthermore, each of the semiconductor chips connected to the bus must have a matching number of pins for accepting and outputting the signals of the wider bus. These additional pins significantly increase the size of the chips, making it more difficult to fit them onto the printed circuit boards. Additionally, the practical limitations of cost-effective chips and packages impose a physical restriction on a chip's overall size and its number of pins. Today's buses are typically limited to being 64 bits wide. In other words, 64 bits of data or address can be sent simultaneously in parallel over 64 separate wires. The next step of increasing the bus width to 128 bits has become impractical due to this added complexity.
Another solution to the bandwidth problem was to increase the rate (i.e., frequency) at which data is sent over the bus. However, the physics associated with implementing long sets of parallel wires with multiple loads produces a wide range of problems such as impedance mismatches, reflections, crosstalk, noise, non-linearities, attenuation, distortions, timing errors, etc. These problems become more severe as the bus frequency increases. Higher bus frequencies cannot be attained without fine tuning, extremely tight tolerances, exotic micro-strip layouts, and extensive testing. It is extremely difficult to reliably mass produce such high-frequency computers.
As such, applications written for these systems are structured to function around the bandwidth limitations of the system bus. The nature of the data the applications transfer via the system bus is accordingly dictated by the bandwidth constraints of the system bus. As a result, there are very few full-motion 3D simulation applications written for desktop systems. In the 3D applications that do exist, realism and richness are greatly reduced so that the applications run reliably and responsively without slowing the computer system to a crawl. Tomorrow's applications will be rich 3D simulations. They will include extensive video manipulation by the computer system's processor. Multiple video streams, digital synthesis, and digital audio are a few of the many applications envisioned. Given a 64-bit bus running at 66 MHz, the highest attainable data rate for a typical computer system is approximately 528 Mbytes per second (8 bytes per transfer at 66 million transfers per second). Although this data rate appears adequate, it is rapidly becoming insufficient in light of the demands imposed by tomorrow's new applications.
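The peak data rate of a parallel bus follows from its width and clock rate, and the arithmetic can be checked with a short sketch. The function name and the assumption of exactly one transfer per clock cycle, with no arbitration or protocol overhead, are illustrative only. Under that assumption, a 64-bit bus clocked at 66 MHz works out to 528 million bytes per second, in line with the figure of roughly half a gigabyte per second discussed above.

```python
def peak_bus_bandwidth(width_bits, clock_hz, transfers_per_clock=1):
    """Theoretical peak data rate of a parallel bus, in bytes per second.

    Assumes every clock cycle carries a full-width transfer, with no
    arbitration, addressing, or turnaround overhead (an idealization).
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_hz * transfers_per_clock


# 64-bit bus at 66 MHz, as in the text.
peak_64 = peak_bus_bandwidth(64, 66e6)
print(f"64-bit @ 66 MHz:  {peak_64 / 1e6:.0f} Mbytes/s")

# Doubling the width to 128 bits would double the peak rate.
peak_128 = peak_bus_bandwidth(128, 66e6)
print(f"128-bit @ 66 MHz: {peak_128 / 1e6:.0f} Mbytes/s")
```

Because the bus is shared, the sustained rate seen by any one component is lower than this peak whenever other components are also transferring data.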
Thus, what is required is a method and system which effectively provide for greatly increased system bus bandwidth. What is required is a method and system which accommodate the enormous bandwidth requirements of digital video, digital audio, 3D graphics, real-time compression and decompression, and the like. What is further desired is a method of servicing the bandwidth requirements of the above applications while conserving memory bandwidth. The required system should provide for a new programming paradigm wherein application designers are not limited by system bus bandwidth constraints. The required system should also allow one set of code to execute on both a new high-bandwidth computer system and on conventional computer systems. The method and system of the present invention provide a novel solution to the above requirements.