In recent years, processor performance has improved significantly, but the speed of memory and other peripheral devices has not shown comparable progress. In a typical processor system, the central processor, memory and other devices share a common bus. The central processor usually serves as the bus master, with the memory and other devices acting as bus slaves. The speed gap between the bus master and the bus slaves is a serious problem that limits the performance of the system as a whole.
For a multiprocessor system, the problem becomes even more serious because the system has plural bus masters that must share a common bus. In the past few years, several approaches to this problem have been proposed and adopted in the industry. One popular approach uses faster cache memories or local memories for the individual bus masters. Another approach improves data transfer by providing plural common buses in a system. A processor based on the Harvard architecture is an example of this multiple-bus approach.
The Harvard architecture has two external buses: one used only for fetching instructions and the other used only for accessing data. Each bus has an independent physical address space, so instructions and data cannot be located in one physical address space. As a result, a Harvard-architecture processor is more difficult to program than an ordinary von Neumann processor.
Another example with plural buses is the layered bus system often seen in a personal computer or an engineering workstation. At present, this type of system is not built on a single semiconductor chip; nevertheless, it is a good example for comparison with a system according to the present invention. FIG. 1 illustrates a layered bus system used in a personal computer. As shown in the figure, there is a plurality of buses having different data transfer capabilities. The data transfer capability of a bus is usually measured as the product of the bus cycle speed and the bit width of the data bus.
A known example, described in a recent study, is a PC/AT-compatible machine with an Intel Pentium processor. The system has three buses, whose data transfer capabilities are generally specified as follows:
- Processor External Bus: 60 MHz or 66 MHz bus cycle, 64-bit data bus width.
- PCI (Peripheral Component Interconnect) Bus: 33 MHz bus cycle, 32-bit data bus width.
- ISA Bus: approximately 10 MHz bus cycle, 8/16-bit data bus width.
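The product rule for data transfer capability can be applied directly to the three buses listed above. The following is a minimal sketch that assumes one data transfer per bus cycle with no wait states or arbitration overhead, so the figures are peak upper bounds rather than measured throughput. The names `BUSES` and `peak_mbytes_per_s` are illustrative only.

```python
# Peak transfer capability = bus cycle rate x data bus width.
# Assumption: one data transfer per bus cycle, no wait states and no
# address/arbitration overhead, so these are upper bounds only.

BUSES = {
    # name: (bus cycle in MHz, data bus width in bits)
    "Processor External Bus (66 MHz)": (66, 64),
    "Processor External Bus (60 MHz)": (60, 64),
    "PCI Bus": (33, 32),
    "ISA Bus (16-bit)": (10, 16),
    "ISA Bus (8-bit)": (10, 8),
}


def peak_mbytes_per_s(mhz: float, width_bits: int) -> float:
    """Peak bandwidth in MB/s (10**6 bytes per second)."""
    return mhz * width_bits / 8


for name, (mhz, bits) in BUSES.items():
    print(f"{name:35s} {peak_mbytes_per_s(mhz, bits):7.1f} MB/s")
```

Under these assumptions the Processor External Bus peaks at 528 MB/s (66 MHz), the PCI Bus at 132 MB/s and the 16-bit ISA Bus at 20 MB/s, a spread of more than an order of magnitude between the fastest and slowest layers.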
In this system, each bus slave is connected to one of the buses according to its own access speed and data transfer rate. Semiconductor memories such as DRAM (Dynamic Random Access Memory) are connected to the Processor External Bus; peripheral devices requiring higher data transfer rates, such as video processors, are connected to the PCI Bus; and peripheral devices requiring lower data transfer rates, such as magnetic memories, are connected to the ISA Bus.
In such an architecture, the bus masters, including the Pentium processor, directly access only the Processor External Bus. As shown in FIG. 1, bus bridge units are required for the bus masters to access the PCI Bus and the ISA Bus. The bridge units comprise FIFO (first-in-first-out) memory buffers that allow a bus master to access data of a lower-speed device through the ISA Bus or the PCI Bus. If the system includes an FPU (Floating Point Unit) or another CPU as an additional bus master, it is connected to the Processor External Bus. Bus arbitration among these bus masters is performed only for the Processor External Bus.
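The role of the FIFO buffer in a bridge unit can be illustrated with a simple write-posting model: the master deposits writes at fast-bus speed, and the bridge retires them one per slow-bus cycle. This is a hypothetical sketch; the class `PostedWriteBridge`, its methods and the FIFO depth are assumptions for illustration, and it omits reads, ordering rules and the flow control a real PCI or ISA bridge needs.

```python
from collections import deque


class PostedWriteBridge:
    """Hypothetical bridge: buffers writes from a fast upstream bus in a
    bounded FIFO and drains them, oldest first, to a slower downstream bus."""

    def __init__(self, depth: int):
        self.fifo = deque()
        self.depth = depth
        self.downstream = []  # writes completed on the slow bus, in order

    def post_write(self, addr: int, data: int) -> bool:
        """Called by the fast-bus master. Returns False (the master must
        retry later) when the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((addr, data))
        return True

    def drain_one(self) -> None:
        """One slow-bus cycle: retire the oldest buffered write, if any."""
        if self.fifo:
            self.downstream.append(self.fifo.popleft())


bridge = PostedWriteBridge(depth=4)
accepted = [bridge.post_write(0x300 + i, i) for i in range(6)]
print(accepted)           # [True, True, True, True, False, False]
bridge.drain_one()
print(bridge.downstream)  # [(768, 0)]
```

Once the FIFO fills, the fast master is forced back to the pace of the slow bus, which is why the buffer depth, and hence the extra memory in the bridge, matters.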
This system has the disadvantage that the bandwidth of a bus in a higher-speed layer is wasted when a bus master accesses a bus in a lower-speed layer. Another disadvantage is that the bus bridge units require complex circuitry and extra memory such as FIFO buffers.
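The size of the wasted bandwidth can be estimated from the bus figures given earlier. A rough sketch, assuming the fast bus is held for the whole duration of the slow access and that each bus completes one transfer per clock (real ISA cycles take several clocks, so this understates the loss). The function `fast_bus_loss` is a hypothetical helper.

```python
def fast_bus_loss(fast_mhz, fast_width_bits, slow_mhz, slow_cycles=1):
    """Fast-bus capacity forgone while the fast bus is held for
    `slow_cycles` cycles of a slower bus.

    Returns (fast-bus cycles elapsed, bytes the fast bus could have moved).
    Assumes one transfer per cycle on both buses and no wait states.
    """
    fast_cycles = slow_cycles * fast_mhz / slow_mhz
    lost_bytes = fast_cycles * fast_width_bits / 8
    return fast_cycles, lost_bytes


# One 10 MHz ISA cycle seen from a 66 MHz, 64-bit Processor External Bus.
cycles, lost = fast_bus_loss(66, 64, 10)
print(f"{cycles:.1f} fast cycles, {lost:.1f} bytes")  # 6.6 fast cycles, 52.8 bytes
```

Even under these optimistic assumptions, a single ISA bus cycle ties up about 6.6 Processor External Bus cycles that could otherwise have moved roughly 53 bytes.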
From the foregoing discussion of the prior art, it can be seen that although a multiprocessor system employing conventional multiple buses can eliminate some of the bandwidth problems between bus masters and bus slaves, the disadvantages pointed out above make the conventional approach inappropriate for a single-chip multiprocessor system. There is a strong need for a more efficient bus architecture for single-chip multiprocessor systems.
In a multiprocessor system, the common bus plays an important role: it allows the bus masters to share all bus slaves. If a bus slave on the common bus is a memory device, the memory space can be freely distributed among the bus masters. However, the common bus also presents a bus access problem. If bus arbitration among the bus masters is not performed efficiently, overall system performance may be greatly degraded, and to avoid this degradation the system may have to use extra memories such as local cache memories. It is therefore desirable to build an efficient bus arbitration mechanism into a multiprocessor system. Some conventional bus arbitration systems of the prior art, together with their advantages and disadvantages, are described below.
Traditional bus arbitration methods include the Daisy Chain method, the Polling method, and the Concurrent method using a priority encoder/decoder. FIG. 2 shows a bus arbitration system implemented by the Daisy Chain method. As shown, a plurality of bus masters A, B, C, . . . are connected in a chain. In this architecture, a bus master has the right to access the bus only after it obtains a bus grant signal from the bus master on its upper layer. A bus master passes the bus grant signal on to the bus master on its lower layer when the former has no bus access request and the bus is available. In other words, a bus master on a lower layer cannot access the bus while one or more bus masters on its upper layers have a bus request or while the bus is being used.
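The grant propagation in the Daisy Chain method can be modeled as a scan from the head of the chain. This is a minimal behavioral sketch, not a circuit: the function `daisy_chain_grant` is hypothetical, index 0 stands for the master nearest the grant source (bus master A), and the number of masters the grant reaches stands in for the ripple delay.

```python
def daisy_chain_grant(requests, bus_busy=False):
    """Model one arbitration in a daisy chain.

    requests[0] is the master nearest the grant source (highest priority).
    Returns (winner index or None, number of masters the grant reached).
    """
    if bus_busy:
        return None, 0  # no grant enters the chain while the bus is in use
    for position, requesting in enumerate(requests):
        if requesting:
            # This master absorbs the grant; masters further down never see it.
            return position, position + 1
        # Not requesting: the grant is passed on to the next master.
    return None, len(requests)


# Master 0 requests continuously, so master 2 is starved.
winners = [daisy_chain_grant([True, False, True])[0] for _ in range(3)]
print(winners)  # [0, 0, 0]

# The grant must ripple past three idle masters before reaching master 3.
print(daisy_chain_grant([False, False, False, True]))  # (3, 4)
```

The two usage lines illustrate the drawbacks discussed next: the grant path lengthens with every added master, and a persistently requesting upstream master starves those below it.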
The Daisy Chain structure can be built from a simple circuit, and the priority level of each bus master can be preset according to the volume and importance of the data it processes. However, as the number of bus masters in the chain increases, the arbitration time grows significantly, because the grant must propagate through every upstream master, and system performance deteriorates. In addition, the arbitrator cannot guarantee bus cycles to lower-priority bus masters. Furthermore, the priority level of each bus master is fixed and cannot be changed.
FIG. 3 illustrates a bus arbitration system implemented by the Concurrent method using a priority encoder/decoder. As shown in the figure, all bus masters A, B, C, . . . , N send bus request signals to the priority encoder/decoder. After the bus is released, the priority encoder/decoder asserts the bus grant signal to the highest-priority requesting bus master according to a fixed priority.
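A fixed-priority encoder/decoder can be expressed as an operation on a request bit vector. In this sketch, which assumes bit i carries master i's request and bit 0 has the highest priority, the hypothetical function `priority_encoder_grant` isolates the lowest set bit in a single expression rather than by scanning along a chain, and `grant_index` plays the role of the decoder.

```python
def priority_encoder_grant(request_mask: int) -> int:
    """Fixed-priority concurrent arbitration.

    Bit i of request_mask is master i's bus request; bit 0 is assumed to be
    the highest priority. Returns a one-hot grant mask (0 if no requests).
    request_mask & -request_mask isolates the lowest set bit, i.e. the
    highest-priority requester.
    """
    return request_mask & -request_mask


def grant_index(grant_mask: int):
    """Decode a one-hot grant into a master number (None if no grant)."""
    return grant_mask.bit_length() - 1 if grant_mask else None


# Masters 1, 2 and 3 request (0b1110); master 1 wins.
g = priority_encoder_grant(0b1110)
print(bin(g), grant_index(g))  # 0b10 1
```

Because the priority ordering is built into the bit positions, it cannot change at run time, which matches the fixed-priority limitation noted in the following paragraph.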
This priority can be preset according to the volume and importance of the data processed by each bus master. In this arbitration architecture, the arbitration time is fixed regardless of the number of bus masters. However, the arbitrator still cannot guarantee bus cycles to lower-priority bus masters. Furthermore, the priority level of each bus master is fixed and cannot be changed.
An arbitrator implemented by the Polling method can guarantee bus cycles to lower-priority bus masters. The bus arbitrator cyclically checks the bus request signals of all bus masters, one bus master at a time. Once it detects an asserted bus request signal, the bus arbitrator asserts the bus grant signal to that bus master. Each bus master can access the bus when it receives the bus grant signal from the bus arbitrator.
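The cyclic detection performed by a polling arbitrator can be sketched as a pointer that visits one request line per poll step. The class `PollingArbiter` is hypothetical, and one detail is an assumption not stated above: polling resumes at the master after the one most recently examined, which is what makes service round-robin.

```python
class PollingArbiter:
    """Hypothetical polling arbitrator: a pointer visits each master's request
    line in turn, one per poll step, and grants the bus to the first asserted
    request it finds. Polling resumes after the last master examined, so every
    requesting master is eventually served."""

    def __init__(self, n_masters: int):
        self.n = n_masters
        self.pointer = 0  # next master to be polled

    def arbitrate(self, requests):
        """Return (granted master or None, number of poll steps used)."""
        for step in range(1, self.n + 1):
            candidate = self.pointer
            self.pointer = (self.pointer + 1) % self.n
            if requests[candidate]:
                return candidate, step
        return None, self.n


arb = PollingArbiter(4)
print([arb.arbitrate([True] * 4)[0] for _ in range(5)])  # [0, 1, 2, 3, 0]
```

The poll-step count returned by `arbitrate` can reach the number of masters, which illustrates why arbitration time grows as masters are added.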
This system guarantees bus cycles to all bus masters: every bus master obtains a bus grant within a certain period of time. However, it has a significant disadvantage similar to that of the Daisy Chain method: as the number of bus masters increases, the arbitration time grows significantly and the system becomes less efficient. Furthermore, all bus masters have the same priority level, no matter how urgently they need to access the bus.
In all of the arbitration systems described above, bus cycles are controlled by the bus master that has been granted the bus. When the bus is granted to a bus master with a slower bus cycle speed, that slower speed becomes the common bus cycle speed, which is another key reason why system performance deteriorates. It is therefore also important for a multiprocessor system to have an efficient bus arbitration mechanism that overcomes this drawback.
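The effect of a slow master dictating the common bus cycle can be quantified with a weighted average of cycle times. This worked sketch uses a hypothetical pair of masters, one running 66 MHz bus cycles and one running 33 MHz cycles, each performing half of the transfers. It assumes one transfer per bus cycle and zero arbitration overhead; `shared_bus_rate` is an illustrative helper.

```python
def shared_bus_rate(mix):
    """Effective transfer rate (millions of transfers per second) of a bus
    whose cycle speed is set by whichever master currently holds the grant.

    mix: list of (fraction of bus transfers, that master's bus cycle in MHz).
    Assumes one transfer per bus cycle and zero arbitration overhead.
    """
    # 1/MHz is microseconds per cycle, so this is the mean time per transfer.
    mean_us = sum(share / mhz for share, mhz in mix)
    return 1.0 / mean_us


print(round(shared_bus_rate([(1.0, 66)]), 1))             # 66.0
print(round(shared_bus_rate([(0.5, 66), (0.5, 33)]), 1))  # 44.0
```

In this example, letting the slower master own half of the bus cycles cuts the shared bus from 66 to 44 million transfers per second, a one-third loss even though the faster master is unchanged.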