The present invention relates generally to multiprocessor systems and methods. More particularly, the invention relates to a shared memory cache coherent multiprocessor system utilizing a point-to-point interconnect architecture.
The aim of parallel processing is to utilize a number of processing elements that can communicate and cooperate to solve a problem. In a highly parallel processing system, a problem is spread over hundreds of processing elements. Not all of the processing elements need be used to run a single problem; the system can be configured to execute multiple problems simultaneously. By contrast, in a low parallel processing system, tens of processing elements are used to solve an entire problem.
Symmetric multiprocessing (SMP) is one such type of low parallel processing system. An SMP system is characterized by "symmetric" processors that each have an equal share of, and equal access to, the system resources, including memory and I/O. The processors are managed by a single operating system that provides an application program with a single view of the entire system.
FIG. 1 illustrates one such shared memory SMP 100. There is shown a number of symmetric processors 102A-102N interconnected by a bus 104. A main memory 106 is connected to the bus 104 and shared by each of the processors 102. In addition, I/O devices 108 are connected to the bus 104 and are accessible by each processor 102 and the main memory 106. Each of the components of the system 100 is synchronized to a common system clock 110.
In order to reduce the traffic to the main memory 106, each processor 102 has a local cache memory 112 that can contain shared data. Since the data in each processor's cache 112 can be shared by every processor 102, the problem then becomes one of cache coherency. In most SMP systems, a snoopy bus protocol is used to maintain cache coherency. In a snoopy bus protocol, a memory access transaction, such as a read or write, is broadcast to all the processors 102 connected to the bus 104. Each processor 102 monitors, or "snoops," the bus 104 for a memory access transaction that pertains to a cache line associated with the processor's cache 112. When the processor 102 finds such a transaction, it takes appropriate action to ensure that each cache line is coherent within the system 100.
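The text says only that a snooping processor "takes appropriate action." A common choice of action is a write-invalidate protocol with MSI (Modified/Shared/Invalid) line states; the minimal sketch below assumes that protocol, one data word per cache line, and hypothetical names (`Cache`, `SnoopyBus`). Every miss or write upgrade is broadcast, and each other cache snoops it, writing back dirty data and downgrading or invalidating its copy.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Cache:
    """One processor's cache (hypothetical model): address -> (state, value)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def snoop(self, op, addr):
        """React to another processor's broadcast transaction.

        Returns the dirty value if this cache held the line MODIFIED, so the
        bus can write it back to main memory; otherwise returns None.
        """
        st, val = self.lines.get(addr, (State.INVALID, None))
        if st is State.INVALID:
            return None
        if op == "read":
            self.lines[addr] = (State.SHARED, val)  # downgrade, keep a copy
        elif op == "write":
            del self.lines[addr]  # invalidate our copy
        return val if st is State.MODIFIED else None


class SnoopyBus:
    """Broadcasts each miss or write upgrade to every other cache on the bus."""

    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory

    def broadcast(self, requester, op, addr):
        for cache in self.caches:
            if cache is not requester:
                dirty = cache.snoop(op, addr)
                if dirty is not None:
                    self.memory[addr] = dirty  # write dirty data back to memory

    def read(self, cache, addr):
        if cache.state(addr) is State.INVALID:
            self.broadcast(cache, "read", addr)
            cache.lines[addr] = (State.SHARED, self.memory.get(addr, 0))
        return cache.lines[addr][1]

    def write(self, cache, addr, value):
        if cache.state(addr) is not State.MODIFIED:
            self.broadcast(cache, "write", addr)  # invalidate other copies
        cache.lines[addr] = (State.MODIFIED, value)
```

For example, after `P0` writes address `0x40`, a read by `P1` snoops `P0`'s modified copy, writes it back to memory, and leaves both caches in the shared state; a later write by `P1` invalidates `P0`'s copy. Because every transaction must appear on the one bus for snooping to work, the bus is also the point of serialization.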
There are several disadvantages with this type of SMP system. The primary disadvantage is the use of the bus as the interconnect structure. Although the use of the bus provides cache coherency, it is a limiting factor for improving the system's throughput. First, the bus constrains the number of transactions that can be processed simultaneously. The same bus is used to process both memory and I/O transactions initiated by each processor. As such, only one transaction can be processed at a time.
Second, contention among the processors for the bus when accessing main memory unnecessarily increases the overhead of servicing a memory access transaction. Various approaches have been tried to overcome this limitation, such as increasing the width of the bus, running the bus at a higher clock speed, and increasing the size of the caches. However, each of these approaches greatly increases the expense and complexity of the system.
Another limitation of the bus is the well-known set of transmission line effects associated with buses. These transmission line effects are attributable to the complicated electrical phenomena present in the connections made to each device coupled to the bus. They limit the speed at which the bus operates, thereby reducing the system's throughput.
Accordingly, there exists a need for an SMP system that overcomes these shortcomings.
The present invention pertains to a system and method for operating a shared-memory multiprocessing system with cache coherency. The system includes a number of devices, including several processors, a main memory that is accessible by multiple devices, and several external I/O devices. Each device is connected to a flow control unit (FCU). The FCU controls the exchange of data between each device in the system. The FCU includes a snoop path for processing a first set of data transactions and one or more data paths that process a second set of data transactions. The snoop path and each of the data paths can operate concurrently, giving the system the capability to process multiple data transactions simultaneously and thereby increasing the system's throughput.
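The throughput benefit of concurrent paths can be seen with simple arithmetic. The sketch below is a hypothetical timing model, not the FCU's actual behavior: it assumes every transaction has the same fixed latency, that transactions are independent, and that each one is pre-labeled "snoop" or "data" (the text does not say which transactions fall into the first and second sets). A bus serializes all transactions; the FCU model runs one snoop path alongside `num_data_paths` data paths, so the total time is set by whichever path finishes last.

```python
from collections import Counter
from math import ceil


def bus_cycles(transactions, latency):
    """A shared bus processes transactions one after another."""
    return len(transactions) * latency


def fcu_cycles(transactions, latency, num_data_paths):
    """Hypothetical FCU model: the snoop path and the data paths run concurrently.

    'snoop' transactions queue on the single snoop path; 'data' transactions
    are spread evenly across num_data_paths data paths.
    """
    counts = Counter(transactions)
    snoop_time = counts["snoop"] * latency
    data_time = ceil(counts["data"] / num_data_paths) * latency
    return max(snoop_time, data_time)


txns = ["snoop"] * 4 + ["data"] * 6
print(bus_cycles(txns, latency=4))                    # 40
print(fcu_cycles(txns, latency=4, num_data_paths=2))  # 16
```

With ten transactions of latency 4, the bus needs 40 cycles, while one snoop path plus two data paths finish in 16, limited here by the snoop path.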
Each device is connected to the FCU by means of a dedicated channel or point-to-point connection. The FCU has a dedicated interface unit for each device. A channel is used by one device and its associated interface unit in the FCU. Since the channel is not a bus, it does not experience the well-known transmission line effects associated with buses and, as such, can operate at a high transfer rate. The improved speed of the channel increases the system's throughput.
In addition, the use of the channel does not require an arbitration phase or arbitration logic as is required in bus-based interconnect structures. The elimination of the arbitration logic reduces the complexity of the circuitry and the elimination of the arbitration phase increases the system throughput.
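The effect of removing the arbitration phase can be illustrated with a toy timing model. The sketch assumes hypothetical one-cycle arbitration and one-cycle transfer times and a fixed grant order on the bus; none of these figures come from the text. On a bus, each requester must win arbitration before it transfers, so requests complete one after another; with a dedicated channel per device, every device transfers to its own FCU interface unit at once.

```python
def bus_issue_schedule(requesters, arbitration_cycles=1, transfer_cycles=1):
    """Cycle at which each requester's transfer completes on a shared bus.

    Hypothetical timing: the arbiter grants requesters in list order, and each
    grant costs an arbitration phase followed by a transfer.
    """
    done = {}
    t = 0
    for dev in requesters:
        t += arbitration_cycles + transfer_cycles
        done[dev] = t
    return done


def channel_issue_schedule(requesters, transfer_cycles=1):
    """Each device owns a point-to-point channel to its own FCU interface unit,
    so there is no arbitration and all transfers proceed in parallel."""
    return {dev: transfer_cycles for dev in requesters}


cpus = ["CPU0", "CPU1", "CPU2"]
print(bus_issue_schedule(cpus))      # {'CPU0': 2, 'CPU1': 4, 'CPU2': 6}
print(channel_issue_schedule(cpus))  # {'CPU0': 1, 'CPU1': 1, 'CPU2': 1}
```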
Each of the processors is associated with its own system clock and runs independently of the other processors and the FCU. As such, the FCU can receive requests from each of the processors with a high degree of tolerance to clock skew between the different devices.
In a preferred embodiment, the technology of the present invention can be utilized in an SMP environment. There can be n symmetric processors, n CPU interface units (CIU), l memory control units (MCU), and k bus bridge units (BBU) connected to the FCU. Each processor can have an L2 cache containing data that is shared amongst the processors. Each CIU is coupled to a processor bus and receives memory and I/O requests initiated by the processor to access data that is external to the processor. The CIU translates the processor bus cycles into channel cycles and vice versa.
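The text does not define a channel packet format, so the sketch below invents a minimal one to show the CIU's two-way translation and the one-interface-unit-per-device topology. The classes `BusCycle`, `ChannelPacket`, and `CIU`, the helper `fcu_ports`, and the command encoding are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical command encoding for channel packets; the source defines none.
COMMANDS = {"read": 0x1, "write": 0x2}
OPS = {code: op for op, code in COMMANDS.items()}


@dataclass(frozen=True)
class BusCycle:
    """A processor bus cycle as seen by the CIU (hypothetical fields)."""
    op: str                     # "read" or "write"
    address: int
    data: Optional[int] = None  # present only for writes


@dataclass(frozen=True)
class ChannelPacket:
    """A cycle on the point-to-point channel to the FCU (hypothetical format)."""
    source: str                 # name of the sending interface, e.g. "CIU0"
    command: int
    address: int
    payload: Optional[int] = None


class CIU:
    def __init__(self, name: str):
        self.name = name

    def to_channel(self, cycle: BusCycle) -> ChannelPacket:
        # Processor bus cycle -> channel cycle, tagged with this CIU's identity.
        return ChannelPacket(self.name, COMMANDS[cycle.op], cycle.address, cycle.data)

    def to_bus(self, packet: ChannelPacket) -> BusCycle:
        # Channel cycle from the FCU -> processor bus cycle.
        return BusCycle(OPS[packet.command], packet.address, packet.payload)


def fcu_ports(n: int, l: int, k: int) -> list:
    """One dedicated FCU interface unit per attached CIU, MCU, and BBU."""
    return ([f"CIU{i}" for i in range(n)]
            + [f"MCU{i}" for i in range(l)]
            + [f"BBU{i}" for i in range(k)])
```

A write issued by processor 0 becomes `ChannelPacket("CIU0", 0x2, addr, data)` on its channel, and translating it back yields the original bus cycle; `fcu_ports(4, 2, 1)` lists seven interface units.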
Each MCU is connected to one or more memory devices and serves to control access between the FCU and the portion of main memory that is under its control. Each BBU serves to provide a communication path between one or more I/O buses interconnected to external I/O devices and the FCU. The BBU receives data requests from the FCU via the channel and from the I/O buses. The BBU converts the I/O bus cycles into channel cycles and vice versa.
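Because each MCU controls only a portion of main memory, the FCU must decide which MCU services a given address. The text does not say how memory is divided among the MCUs; the sketch below assumes a simple contiguous split, with each hypothetical MCU owning one address range, and uses Python's `bisect` module to find the owner.

```python
import bisect


class MemoryMap:
    """Maps a physical address to the index of the MCU that controls it.

    Hypothetical layout: MCU i owns a contiguous range of mcu_sizes[i] bytes,
    with ranges laid out back to back starting at address 0.
    """

    def __init__(self, mcu_sizes):
        self.bases = []
        base = 0
        for size in mcu_sizes:
            self.bases.append(base)
            base += size
        self.limit = base  # first address past the end of main memory

    def mcu_for(self, addr):
        if not 0 <= addr < self.limit:
            raise ValueError(f"address {addr:#x} is outside main memory")
        # Rightmost base <= addr identifies the owning MCU.
        return bisect.bisect_right(self.bases, addr) - 1


mem = MemoryMap([0x1000, 0x2000])
print(mem.mcu_for(0x0FFF))  # 0
print(mem.mcu_for(0x1000))  # 1
```

Interleaving addresses across MCUs at cache-line granularity is a common alternative that spreads traffic more evenly; the contiguous split is used here only because it is the simplest to read.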
The FCU processes memory and I/O transactions received from the devices. The memory transactions can be used to access data that resides in another processor's cache, to access data stored in main memory, to maintain cache coherency, and to access memory-mapped I/O. Main memory can include portions designated as cacheable memory, non-cacheable memory, or I/O-addressable memory. The I/O transactions can be used to transfer data between the processor associated with a CIU and an external I/O device.
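One way to picture the FCU's handling of these transaction types is as a classify-then-route step. In the sketch below, the address map, the `Region` names, and the routing policy are all assumptions made for illustration: cacheable memory accesses are assumed to need a snoop of other caches before reaching an MCU, non-cacheable accesses are assumed to go straight to an MCU, and both I/O transactions and accesses to the I/O-addressable region are assumed to be forwarded to a BBU.

```python
from enum import Enum


class Region(Enum):
    CACHEABLE = "cacheable"
    NON_CACHEABLE = "non-cacheable"
    IO = "io-addressable"


# Hypothetical address map as (start, end_exclusive, region); not from the source.
ADDRESS_MAP = [
    (0x0000_0000, 0x8000_0000, Region.CACHEABLE),
    (0x8000_0000, 0xC000_0000, Region.NON_CACHEABLE),
    (0xC000_0000, 0x1_0000_0000, Region.IO),
]


def classify(addr):
    for start, end, region in ADDRESS_MAP:
        if start <= addr < end:
            return region
    raise ValueError(f"unmapped address {addr:#x}")


def route(kind, addr):
    """Pick a destination for a transaction (hypothetical routing policy)."""
    if kind == "io":
        return "BBU"        # I/O transfer to an external device
    region = classify(addr)
    if region is Region.CACHEABLE:
        return "snoop+MCU"  # another processor's cache may hold the line
    if region is Region.NON_CACHEABLE:
        return "MCU"        # no cached copies to check
    return "BBU"            # assumed: memory-mapped I/O is forwarded to a bridge


print(route("mem", 0x0000_1000))  # snoop+MCU
print(route("mem", 0x9000_0000))  # MCU
print(route("io", 0x3F8))         # BBU
```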