1. Field of the Invention
The present invention is related to the field of symmetrical multiprocessing systems and, more particularly, to a symmetrical multiprocessing system including a hierarchical architecture.
2. Description of the Related Art
Multiprocessing computer systems include two or more processors which may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole. Generally speaking, a processor is a device configured to perform an operation upon one or more operands to produce a result. The operation is performed in response to an instruction executed by the processor.
A popular architecture in commercial multiprocessing computer systems is the symmetric multiprocessor (SMP) architecture. Typically, an SMP computer system comprises multiple processors connected through a cache hierarchy to a shared bus. Additionally connected to the bus is a memory, which is shared among the processors in the system. Access to any particular memory location within the memory occurs in a similar amount of time as access to any other particular memory location. Since each location in the memory may be accessed in a uniform manner, this structure is often referred to as a uniform memory architecture (UMA).
Processors are often configured with internal caches, and one or more caches are typically included in the cache hierarchy between the processors and the shared bus in an SMP computer system. Multiple copies of data residing at a particular main memory address may be stored in these caches. In order to maintain the shared memory model in which a particular address stores exactly one data value at any given time, shared bus computer systems employ cache coherency. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches which are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory. For shared bus systems, a snoop bus protocol is typically employed. Each coherent transaction performed upon the shared bus is examined (or "snooped") against data in the caches. If a copy of the affected data is found, the state of the cache line containing the data may be updated in response to the coherent transaction.
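The invalidation alternative described above can be illustrated with a minimal sketch. It assumes a write-through memory and a single valid/invalid state per cache line; the names Cache, SharedBus, snoop_write, read and write are hypothetical and chosen only for illustration. It is not a description of any particular snoop bus protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """A toy cache: maps a memory address to the cached data value."""
    name: str
    lines: dict = field(default_factory=dict)

    def snoop_write(self, address):
        # Another agent wrote this address: invalidate any stale copy.
        self.lines.pop(address, None)


class SharedBus:
    """A toy shared bus on which every coherent write is snooped."""

    def __init__(self, memory):
        self.memory = memory  # main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, cache, address):
        # On a miss, the data is transferred from main memory.
        if address not in cache.lines:
            cache.lines[address] = self.memory[address]
        return cache.lines[address]

    def write(self, writer, address, value):
        # Every other cache snoops the transaction and invalidates its copy.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(address)
        writer.lines[address] = value
        # Assumption: write-through, so main memory always holds the update.
        self.memory[address] = value
```

In this sketch, a cache that held the previous data misses on its next access to the address and therefore receives the updated value from main memory.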
Unfortunately, shared bus architectures suffer from several drawbacks which limit their usefulness in multiprocessing computer systems. A bus is capable of a peak bandwidth (e.g. a number of bytes/second which may be transferred across the bus). As additional processors are attached to the bus, the bandwidth required to supply the processors with data and instructions may exceed the peak bus bandwidth. Since some processors are forced to wait for available bus bandwidth, performance of the computer system suffers when the bandwidth requirements of the processors exceed the available bus bandwidth.
Additionally, adding more processors to a shared bus increases the capacitive loading on the bus and may even cause the physical length of the bus to be increased. The increased capacitive loading and extended bus length increase the delay in propagating a signal across the bus. Due to the increased propagation delay, transactions may take longer to perform. Therefore, the peak bandwidth of the bus may decrease as more processors are added.
These problems are further magnified by the continued increase in operating frequency and performance of processors. The increased performance enabled by the higher frequencies and more advanced processor microarchitectures results in higher bandwidth requirements than previous processor generations, even for the same number of processors. Therefore, buses which previously provided sufficient bandwidth for a multiprocessing computer system may be insufficient for a similar computer system employing the higher performance processors.
What is desired is a bus structure that supports the bandwidth requirements of a multiprocessor system with many high performance microprocessors and a relatively large physical distance separating the multiprocessors.
The problems outlined above are in large part solved by a hierarchical bus with a plurality of address partitions. Each physical memory location is mapped to multiple addresses. Therefore, each physical memory location can be accessed using a plurality of address aliases. The properties of each address partition are used by the hierarchical bus structure to determine which transactions are transmitted globally and which transactions are transmitted locally. In this manner, the hierarchical bus architecture eliminates global broadcasts of local transactions.
Broadly speaking, the present invention contemplates a multiprocessor architecture including a plurality of processing nodes, a plurality of low level buses, wherein each processing node is coupled to one of said plurality of low level buses, a plurality of repeaters, wherein each repeater is coupled to one of said low level buses, a top level bus and a system memory. The top level bus is connected to a plurality of repeaters and the repeaters control the transfer of data between the low level buses and the top level bus. The system memory includes a plurality of memory locations. Each of the processing nodes is configured to access all of the memory locations. The system memory locations map to a plurality of address partitions, whereby the system memory locations are addressed by a plurality of address aliases. Properties of the address partitions dictate the control of the transfer of data between the low level buses and the top level bus by the repeaters.
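By way of illustration only, the relationship between address aliases, address partitions, and the forwarding decision made by a repeater may be sketched as follows. The sketch assumes that the partition is selected by the address bits above bit 40 and that exactly two partitions exist, one global and one local. The bit position, the partition identifiers, and the function names are hypothetical and are not taken from the description above.

```python
# Assumption: address bits at and above PARTITION_SHIFT select the partition;
# the remaining low bits identify the memory location itself.
PARTITION_SHIFT = 40
LOCATION_MASK = (1 << PARTITION_SHIFT) - 1

# Hypothetical partition identifiers.
GLOBAL = 0
LOCAL = 1


def make_alias(partition, location):
    """Build one address alias for a memory location in a given partition."""
    return (partition << PARTITION_SHIFT) | (location & LOCATION_MASK)


def decode(address):
    """Split an address into (partition, memory location)."""
    return address >> PARTITION_SHIFT, address & LOCATION_MASK


def repeater_forwards_upward(address):
    """Decide whether a repeater passes a low level bus transaction
    up to the top level bus, based only on the address partition."""
    partition, _ = decode(address)
    return partition == GLOBAL
```

Under these assumptions, make_alias(GLOBAL, 0x1000) and make_alias(LOCAL, 0x1000) are two aliases of the same memory location, and only a transaction using the first is transmitted on the top level bus.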
The present invention further contemplates a shared memory system including a plurality of memory locations, wherein the memory locations are allocated to one of a plurality of processing nodes. The memory locations are configured to be accessed by the plurality of processing nodes. The system memory locations map to a plurality of address partitions, whereby the system memory locations are addressed by a plurality of address aliases, and properties of the address partitions dictate which of the processing nodes have access to a data request.
The present invention still further contemplates a method for location specific data transfers on a hierarchical bus. The method includes the steps of: assigning a virtual address range to a process running on a node of said hierarchical bus, performing a data request to an address within said virtual address range, translating said virtual address to a physical address, and determining if said physical address is within a portion of memory designated as global or local. If the physical address is within a portion of memory designated as global, the memory is accessed using a global address. If the physical address is within a portion of memory designated as local, the method determines whether the physical address is within local memory. If the physical address is within local memory, said physical address is accessed using a local address partition. If the physical address is not within local memory, the method traps to the operating system. The operating system may rectify the trap in numerous manners. For example, data may be moved from the physical address to the local memory. Alternatively, the physical address may be remapped as a global address.
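The decision steps of the method may be sketched as follows, under stated assumptions. The sketch assumes 4096-byte pages, a page table mapping virtual page numbers to physical frames, a per-frame designation of "global" or "local", and a set of frames residing in the memory of the requesting node. All names (access, LocalAccessTrap, remap_as_global, page_table, designation, local_frames) are hypothetical. The trap handler shows only one of the two remedies mentioned above, namely remapping the physical address as global.

```python
PAGE_SIZE = 4096  # assumption: 4096-byte pages


class LocalAccessTrap(Exception):
    """Raised when a local-designated address is not in local memory."""

    def __init__(self, frame):
        super().__init__(f"frame {frame} is not in local memory")
        self.frame = frame


def access(virtual_address, page_table, designation, local_frames):
    """Translate a virtual address and choose the address partition.

    Returns a (partition, physical_address) pair, or raises
    LocalAccessTrap so that the operating system can intervene.
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table[page]  # virtual-to-physical translation
    physical = frame * PAGE_SIZE + offset
    if designation[frame] == "global":
        return ("global", physical)
    if frame in local_frames:
        return ("local", physical)
    raise LocalAccessTrap(frame)


def remap_as_global(trap, designation):
    """One possible remedy: redesignate the faulting frame as global."""
    designation[trap.frame] = "global"
```

A caller would catch LocalAccessTrap, apply a remedy such as remap_as_global, and retry the access. Moving the data into local memory, the other remedy described above, is not modeled here.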