1. Field of the Invention
This invention relates generally to a shared main memory system for use within a large-scale multiprocessor system; and, more specifically, to a high performing, multi-port shared main memory system that includes an expandable number of memory sub-units, wherein all sub-units may be participating in memory operations substantially simultaneously, and wherein the main memory further includes an expandable number of dedicated point-to-point interconnections for connecting selected ones of the sub-units each to a different one of the memory ports for transferring data in parallel between the selected sub-units and the memory ports, thereby providing a memory system that is capable of supporting the bandpass requirements of a modern high-speed Symmetrical MultiProcessor (SMP) system, and is further capable of expanding as those requirements increase.
2. Description of the Prior Art
Many data processing systems couple multiple processors through a shared memory. The processors may then communicate through the shared memory, and may also be allowed to process tasks in parallel to increase system throughput.
Coupling multiple processors to a single memory system presents several challenges for system designers. The memory system must have an increased bandpass to service the requests from multiple processors in a timely manner. Moreover, since many medium and large-scale multiprocessor systems are modular, and allow for the addition of processors to accommodate an increase in user demands, it is desirable to provide a memory system that is also capable of expanding to provide an increased memory capacity, and/or to include the capability to receive requests from additional processors. Finally, because many multiprocessor systems include cache memories coupled to one or more of the processors within the system so that multiple copies of the same data may be resident within multiple memories in the system at once, a memory coherency protocol is necessary. A memory coherency protocol ensures that every processor always operates on the latest copy of the data. For example, memory coherency guarantees that a processor requesting a data item from main memory will receive the most updated copy of the data, even if the most recent copy only resides in another processor's local cache.
Often, a memory design satisfies one of these design considerations at the expense of the others. For example, one way to achieve an expandable system is to interconnect one or more processors and their associated caches via a bused structure to a shared main memory. Increased processing capability and expanded memory capacity may be achieved by adding processors, and memory units, respectively, to the bus. Such a bused architecture also makes implementation of a coherency scheme relatively simple. In a bused system, each processor on the bus can monitor, or “snoop”, the bus, to determine if any of the operations of the other processors on the bus are affecting the state of data held locally within their respective cache. However, bused systems of this type do not achieve parallelism. Only one processor may use the bus at a given time to access a given memory module, and thus memory will perform only one operation at once. Moreover, the arbitration required to determine bus usage imposes additional overhead. As a result, memory latency increases as more processors are added to the system. Thus, a single-bus architecture is not a good choice in systems having more than a few processors.
Memory latency may be somewhat reduced by using a multi-port main memory system which interfaces to the processors and their local caches via multiple buses. This allows the memory to receive multiple requests in parallel. Moreover, some multi-port memories are capable of processing ones of these multiple requests in parallel. This provides increased parallelism, but latency is still a problem if the system is expanded so that more than several processors are resident on the same bus. Additionally, this scheme complicates the coherency situation because processors may no longer snoop a single bus to ensure that they have the most recent data within their local caches. Instead, another coherency protocol must be utilized. To ensure memory coherency in a multi-bus system, caches may be required to send invalidation requests to all other caches following a modification to a cached data item. Invalidation requests alert the caches receiving these requests to the fact that the most recent copy of the data item resides in another local cache. Although this method maintains coherency, the overhead imposed by sending invalidation requests becomes prohibitive as the number of processors in the system increases.
Another approach to balancing the competing interests associated with providing an improved memory system for a parallel processing environment involves the use of a crossbar system. A crossbar system acts as a switching network which selectively interconnects each processor and its local cache to a main memory via a dedicated, point-to-point interface. This removes the problems associated with bus utilization, and provides a much higher memory bandpass. However, generally, crossbar systems may not be readily expanded. A single crossbar switching network has a predetermined number of switched cross points placed at intersections between the processors and memory modules. These switched cross points may accommodate a predetermined maximum number of processors and memory modules. Once each of the switched cross points is utilized, the system may not be further expanded. Moreover, such a distributed system poses an increased challenge for maintaining memory coherency. Although an invalidation approach similar to the one described above may be utilized, the routing of these requests over each of the point-to-point interfaces to each of the local caches associated with the processors increases system overhead.
Thus, what is needed is an expandable main memory system capable of supporting a parallel processing environment. The memory system must be capable of receiving, in parallel, and processing, in parallel, a multiple number of requests. The memory system must further be capable of maintaining coherency between all intercoupled cache memories in the system.
The primary object of the invention is to provide an improved shared memory system for a multiprocessor data processing system;
A further object of the invention is to provide a shared memory system having a predetermined address range that can be divided into address sub-ranges, wherein a read or a write operation may be performed to all of the address sub-ranges substantially simultaneously;
A still further object of the invention is to provide a memory system having multiple ports, and wherein requests for memory access may be received on each of the multiple ports in parallel;
Another object of the invention is to provide a shared memory system having multiple memory ports, and a predetermined address range divided into address sub-ranges, wherein a data transfer operation may be occurring in parallel between each different one of the memory ports and each different one of the address sub-ranges;
A further object of the invention is to provide a shared memory system having multiple memory sub-units each of which maps to an address sub-range, and each of which may be performing a memory operation in parallel with all other sub-units, and wherein queued memory requests are scheduled for processing based on the availability of the memory sub-units;
A further object of the invention is to provide a memory system having a predetermined address range that can be divided into address sub-ranges each mapped to a different memory sub-unit, and wherein additional memory sub-units may be added to the system as memory requirements increase;
A yet further object of the invention is to provide an expandable memory system having a selectable number of memory sub-units each for providing a portion of the storage capacity of the memory system and wherein each of the memory sub-units is expandable to include a selectable number of memory expansion units, wherein the storage capacity of each of the memory expansion units is selectable;
Another object of the invention is to provide a memory system having sub-units each mapped to a predetermined range of memory addresses, and wherein data may be read from, or written to, each of the sub-units simultaneously;
Yet another object of the invention is to provide a memory system having sub-units each mapped to a predetermined range of memory addresses, and wherein each of the sub-units has a selectable number of memory expansion units, and wherein each of the memory expansion units within each of the sub-units may be performing memory operations in parallel;
Another object of the invention is to provide a memory system having sub-units each for performing multiple memory operations substantially simultaneously, and wherein each of the sub-units has a common bus structure capable of supporting each of the simultaneously occurring operations by interleaving address and data signals;
Another object of the invention is to provide a main memory system capable of storing and maintaining directory state information for use in implementing a directory-based coherency protocol;
A yet further object of the invention is to provide a multi-port main memory system capable of routing data between a first unit coupled to a first one of the memory ports, and a second unit coupled to a second one of the memory ports;
Another object of the invention is to provide a multi-port main memory system capable of routing data between multiple first ones of the ports and multiple second ones of the ports in parallel; and
A still further object of the invention is to provide a memory system for use in performing multiple memory read and write operations in parallel, and wherein each memory read and write operation includes the transfer of a block of data signals.
The objectives of the present invention are achieved in a modular multi-port main memory system that is capable of performing multiple memory operations simultaneously. The main memory system includes an expandable number of memory sub-units wherein each of the sub-units is mapped to a portion of the total address space of the main memory system, and may be accessed simultaneously. Multiple point-to-point interconnections are provided within the main memory system to allow each one of the multiple memory ports to be interconnected simultaneously to a different one of the memory sub-units. The capacity of the main memory system may be incrementally expanded by adding sub-units, additional point-to-point interconnections, and additional memory ports. This allows memory bandpass to increase as the processing power of a system grows.
The basic building block of the modular main memory system is the Memory Storage Unit (MSU). The main memory system of the preferred embodiment may be expanded to include up to four MSUs. Each MSU includes multiple memory ports, and an expandable number of memory sub-units called Memory Clusters. The MSU of the preferred embodiment includes four memory ports, and up to four Memory Clusters. Each of the Memory Clusters includes an expandable number of memory sub-units called MSU Expansion Units, wherein each of the MSU Expansion Units is adaptable to receive a user-selectable amount of memory. Each of the Memory Clusters of the preferred embodiment includes between one and four MSU Expansion Units, and each MSU Expansion Unit may include between 128 and 512 Megabytes of storage. Thus the main memory system of the current invention includes a minimum of one MSU Expansion Unit having 128 Megabytes, and is incrementally expandable as dictated by user requirements to sixty-four MSU Expansion Units with a total capacity of 32 Gigabytes. This expansion capability provides a system that is highly flexible, and may be easily adapted to changing processing requirements.
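By way of illustration only, the capacity figures stated above may be checked with a short calculation. The following Python sketch is not part of the described hardware; the function name `capacity_mb` and the constant names are hypothetical, and only the limits (four MSUs, four Memory Clusters per MSU, four MSU Expansion Units per Memory Cluster, 128 to 512 Megabytes per MSU Expansion Unit) are taken from the text.

```python
# Limits of the preferred embodiment, as stated in the text.
MAX_MSUS = 4
MAX_CLUSTERS_PER_MSU = 4
MAX_UNITS_PER_CLUSTER = 4
MIN_UNIT_MB = 128
MAX_UNIT_MB = 512


def capacity_mb(msus, clusters_per_msu, units_per_cluster, unit_mb):
    """Total storage in Megabytes for a uniformly populated configuration.

    Hypothetical helper: it assumes every MSU, Memory Cluster, and
    MSU Expansion Unit is populated identically, which the text does
    not require.
    """
    if not 1 <= msus <= MAX_MSUS:
        raise ValueError("msus out of range")
    if not 1 <= clusters_per_msu <= MAX_CLUSTERS_PER_MSU:
        raise ValueError("clusters_per_msu out of range")
    if not 1 <= units_per_cluster <= MAX_UNITS_PER_CLUSTER:
        raise ValueError("units_per_cluster out of range")
    if not MIN_UNIT_MB <= unit_mb <= MAX_UNIT_MB:
        raise ValueError("unit_mb out of range")
    return msus * clusters_per_msu * units_per_cluster * unit_mb


# Number of MSU Expansion Units in a fully populated system.
max_units = MAX_MSUS * MAX_CLUSTERS_PER_MSU * MAX_UNITS_PER_CLUSTER

minimum_mb = capacity_mb(1, 1, 1, MIN_UNIT_MB)
maximum_mb = capacity_mb(MAX_MSUS, MAX_CLUSTERS_PER_MSU,
                         MAX_UNITS_PER_CLUSTER, MAX_UNIT_MB)
```

Under these assumptions, `max_units` is sixty-four and `maximum_mb` corresponds to 32 Gigabytes (taking one Gigabyte as 1024 Megabytes), in agreement with the figures given above.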
In operation, an MSU receives a read or a write request from a unit coupled to one of the four memory ports. The request is accepted by an MSU if an associated request address maps to the address range associated with one of the Memory Clusters included in that MSU. The request address and any associated data may be queued, and are eventually routed via a point-to-point switching network to the correct MSU Expansion Unit within the correct Memory Cluster. In the case of a memory write operation, the queued data is written to memory and the operation is considered completed. In the case of a memory read operation, data is returned from the MSU Expansion Unit, may be queued, and is eventually returned to the correct memory port via the point-to-point switching network.
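The acceptance and queuing of a request described above may be sketched as follows. This is a minimal illustrative model only: the text does not specify how addresses are mapped to Memory Clusters, so the contiguous base-and-size ranges, the class names `ClusterRange` and `MsuModel`, and the tuple layout of queued requests are all assumptions made for this sketch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterRange:
    """Address sub-range served by one Memory Cluster (assumed contiguous)."""
    cluster_id: int
    base: int   # first address in the sub-range
    size: int   # number of addresses in the sub-range

    def contains(self, address):
        return self.base <= address < self.base + self.size


@dataclass
class MsuModel:
    """Hypothetical model of one MSU: its cluster ranges and a request queue."""
    clusters: list
    queue: deque = field(default_factory=deque)

    def accept(self, port, address):
        """Queue the request if the address maps to one of this MSU's
        Memory Clusters; otherwise decline it."""
        for cluster in self.clusters:
            if cluster.contains(address):
                # Remember the originating port so that read data can
                # later be returned to the correct memory port.
                self.queue.append((port, address, cluster.cluster_id))
                return True
        return False


msu = MsuModel([ClusterRange(0, 0, 1024), ClusterRange(1, 1024, 1024)])
accepted = msu.accept(port=2, address=1500)   # falls in cluster 1
declined = msu.accept(port=0, address=4096)   # outside both ranges
```

In this sketch a declined request is simply ignored by the MSU; in a system having several MSUs, another MSU whose address range includes the request address would be expected to accept it.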
Each MSU is designed to perform multiple data transfer operations in parallel. Each MSU is capable of receiving data signals from, or providing data signals to, each of the four memory ports in parallel. While the MSU is performing the memory port transfer operations, unrelated data transfer operations may be in progress simultaneously to all of the Memory Clusters. Thus, a fully populated MSU may be performing up to eight unrelated data transfer operations simultaneously. Furthermore, within each MSU, each of the four MSU Expansion Units within each of the four Memory Clusters may be performing memory operations in parallel so that sixteen unrelated memory operations are occurring simultaneously. A fully populated main memory system including four MSUs has four times this capacity.
Besides providing a memory system capable of highly parallel operations, the bandpass is increased by providing interfaces capable of performing high-speed block transfer operations. Within the preferred embodiment, data is transferred in sixty-four byte blocks called cache lines. Each of the four memory ports, and each of the four MSU Expansion interfaces transfers data in parallel at the rate of 1.6 gigabytes/second. Therefore, within a single MSU, 12.8 gigabytes/second may be in transit at any given instant in time. A fully expanded main memory system containing four MSUs may transfer 51.2 gigabytes/second.
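The bandpass figures above follow from counting the interfaces that may be active at once. The sketch below reproduces that arithmetic; the constant and function names are hypothetical, and the count of eight concurrently active interfaces per MSU (four memory ports plus four MSU Expansion interfaces) is taken from the text.

```python
# Figures stated in the text for the preferred embodiment.
PORTS_PER_MSU = 4
EXPANSION_INTERFACES_PER_MSU = 4
GB_PER_SEC_PER_INTERFACE = 1.6
MAX_MSUS = 4


def msu_bandpass_gb_per_sec():
    """Aggregate rate when every port and every MSU Expansion
    interface of one MSU is transferring data at once."""
    interfaces = PORTS_PER_MSU + EXPANSION_INTERFACES_PER_MSU
    return interfaces * GB_PER_SEC_PER_INTERFACE


def system_bandpass_gb_per_sec(msus=MAX_MSUS):
    """Aggregate rate for a main memory system of `msus` MSUs."""
    return msus * msu_bandpass_gb_per_sec()
```

With the stated values, the first function yields 12.8 gigabytes/second for a single MSU and the second yields 51.2 gigabytes/second for four MSUs. These are peak aggregate figures that assume all interfaces are busy simultaneously.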
The main memory system solves the memory coherency problem by providing additional storage for supporting a directory-based coherency protocol. That is, a storage array within each of the MSU Expansion Units stores directory state information that indicates whether any cache line has been copied to, and/or updated within, a cache memory coupled to the main memory system. This directory state information, which is updated during any memory operation, is used to ensure memory operations are always performed on the most recent copy of the data. For example, when an MSU receives a request for a particular cache line, and the directory state information indicates an updated copy of the cache line resides within one of the cache memories, the MSU causes the updated cache line to be returned to the MSU. The updated data is then routed to the requesting processor via a high-speed point-to-point interconnect within the MSU, and is further stored within the correct MSU Expansion Unit. Such “Return” operations, as they are called, may be initiated to all ports within an MSU substantially simultaneously.
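The Return operation described above may be illustrated with a simplified software model. The sketch below is an assumption-laden simplification and not the directory format of the preferred embodiment: it tracks only a single owning port per cache line, models each cache as a dictionary, and uses the hypothetical names `DirectoryEntry`, `DirectoryMsu`, and `read_exclusive`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DirectoryEntry:
    # Port whose cache may hold an updated copy of the line, if any.
    owner: Optional[int] = None


class DirectoryMsu:
    """Toy model of an MSU with per-line directory state (hypothetical)."""

    def __init__(self):
        self.memory = {}     # cache line address -> data held in memory
        self.directory = {}  # cache line address -> DirectoryEntry
        self.caches = {}     # port -> {cache line address -> data}

    def read_exclusive(self, port, line):
        """Give `port` the most recent copy of `line` and record it as owner."""
        entry = self.directory.setdefault(line, DirectoryEntry())
        if entry.owner is not None and entry.owner != port:
            # Return operation: retrieve the updated line from the owning
            # cache and store it back into memory.
            self.memory[line] = self.caches[entry.owner].pop(line)
        data = self.memory.get(line, 0)
        # The directory state is updated as part of the memory operation.
        entry.owner = port
        self.caches.setdefault(port, {})[line] = data
        return data


msu = DirectoryMsu()
msu.memory[0x40] = 1
first = msu.read_exclusive(port=0, line=0x40)   # port 0 obtains the line
msu.caches[0][0x40] = 2                         # port 0 updates its cached copy
second = msu.read_exclusive(port=1, line=0x40)  # triggers a Return from port 0
```

In this model the second request receives the value written by the first port rather than the stale value in memory, and memory is brought up to date as a side effect, which mirrors the behavior described for the Return operation.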
The modular main memory system described herein therefore solves the problems associated with shared main memories of prior art multi-processor systems. The modular memory is extremely flexible, and may be incrementally expanded to accommodate a wide range of user requirements. The system may therefore be tailored to exact user specifications without adding the expense of unnecessary hardware. Additionally, the multi-port structure, independently operational MSU Expansion Units, and the multiple, expandable, point-to-point interconnections provide a highly parallel structure capable of meeting the bandpass requirements of a high-speed processing system. Finally, the directory-based coherency system, which is incorporated within each of the MSU Expansion Units, provides a coherency mechanism that is likewise flexible, and may expand as processing demands increase.
Still other objects and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description of the preferred embodiment and the drawings, wherein only the preferred embodiment of the invention is shown, simply by way of illustration of the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded to the extent of applicable law as illustrative in nature and not as restrictive.