1. Field of the Invention
The present invention relates to ports for a shared memory node, and more particularly, to scalable ports for connecting two or more nodes together.
2. Description of the Related Art
Conventional scalable multiprocessors consist of multiple nodes that are connected together using an interconnect system. Each node consists of a processor, dynamic random access memory (DRAM), and an input/output (I/O) device. The processor, DRAM, and I/O device couple with a bus. A single chipset also couples with the bus and controls interactions between all of the components.
The single chipset also couples with a conventional interconnect port. A conventional interconnect port is an external interface that allows for physically connecting nodes together in an interconnect system. Connecting multiple nodes together may allow for creation of a shared memory system. An example of a shared memory system is a Cache Coherent, Non-Uniform Memory Access (CC-NUMA) architecture.
In an interconnect system, connecting one node to another node requires an interconnect port. Where there are only two nodes, the interconnect port is optimized for communication between these two nodes only. Two node systems with a dedicated interconnect port are more popular and more commonly in use than systems having three or more nodes. However, because the interconnect port is dedicated to such two node systems, it is not scalable beyond two nodes.
Interconnecting more than two nodes requires adding additional hardware between the interconnect port and each additional node. This additional hardware is used to scale the ports. The additional hardware also increases the overall cost of the system. Further, the additional hardware requires additional system space, making it less suitable for limited space environments and applications. The additional hardware also increases the number of signal delay points, which, in turn, decreases overall system performance. The additional hardware and the problems it introduces are another reason why such conventional interconnect ports are not desirable for systems with only two nodes.
In summary, a problem with conventional interconnect systems is that up to three different types of interconnect ports may be needed when adding or removing nodes from the interconnect system. If there is only one node, no interconnect port is needed. If there are two nodes, a non-scalable interconnect port is needed. If there are three or more nodes, a scalable interconnect port is needed. However, this scalable interconnect port is inefficient for two node systems.
Therefore, there is a need for an interconnect port that (1) is scalable, (2) has high performance in systems having three or more nodes, as well as two node systems, and (3) does not increase system costs when additional nodes are added to the system.
An interconnect system of the present invention includes nodes that are coupled together and communicate with each other. The interconnect system may include one node, two nodes, or more than two nodes. In the interconnect system with one node, because there is only that node, there is no interconnect. In a two node system, both nodes may be directly connected to each other to form the interconnect system. In a more than two node system, each node does not directly connect to each other node. Rather, each node includes a protocol engine and all of the protocol engines couple together to form the interconnect system. Generally, each node includes a node control unit, a memory system, an input/output ("I/O") system, and one or more processing units, e.g., central processing units. Each processing unit includes an associated processor cache in which data may be stored.
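The three configurations described above can be illustrated with a minimal sketch. The function name `build_links` and the placeholder string "interconnect" (standing in for the coupled protocol engines) are hypothetical and are not taken from the specification. How the protocol engines couple to one another is not modeled here.

```python
def build_links(nodes):
    """Return the list of links for a given list of node names.

    Hypothetical helper: one node has no interconnect, two nodes are
    directly connected, and more than two nodes each attach to the
    coupled protocol engines (represented by the placeholder string
    "interconnect") rather than to every other node.
    """
    if not nodes:
        raise ValueError("an interconnect system needs at least one node")
    if len(nodes) == 1:
        return []  # single node: no interconnect at all
    if len(nodes) == 2:
        return [(nodes[0], nodes[1])]  # direct node-to-node connection
    # Three or more nodes: no direct node-to-node links.
    return [(node, "interconnect") for node in nodes]
```

In this sketch, the number of links grows linearly with the number of nodes in the more than two node case, because no node is linked directly to every other node.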
In both the two node and more than two node case, the nodes couple together through an interconnect port. The interconnect port may be referred to as a scalability or expansion port. The interconnect port includes a physical layer, a signal protocol layer, a command protocol layer, and a coherence protocol layer. The physical layer receives or transmits signals. The signal protocol layer makes use of the physical layer and defines a relationship with either the received or the transmitted signal. The command protocol layer couples to the signal protocol layer and generates either a request for data in response to the received signal or a reply in response to preparing the transmitted signal. The coherence protocol layer makes use of the command protocol layer and provides a set of legal transactions for data in response to either the request for data or the reply.
The physical layer, the signal protocol layer and the command protocol layer are symmetrical layers. The coherence protocol layer is an asymmetrical layer. This advantageous design of the interconnect port allows for universal application of the port to both two node and three or more node interconnect systems. The symmetrical design and structure of the port allows for each node in the interconnect system to be both a master and a slave. For example, in a two node interconnect system, the port allows for direct connection of two nodes. This provides operational efficiencies for the interconnect system so that both nodes can be a master and a slave and accordingly source requests and/or process requests. Further, the symmetrical nature of the port allows for connecting three or more nodes in an interconnect system, without requiring additional system components or resources.
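The layer stack of the interconnect port, and the split between symmetrical and asymmetrical layers, can be sketched as follows. This is an illustrative model only. The names `Layer`, `SYMMETRICAL`, `receive_path`, and `transmit_path` are hypothetical, and the assumption that a received signal passes through the layers strictly bottom to top (and a transmitted signal top to bottom) is an inference from the description rather than a stated requirement.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Port layers, numbered from the bottom of the stack to the top."""
    PHYSICAL = 1
    SIGNAL_PROTOCOL = 2
    COMMAND_PROTOCOL = 3
    COHERENCE_PROTOCOL = 4


# Per the description: the lower three layers are symmetrical and the
# coherence protocol layer is asymmetrical.
SYMMETRICAL = frozenset(
    {Layer.PHYSICAL, Layer.SIGNAL_PROTOCOL, Layer.COMMAND_PROTOCOL}
)


def receive_path():
    """Assumed order in which a received signal moves up the stack."""
    return sorted(Layer)


def transmit_path():
    """Assumed order in which a reply or request moves down the stack."""
    return sorted(Layer, reverse=True)


def is_symmetrical(layer):
    """True if the layer is one of the symmetrical layers."""
    return layer in SYMMETRICAL
```

Because the three lower layers are identical on both ends of a link in this model, either end can source requests or process them, which corresponds to each node being both a master and a slave.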
The present invention also includes memory accesses with and without pipelining. More particularly it includes local and remote coherence protocols that permit legal transactions for dual and multi-node systems. In a pipelined environment, the present invention increases overall system speed for data access because there is a latency reduction. For example, the present invention allows for a speculative snoop and a speculative memory access to occur even as a local memory access for data is occurring. Further, when a directory determines that data resides remotely, it does not need to wait for a follow-up to begin access of this data. This increases overall system efficiency and reduces latency.
The present invention also handles crossing cases. In a crossing case, one side (or node) sends a request to the other side (or node) for a particular address and receives a request for this address from the other side before receiving a reply to its request. An advantage of handling crossing cases as in the present invention is that such cases may be resolved without discarding (or killing) subsequent processor requests. Rather, the subsequent request for data is beneficially processed after the request that preceded it. Moreover, in some instances the subsequent request is advantageously processed before the preceding request, for example, when a particular request may not be retried.
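The detection of a crossing case can be sketched as follows. This is a minimal model under stated assumptions: the class `Node`, its methods, and the `deferred` list are hypothetical names, and holding the crossing request in a list is only one possible way of keeping it rather than discarding it. The ordering policy used to resolve the crossing case is not modeled.

```python
class Node:
    """Hypothetical model of one side of a two node interconnect."""

    def __init__(self, name):
        self.name = name
        self.outstanding = set()  # addresses requested, reply not yet received
        self.deferred = []        # crossing requests that are kept, not killed

    def send_request(self, address):
        """Record that this node has requested the given address."""
        self.outstanding.add(address)

    def receive_reply(self, address):
        """The reply arrived, so the request is no longer outstanding."""
        self.outstanding.discard(address)

    def receive_request(self, address):
        """Return True if the incoming request crosses one of our own.

        A crossing case occurs when a request for an address arrives
        from the other side before the reply to this node's own request
        for the same address has been received.
        """
        crossing = address in self.outstanding
        if crossing:
            self.deferred.append(address)  # assumption: hold for later
        return crossing
```

For example, if node A sends a request for address 0x40 and then receives a request for 0x40 before its reply arrives, `receive_request(0x40)` returns `True` and the request is retained for later processing instead of being discarded.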
The features and advantages described in the specification are not all inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.