The present invention generally relates to communication in computer systems and, more particularly, to partitioning distributed computer systems to permit communication between remote endnodes in the distributed computer system.
A traditional computer system has an implicit ability to communicate between its own local processors and from the local processors to its own I/O adapters and the devices attached to its I/O adapters. Traditionally, processors communicate with other processors, memory, and other devices via processor-memory buses. I/O adapters communicate via buses attached to processor-memory buses. The processors and I/O adapters on a first computer system are typically not directly accessible to other processors and I/O adapters located on a second computer system.
For the reasons stated above, and for other reasons presented in greater detail in the description of the preferred embodiments section of the present specification, there is a need for an improved partitioning mechanism in distributed computer systems that permits processors and I/O adapters on a first computer system to efficiently and directly access other processors and I/O adapters located on a second computer system.
The present invention provides a distributed computer system having a first subnet including a first group of endnodes and a second subnet including a second group of endnodes. Each endnode in the first and second groups of endnodes includes queue pairs and at least one process that produces and/or consumes message data. Each queue pair includes a send work queue having work queue elements that describe message data to be sent, and a receive work queue having work queue elements that describe where to place incoming message data. A communication fabric physically couples the first group of endnodes and the second group of endnodes. A partitioning mechanism associates a first partition key, representing endnodes in a first partition, with a first group of queue pairs and a second partition key, representing endnodes in a second partition, with a second group of queue pairs, thereby enabling communication between endnodes over the communication fabric.
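The queue pair structure and the partition-key association described above can be illustrated with a minimal sketch. It assumes a simple in-memory model: the names `WorkQueueElement`, `QueuePair`, and `associate_partition`, the address/length fields of a work queue element, and the 16-bit key values are all hypothetical and chosen only for illustration, not taken from any particular implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class WorkQueueElement:
    # Hypothetical WQE: points at a buffer of message data (address and length).
    address: int
    length: int


@dataclass
class QueuePair:
    qp_number: int
    # Send work queue: WQEs describing message data to be sent.
    send_queue: Deque[WorkQueueElement] = field(default_factory=deque)
    # Receive work queue: WQEs describing where to place incoming message data.
    receive_queue: Deque[WorkQueueElement] = field(default_factory=deque)
    # Partition key associated with this queue pair (None until associated).
    pkey: Optional[int] = None


def associate_partition(pkey: int, queue_pairs: List[QueuePair]) -> None:
    """Associate one partition key with every queue pair in a group."""
    for qp in queue_pairs:
        qp.pkey = pkey


# Hypothetical key values, chosen for illustration only.
FIRST_PKEY = 0x8001
SECOND_PKEY = 0x8002

first_group = [QueuePair(qp_number=n) for n in (1, 2)]
second_group = [QueuePair(qp_number=n) for n in (3, 4)]
associate_partition(FIRST_PKEY, first_group)
associate_partition(SECOND_PKEY, second_group)

# Queue a description of 256 bytes of outgoing message data on QP 1.
first_group[0].send_queue.append(WorkQueueElement(address=0x1000, length=256))
```

Keeping the partition key as an attribute of the queue pair, rather than of the endnode, is what lets a single endnode hold queue pairs in several partitions at once.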
In one embodiment, at least one endnode includes queue pairs in different partitions. In one embodiment, at least one endnode includes queue pairs in the same partition.
In one embodiment, a queue pair must have a partition key associated with it before the queue pair is used.
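The requirement that a queue pair carry a partition key before use can be expressed as a guard on the operations that use it. The following is a minimal sketch, assuming a hypothetical `post_send` operation and a hypothetical `QueuePairNotReady` error; an actual endnode might enforce the rule at a different point, for example when the queue pair is first transitioned to an operational state.

```python
from typing import List, Optional


class QueuePairNotReady(Exception):
    """Raised when a queue pair is used before a partition key is associated with it."""


class QueuePair:
    def __init__(self, qp_number: int) -> None:
        self.qp_number = qp_number
        self.pkey: Optional[int] = None
        self.send_queue: List[bytes] = []

    def associate_pkey(self, pkey: int) -> None:
        self.pkey = pkey

    def post_send(self, message: bytes) -> None:
        # Refuse to use the queue pair until a partition key has been associated.
        if self.pkey is None:
            raise QueuePairNotReady(f"QP {self.qp_number} has no partition key")
        self.send_queue.append(message)
```

A caller would first invoke `associate_pkey(...)` and only then post work; posting earlier raises `QueuePairNotReady`.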
In one embodiment, the endnodes include ports, and each port includes a partition key table having at least one partition key entry. In one embodiment, each subnet includes a partition manager that defines and sets up partitions on the subnet. In one embodiment, a partition manager spans a plurality of subnets and employs a partition management agent in each of the plurality of subnets to perform partition management operations, such as writing and invalidating partition keys (PKeys). In one embodiment, at least one subnet includes multiple partition managers. In one embodiment, the partition manager controls the content of the partition key table.
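The relationship between per-port partition key tables, a partition manager spanning several subnets, and a partition management agent in each subnet can be sketched as follows. This is a minimal model under stated assumptions: the class and method names (`PartitionKeyTable`, `PartitionManagementAgent`, `write_pkey`, `invalidate_pkey`), the fixed table size, and the use of `None` to mark an invalid entry are all hypothetical illustrations, not a defined management interface.

```python
from typing import Dict, List, Optional


class PartitionKeyTable:
    """Per-port table of partition key entries (None marks an invalid entry)."""

    def __init__(self, size: int) -> None:
        self._entries: List[Optional[int]] = [None] * size

    def write(self, index: int, pkey: int) -> None:
        self._entries[index] = pkey

    def invalidate(self, index: int) -> None:
        self._entries[index] = None

    def contains(self, pkey: int) -> bool:
        return pkey in self._entries


class Port:
    def __init__(self, name: str, table_size: int = 4) -> None:
        self.name = name
        self.pkey_table = PartitionKeyTable(table_size)


class PartitionManagementAgent:
    """Hypothetical per-subnet agent that performs PKey table operations locally."""

    def __init__(self, ports: List[Port]) -> None:
        self.ports = {port.name: port for port in ports}

    def write_pkey(self, port_name: str, index: int, pkey: int) -> None:
        self.ports[port_name].pkey_table.write(index, pkey)

    def invalidate_pkey(self, port_name: str, index: int) -> None:
        self.ports[port_name].pkey_table.invalidate(index)


class PartitionManager:
    """Hypothetical manager spanning several subnets through one agent per subnet."""

    def __init__(self) -> None:
        self.agents: Dict[str, PartitionManagementAgent] = {}

    def add_subnet(self, subnet_id: str, agent: PartitionManagementAgent) -> None:
        self.agents[subnet_id] = agent

    def write_pkey(self, subnet_id: str, port_name: str, index: int, pkey: int) -> None:
        # The manager decides; the agent in the target subnet performs the write.
        self.agents[subnet_id].write_pkey(port_name, index, pkey)

    def invalidate_pkey(self, subnet_id: str, port_name: str, index: int) -> None:
        self.agents[subnet_id].invalidate_pkey(port_name, index)


# Two ports in two different subnets placed in the same (hypothetical) partition.
port_a = Port("endnode-a/port1")
port_b = Port("endnode-b/port1")
manager = PartitionManager()
manager.add_subnet("subnet-1", PartitionManagementAgent([port_a]))
manager.add_subnet("subnet-2", PartitionManagementAgent([port_b]))
manager.write_pkey("subnet-1", "endnode-a/port1", 0, 0x8001)
manager.write_pkey("subnet-2", "endnode-b/port1", 0, 0x8001)
```

Routing every table change through the manager and its agents reflects the embodiment in which the partition manager alone controls the content of each port's partition key table.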
In one embodiment, the partitioning mechanism inserts the partition key associated with a send work queue of a source endnode into each frame sent from that send work queue, stores a partition key at a destination endnode, and compares the partition key in the frame with the stored partition key at the destination endnode.
If the partition key in the frame matches the stored partition key at the destination endnode, the frame is accepted and processed normally. If the partition key in the frame does not match the stored partition key at the destination endnode, the frame is rejected.
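The insert-store-compare sequence above amounts to a simple check on each arriving frame. The following minimal sketch assumes a frame carrying only a partition key and a payload, and a destination that stores a single partition key; the names `Frame`, `send_frame`, and `DestinationEndnode` and the key values are hypothetical, and a real frame would carry many more header fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    pkey: int
    payload: bytes


def send_frame(send_queue_pkey: int, payload: bytes) -> Frame:
    # Source endnode: insert the PKey associated with the send work queue into the frame.
    return Frame(pkey=send_queue_pkey, payload=payload)


class DestinationEndnode:
    def __init__(self, stored_pkey: int) -> None:
        self.stored_pkey = stored_pkey  # partition key stored at the destination
        self.accepted: List[bytes] = []
        self.rejected = 0

    def receive(self, frame: Frame) -> bool:
        # Compare the frame's PKey with the stored PKey.
        if frame.pkey == self.stored_pkey:
            self.accepted.append(frame.payload)  # match: accept and process normally
            return True
        self.rejected += 1  # mismatch: reject the frame
        return False
```

For example, a destination storing `0x8001` accepts `send_frame(0x8001, b"hello")` and rejects `send_frame(0x8002, b"intruder")`, so endnodes outside the partition cannot deliver frames to it.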