The present invention relates generally to coupled supercomputers, and more specifically, to providing reliable communication over cross-coupled links between independently managed compute and storage networks.
Supercomputers, also known as high performance computers, typically include compute resources and storage devices connected to each other through an interconnection network. The network generally includes a set of routers or switches connected to client nodes through an appropriate network interface on each node. The management subsystem of such a system generally has a complete view of all the entities in that system. Typically, the storage devices are shared between multiple systems. This sharing is made possible through server nodes that are attached to the storage devices and that communicate, over an independent network, with compute client nodes spread across the multiple systems. Access to the storage devices across multiple systems is typically provisioned using a separate, independently managed shared storage fabric.