The present invention relates generally to operating systems and to highly available, fault-tolerant systems. More particularly, this invention relates to application frameworks that deploy applications with dataflow graphs on networks containing circuit switches, where those switches are controlled by the application framework.
Evolving data workloads, such as stream processing workflows, require new types of interconnects because traditional packet switching is expensive in terms of energy use and is generally not flexible enough to adapt routing dynamically to the data requirements. Electrical packet switches at the center of compute clusters are hard to scale at high data rates, and high port counts require large numbers of parallel switch chips and multi-stage architectures. The need for lower-cost and more flexible interconnects is particularly important in High Performance Computing (HPC) systems. Electrical Circuit Switches (ECS) and Optical Circuit Switches (OCS) have generally been used by the telecommunications industry in reconfigurable networks for statically configured network routes. Both ECS and OCS may provide better power efficiency and lower latency through the switch than packet switches. An OCS can dynamically set up a dedicated connection between any pair of input and output ports. Each port is used exclusively in one connection, and, once a connection is configured, network packets pass through the switch without being inspected. Different connection schemes between input/output pairs create a unique network topology for the nodes attached to the switch ports; these nodes are typically network endpoints that perform computational tasks on the workload data.
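To make the port-exclusivity property concrete, the following Python sketch models an OCS as a partial one-to-one mapping from input ports to output ports. The class and method names (`OpticalCircuitSwitch`, `connect`, `disconnect`, `forward`, `topology`) are hypothetical and do not correspond to any real switch-control API. For simplicity, the sketch also assumes that input port i and output port i attach to the same node.

```python
from __future__ import annotations


class OpticalCircuitSwitch:
    """Hypothetical model of an OCS as a partial one-to-one map of input to output ports."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self._in_to_out: dict[int, int] = {}
        self._out_to_in: dict[int, int] = {}

    def connect(self, in_port: int, out_port: int) -> None:
        """Set up a dedicated circuit; each port may belong to only one connection."""
        for port in (in_port, out_port):
            if not 0 <= port < self.num_ports:
                raise ValueError(f"port {port} is out of range")
        if in_port in self._in_to_out:
            raise ValueError(f"input port {in_port} is already connected")
        if out_port in self._out_to_in:
            raise ValueError(f"output port {out_port} is already connected")
        self._in_to_out[in_port] = out_port
        self._out_to_in[out_port] = in_port

    def disconnect(self, in_port: int) -> None:
        """Tear down the circuit that starts at in_port (KeyError if none exists)."""
        out_port = self._in_to_out.pop(in_port)
        del self._out_to_in[out_port]

    def forward(self, in_port: int) -> int | None:
        """Return the output port for traffic on in_port; packets are never inspected."""
        return self._in_to_out.get(in_port)

    def topology(self, node_at_port: dict[int, str]) -> set[tuple[str, str]]:
        """Node-to-node links implied by the current circuits.

        Assumes (hypothetically) that input port i and output port i attach to the same node.
        """
        return {(node_at_port[i], node_at_port[o]) for i, o in self._in_to_out.items()}
```

For example, connecting ports 0 and 2 in both directions links nodes A and C. Tearing those circuits down and connecting ports 0 and 3 instead links A to D, which changes the node-level topology without moving any workload.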
Stream processing applications are typically long running. In distributed streaming systems, such as that described in the paper "Adaptive Control of Extreme-scale Stream Processing System," by L. Amini, N. Jain, A. Sehgal, J. Silber, and O. Verscheure (November 2005), streaming applications are deployed in the system and stream connections are established at runtime. Depending on their purpose, the processing elements (PEs) in the applications may become CPU intensive, bandwidth intensive, and/or memory intensive. The streaming system determines the placement of these PEs based on available resources and the requirements of the PEs. Resource availability and requirements, however, can change over time, so the node on which a PE is placed may cease to be suitable or optimal for that PE. The system reacts by moving the PE to another node based on the current resource state. This is not always sufficient, for example when a PE cannot be moved, or when moving the PE incurs more cost than the system is able to accept.
Circuit-switched networks provide not only higher bandwidth between nodes but also the flexibility to reconfigure network resources instead of moving PEs. Traditionally, circuit switching technology has been used in wide area networks to provide a dedicated bandwidth connection between a source and destination.
Optical circuit switch (OCS) networks may offer a more scalable alternative for cluster interconnection networks. Benefits of OCS include transparency to data rates and protocols, low power consumption, compatibility with wavelength division multiplexing, and the elimination of optical-to-electrical and electrical-to-optical conversions.
In such a network, the optical switch connects to the optical fiber ports on the electronic packet switch of each cluster. Switching the fibers that connect to different cluster switch ports forms a unique all-optical inter-cluster network. Optical switches may also be connected directly to a node in a cluster for finer granularity.
Recent HPC literature has described network topologies that are amenable to evolving data workloads such as stream processing. For example, the paper "On the Feasibility of Optical Circuit Switching for High Performance Computing Systems," by K. J. Barker, A. Benner, R. Hoare, A. Hoisie, A. K. Jones, D. K. Kerbyson, D. Li, R. Melhem, R. Rajamony, E. Schenfeld, S. Shao, C. Stunkel, and P. Walker, Proceedings of the ACM/IEEE SC 2005 Conference (Supercomputing 2005), describes a network topology that combines the flexibility and other advantages of the Optical Circuit Switches described above with an aggregation scheme in which groups of nodes are each connected to one of several Electrical Packet Switches, and each Electrical Packet Switch is connected to the Optical Circuit Switch. As bandwidth needs evolve, the Optical Circuit Switch can be dynamically reconfigured to connect one or more ports of each Electrical Packet Switch to one or more ports on other Electrical Packet Switches. In this way, data can be sent from a node, aggregated with traffic from other nodes at their packet switch, routed through the Optical Circuit Switch (which is dynamically configured to connect the packet switches that need the highest bandwidth), routed through the target Electrical Packet Switch, and then demultiplexed to the target nodes. However, the technique described in that paper for routing through the reconfigurable network to maximize network utilization depends on the development of new communication protocols; no method is provided for existing network protocols, such as TCP/IP.
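The reconfiguration step, which connects the packet switches that need the most bandwidth, can be illustrated with a simple greedy heuristic. The sketch below is an assumption for illustration only and is not the algorithm of the cited paper. It treats each circuit as a bidirectional link that consumes one OCS-facing port on each Electrical Packet Switch, allows at most one circuit per switch pair, and uses a hypothetical function name (`plan_circuits`) and hypothetical demand units.

```python
from __future__ import annotations


def plan_circuits(
    demand: dict[tuple[str, str], float], uplinks: dict[str, int]
) -> list[tuple[str, str]]:
    """Greedily pick packet-switch pairs to join through the OCS, highest demand first.

    demand: estimated traffic between two packet switches (hypothetical units, e.g. Gb/s).
    uplinks: number of OCS-facing ports available on each packet switch.
    Returns the (switch_a, switch_b) circuits to configure, at most one per pair.
    """
    free = dict(uplinks)  # remaining OCS-facing ports per packet switch
    circuits: list[tuple[str, str]] = []
    # Visit pairs in order of decreasing demand; ties are broken by name for determinism.
    for (a, b), load in sorted(demand.items(), key=lambda kv: (-kv[1], kv[0])):
        if load <= 0 or a == b:
            continue
        # A circuit is only possible if both switches still have a free OCS-facing port.
        if free.get(a, 0) > 0 and free.get(b, 0) > 0:
            free[a] -= 1
            free[b] -= 1
            circuits.append((a, b))
    return circuits
```

With one OCS-facing port per switch and demands of 40, 25, and 10 for the pairs (eps1, eps2), (eps1, eps3), and (eps2, eps3), only the eps1-eps2 circuit is configured; with two ports per switch, all three pairs receive a circuit. A greedy pass is cheap enough to rerun whenever demand estimates change, at the cost of possibly missing the best overall assignment.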
U.S. Pat. No. 6,792,174B1, entitled “METHOD AND APPARATUS FOR SIGNALING BETWEEN AN OPTICAL CROSS-CONNECT SWITCH AND ATTACHED NETWORK EQUIPMENT,” describes a method for using and controlling optical switches in a network. Another patent application, entitled “DUAL NETWORK TYPES SOLUTION FOR COMPUTER INTERCONNECTS,” U.S. Patent Publication No. 2008/0025288, filed on Jul. 27, 2006, describes the use of different network types, including circuit switches and packet switches, together in a system connecting a set of clusters and nodes. To date, however, there is no known solution in which application frameworks running in a network control that network's circuit switches and use them for running applications.
U.S. Pat. No. 6,671,254B1, entitled “COMMUNICATION NETWORK AND COMMUNICATION NODE USED IN SUCH NETWORK,” which issued on Dec. 30, 2003, describes a network comprising nodes with an optical cross connect and a packet switch on each node, together with network management based on traffic monitoring. However, that network management is based specifically on hardware packet flow monitoring executed on each node, and no application framework is involved in it.