A heterogeneous distributed computing system, such as a large modeling and simulation (M&S) system, may include multiple types of devices. For example, an M&S system may comprise network-enabled sensors, server computing devices, end-user devices, firewall devices, intrusion detection systems, and so on. Furthermore, due to the accelerating computing demands of M&S systems, there is growing interest in using specialized hardware solutions for M&S tasks. Field-programmable gate arrays (FPGAs), graphics processing units (GPUs), distributed computing, real-time processing, and hardware-in-the-loop tasks have resulted in faster and more accurate simulations. The devices of a heterogeneous distributed computing system may be distributed geographically.
The potentially large numbers and diverse types of devices in a heterogeneous distributed computing system may be necessary to allow the heterogeneous distributed computing system to perform a desired role. However, the numbers and types of devices in a heterogeneous distributed computing system may increase the difficulty administrators face in preventing, diagnosing, and correcting faults, errors, misconfigurations, security breaches, downtime, and other adverse events that may befall the heterogeneous distributed computing system. For instance, a wide, heterogeneous array of hardware components with varying degrees of connectivity may prevent use of many conventional approaches to health and status monitoring of M&S systems.
Managing and monitoring M&S systems may be further complicated by the security requirements of M&S systems. In some instances, data integrity must be maintained for status information in transit and in storage. Furthermore, in some instances, security credentials for gathering status information remotely must be kept secure and private. Additionally, in some instances, connecting to and querying a collection of M&S nodes may be difficult without many layers of abstraction, and treating individual nodes as isolated entities may increase management overhead. In some instances, security policies may even restrict certain avenues for status information collection, for example by prohibiting remote shell access or by prohibiting supporting components that are outside of administrative control.