Distributed computing systems are increasingly being utilized to support high performance computing applications. Typically, distributed computing systems are constructed from a collection of computing nodes that combine to provide a set of processing services to implement the high performance computing applications. Each of the computing nodes in the distributed computing system is typically a separate, independent computing system interconnected with each of the other computing nodes via a communications medium, e.g., a network.
Conventional distributed computing systems often encounter difficulties in scaling computing performance as the number of computing nodes increases. Scaling difficulties are often related to inter-device communication mechanisms, such as input/output (I/O) and operating system (OS) mechanisms, used by the computing nodes as they perform various computational functions required within distributed computing systems. Scaling difficulties may also be related to the complexity of developing and deploying application programs within distributed computing systems.
Existing distributed computing systems containing interconnected computing nodes often require custom development of operating system services and related processing functions. This custom development increases the cost and complexity of developing the distributed systems themselves, as well as the cost and complexity of developing application programs used within those systems.
Moreover, conventional distributed computing systems often utilize a centralized mechanism for managing system state information. For example, a centralized management node may handle allocation of the process name space and the file system name space. Such a centralized management scheme often further limits the ability of the system to achieve significant scaling in terms of computing performance.