Currently, various entities provide cloud computing services globally to customers across various sectors for critical and non-critical applications. These entities provide cloud computing services including, for example, Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS), and/or Platform-as-a-Service (PaaS). A cloud computing system typically comprises a large cluster of servers distributed over one or more data centers for purposes of providing data protection, high availability, and high-performance computing, and to otherwise provide sufficient quality of service (QoS) for successful service delivery and to meet the obligations of service level agreements (SLAs) with the cloud customers.
Various cloud-based services such as accelerator (“X”)-as-a-Service (XaaS) and graphics processing unit (GPU)-as-a-Service (GPUaaS) allow cloud users and applications to utilize specialized hardware accelerator resources that exist in different servers within one or more data centers. XaaS allows for pooling, sharing, and optimization of a heterogeneous computing environment comprising specialized and expensive hardware accelerators including, but not limited to, GPUs, tensor processing units (TPUs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), image processing units (IPUs), emerging deep learning accelerators (DLAs), advanced graph processors, artificial intelligence (AI) accelerators, and other specialized hardware accelerator resources that are configured to support high-performance computing (HPC) services provided by cloud computing systems.
The implementation of XaaS or GPUaaS in a distributed computing environment, which comprises a large-scale pool of shared accelerator resources (hardware, virtual, etc.) executing on a cluster of computing nodes, can support various emerging HPC applications such as big data analytics, inference and model training for machine learning and deep learning applications, and AI processing. However, implementing an efficient distributed computing environment for these types of HPC applications is not trivial, since the intensive computational workloads and the massive volume of data which must be stored, streamed, prefetched, and coordinated between the shared computing resources of the distributed computing platform present a significant challenge and a practical limit on system performance and scalability.