Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Cloud computing, which may refer both to applications delivered as services over the Internet and to the hardware and systems software in datacenters that provide those services, has taken center stage in information technology in recent years. With virtualization technology, a component of cloud computing, underlying hardware resources may be shared by multiple virtual machines or domains, each running its own operating system (OS). Sharing hardware resources may give rise to higher hardware utilization and lower power consumption. A virtual machine monitor (VMM), sometimes also referred to as a hypervisor, may typically be responsible for isolating each running instance of an OS from the underlying physical machine. The VMM may translate or emulate special instructions of a guest OS.
Graphics processing unit (GPU)-based and field programmable gate array (FPGA)-based hardware accelerators are also gaining popularity in the server industry. Accelerators speed up computationally intensive parts of an application. Successfully and efficiently adding hardware accelerators to virtualized servers may bring cloud clients an apparent speed-up for a wide range of applications. GPUs may typically be inexpensive and may be programmed using high-level languages and application programming interfaces (APIs), which abstract away hardware details. FPGAs may outperform GPUs in many specific applications. Moreover, the ability to perform partial run-time reconfiguration may be a distinguishing feature of FPGAs.
Some FPGA virtualization solutions may tend to remain at the level of multitasking on a single OS. Prevailing GPU virtualization approaches may primarily intercept API calls and redirect them to the user space of the hosted or privileged domain, resulting in reduced efficiency and higher overhead. In addition, in some FPGA virtualization or GPU virtualization solutions, the accelerator may typically service only one request at a time.