In recent years, lightweight virtualization technologies such as Docker (available at "www.docker.com") and LXC (available at "linuxcontainers.org") have been gaining enormous traction, not only in the research field, but also in terms of real-world deployment. Google, for instance, is reported to run all of its services in containers (as reported at "www.theregister.co.uk/2014/05/23/google_containerization_two_billion"), and Container as a Service (CaaS) products are available from a number of major players, including Azure's Container Service (available at "azure.microsoft.com/en-us/services/container-service"), Amazon's EC2 Container Service and Lambda offerings (available at "aws.amazon.com/lambda"), and Google's Container Engine service (available at "cloud.google.com/container-engine").
Beyond these services, lightweight virtualization is crucial to a wide range of use cases, including just-in-time instantiation of services (e.g., as described in the non-patent literature of MADHAVAPEDDY, A., LEONARD, T., SKJEGSTAD, M., GAZAGNAIRE, T., SHEETS, D., SCOTT, D., MORTIER, R., CHAUDHRY, A., SINGH, B., LUDLAM, J., CROWCROFT, J., AND LESLIE, I. Jitsu: Just-in-time summoning of unikernels. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15) (Oakland, Calif., 2015), USENIX Association, pp. 559-573)—e.g., filters against Distributed Denial of Service (DDoS) attacks, TCP acceleration proxies, content caches, etc.—and network functions virtualization (NFV), all the while providing significant cost reduction through consolidation and power minimization.
The reasons for containers to have taken the virtualization market by storm are clear. In contrast to heavyweight, hypervisor-based virtualization platforms such as VMware, KVM, or Xen, containers provide extremely fast instantiation times, small per-instance memory footprints, and high density on a single host, among other features.
However, no technology is perfect, and containers are no exception. Security, for one, has been and continues to be a thorn in their side. First, their large trusted computing base (TCB), at least compared to that of type-1 hypervisors, has resulted in a large number of exploits. Second, a container that causes a kernel panic will bring down the entire host. Further, any container that can monopolize or exhaust system resources (e.g., memory, file descriptors, user IDs, forkbombs, etc.) will cause a Denial-of-Service (DoS) attack on all other containers on that host. Over the years, a significant amount of effort has gone into the introduction of mechanisms such as user namespaces and Seccomp that harden or eliminate a large number of these attack vectors. However, a simple misconfiguration can still lead to an insecure system.
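The resource-exhaustion vector mentioned above is typically blunted by per-container resource caps (e.g., the cgroup pids controller or rlimits). As a minimal illustrative sketch, and not any particular container runtime's implementation, the POSIX rlimit interface can cap the number of processes a context may create, so that a forkbomb fails locally instead of exhausting the host:

```python
import resource

def cap_process_count(max_procs):
    """Lower this context's soft limit on the number of processes/threads.

    Simplified stand-in for the per-container limits (cgroup pids
    controller, rlimits) that container runtimes configure: once the
    soft limit is reached, further fork() calls fail with EAGAIN
    instead of exhausting the host.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    # Never attempt to raise the limit above the hard ceiling.
    if hard == resource.RLIM_INFINITY:
        new_soft = max_procs
    else:
        new_soft = min(max_procs, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)[0]
```

The same idea applies to the other resources listed above (memory, file descriptors), each with its own limit knob.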
Beyond security, another downside of containers is that, because they share the same kernel, it is not possible to specialize the kernel and its network stack to provide better functionality and performance to specific applications. Finally, containers do not currently support live migration, although support for it is under development.
At least for multitenant deployments, this leaves operators with a difficult choice between:
(1) containers and the security issues surrounding them, and
(2) the burden coming from heavyweight, VM-based platforms.
Clearly, all of the security issues related to containers cannot easily be fixed overnight, nor can new ones be prevented from arising.
Thus, the ability to quickly boot virtual machines (VMs), destroy them, migrate them, and concurrently run as many of them as possible on a single server is important to a vast number of applications in the field of Network Function Virtualization (NFV). Examples include running as many vCPEs (virtual customer premises equipment) as possible on a single server, instantiating firewalls on a per-connection basis, dynamically creating filters to deal with Denial-of-Service attacks, quickly and dynamically booting monitoring services to oversee financial transactions, and hosting services whose key performance indicators (KPIs) depend on boot times, such as blockchain and function-based services such as Amazon's Lambda, among many others.
A significant part of the overhead when booting or migrating a virtual machine comes from the limited scalability of the back-end information store, for example the XenStore in the Xen hypervisor, which is used to keep control information about the instances currently running in the system.
Known virtualization platforms use such a back-end information store to keep track of control information about the virtual machines currently running on the system, such as a unique machine identifier, a name, and the amount of memory allocated, along with information about the virtual devices they are using, for example network device addresses and device capabilities. While certainly useful, the back-end information store is often a source of bottlenecks that only get worse as the number of virtual machines increases. The reason for this is that an operation like virtual machine creation requires multiple interactions with the back-end information store.
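As a concrete illustration, the control information for a single virtual machine might comprise entries of the following kind. The path and key names loosely follow XenStore conventions, but the specific paths and values here are illustrative only:

```python
# Illustrative only: a handful of the per-VM control entries a
# back-end information store such as the XenStore keeps, mapping
# hierarchical paths to string values. Real deployments hold many
# more entries per virtual machine and per virtual device.
vm_control_info = {
    "/local/domain/5/domid": "5",                              # unique machine identifier
    "/local/domain/5/name": "vm-example-5",                    # human-readable name
    "/local/domain/5/memory/target": "262144",                 # allocated memory (KiB)
    "/local/domain/5/device/vif/0/mac": "00:16:3e:aa:bb:cc",   # network device address
    "/local/domain/5/device/vif/0/state": "4",                 # device handshake state
}
```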
The back-end information store is crucial to the way a virtualization platform such as Xen functions, with many xl commands making heavy use of it. By way of illustration, FIG. 5a shows the process of creating a virtual machine and its (virtual) network device. First, the toolstack writes an entry to the network back-end's directory, essentially announcing the existence of a new virtual machine in need of a network device. Prior to this, the back-end will have placed a watch on that directory; the toolstack writing to the directory triggers the back-end to assign an event channel and other information (e.g., grant references, a mechanism for sharing memory between virtual machines) and to write this information back to the back-end information store, such as the XenStore of Xen. Finally, when the virtual machine boots up, it contacts the back-end information store to retrieve the information previously written by the network back-end. The above is a simplification: in actuality, the virtual machine creation process alone can require interaction with over 30 back-end information store entries, a problem that is exacerbated with an increasing number of virtual machines and devices. Worse, the back-end information store represents a single point of failure.
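The watch-based handshake just described can be sketched with a toy in-memory store. This is a simplified model for illustration only: the real XenStore is a separate daemon accessed over shared memory, the paths only loosely follow its conventions, and the values below are made up.

```python
from collections import defaultdict

class ToyInfoStore:
    """Toy in-memory model of a XenStore-like store with watches."""

    def __init__(self):
        self.entries = {}
        self.watches = defaultdict(list)  # path prefix -> callbacks

    def watch(self, prefix, callback):
        self.watches[prefix].append(callback)

    def write(self, path, value):
        self.entries[path] = value
        # Fire any watch whose prefix covers the written path.
        for prefix, callbacks in self.watches.items():
            if path.startswith(prefix):
                for cb in callbacks:
                    cb(path, value)

    def read(self, path):
        return self.entries[path]

store = ToyInfoStore()

# Step 2: the network back-end holds a watch on its directory; when
# the toolstack announces a new VM, the back-end assigns an event
# channel and grant references and writes them back for the guest.
def backend_on_new_vif(path, frontend_domid):
    base = f"/local/domain/{frontend_domid}/device/vif/0"
    store.write(f"{base}/event-channel", "7")   # made-up value
    store.write(f"{base}/tx-ring-ref", "512")   # made-up grant reference

store.watch("/backend/vif/", backend_on_new_vif)

# Step 1: the toolstack writes to the back-end's directory, announcing
# a new virtual machine (domain 5) in need of a network device.
store.write("/backend/vif/5/0", "5")

# Step 3: on boot, the guest retrieves what the back-end wrote.
event_channel = store.read("/local/domain/5/device/vif/0/event-channel")
```

Even in this toy version, a single device requires several round trips to the store; with over 30 entries per real VM, it is easy to see how the store becomes a bottleneck as instance counts grow.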
As previously mentioned, the importance of small creation and boot times is at least partly demonstrated by the rise of containers and their typically faster-than-VM boot times, although containers trade off this performance against isolation, which is essential to a number of the application scenarios mentioned above. Known virtualization platforms, and the virtual machines that run on top of them, appear to be inherently and fundamentally heavyweight and slow to boot.