Historically, web application code has been split between origin servers and browsers, connected by a network that transmits data from point to point. Large websites and web applications initially ran on large physical mainframe servers that could handle heavy traffic loads and large data transfers. Over time, websites and web applications came to be deployed on tens to hundreds of commodity servers, which reduced cost, improved fault tolerance, and increased performance. This approach is referred to as cloud computing. The technology for providing web applications further evolved to use virtual machines, where one physical machine is split into multiple virtual machines that can be managed independently. Virtual machines typically carry a high overhead in compute resources. For instance, each virtual machine is typically allocated hundreds of megabytes of random-access memory (RAM) and typically takes tens of seconds to boot. Virtual containers can provide isolation between customers of a cloud computing platform and are less resource intensive than virtual machines. However, web application code running in a container typically runs in its own operating system (OS)-level process, which consumes RAM and induces context-switching overhead. While native code can load quickly in a container, many server-oriented language execution environments are not optimized for startup time.
Some cloud computing platforms instantiate a containerized process for customer code and auto-scale that process, which creates cold-starts. A cold-start occurs when a new copy of the code starts on a physical machine. Instantiating a new containerized process can take from hundreds of milliseconds to multiple seconds (e.g., between 500 ms and 10 seconds). Any request to be serviced by code executing in a container may therefore wait for as long as the new containerized process takes to start (e.g., as much as ten seconds). In addition, such a containerized process can handle only a single request at a time, so a new containerized process must be cold-started each time an additional concurrent request is received. Each request serviced by a new container can thus experience significant lag that does not improve over time. If the containerized process does not receive a request within a certain amount of time, it automatically terminates, and a new containerized process must be cold-started once a request is received. When new customer code is deployed, this entire process repeats, because each containerized process must be instantiated anew.
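The cold-start behavior described above can be illustrated with a small simulation. The following is a minimal sketch, not a model of any particular platform: the function name `simulate_latencies`, the `Container` class, and the default values for service time and idle timeout are hypothetical and chosen only for illustration. The 500 ms cold-start value is the low end of the range given above. The sketch assumes each container services one request at a time, that a request arriving when no idle container exists triggers a cold-start, and that a container idle for longer than the timeout has terminated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    # Time (ms) at which this container finishes its current request.
    busy_until_ms: int


def simulate_latencies(
    arrivals_ms: List[int],
    cold_start_ms: int = 500,       # assumed; the text gives a 500 ms to 10 s range
    service_ms: int = 100,          # assumed time to service one request
    idle_timeout_ms: int = 60_000,  # assumed idle period before termination
) -> List[int]:
    """Return the latency (ms) seen by each request, in arrival order."""
    containers: List[Container] = []
    latencies: List[int] = []
    for now in sorted(arrivals_ms):
        # Containers idle for longer than the timeout have terminated.
        containers = [
            c for c in containers if now - c.busy_until_ms <= idle_timeout_ms
        ]
        # Reuse a warm container only if it is not servicing another request.
        warm = next((c for c in containers if c.busy_until_ms <= now), None)
        if warm is None:
            # No idle container: pay the cold-start before servicing the request.
            start = now + cold_start_ms
            warm = Container(busy_until_ms=start)
            containers.append(warm)
        else:
            start = now
        warm.busy_until_ms = start + service_ms
        latencies.append(warm.busy_until_ms - now)
    return latencies


# Two concurrent requests, one request shortly after, one after a long idle gap.
print(simulate_latencies([0, 0, 1_000, 100_000]))
# [600, 600, 100, 600]
```

In this run, both concurrent requests at time 0 pay the cold-start, the request at 1,000 ms reuses a warm container, and the request at 100,000 ms pays the cold-start again because every container has timed out. Under these assumptions, the lag for concurrent or infrequent requests does not improve over time.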