Historically, web application code has been split between origin servers and browsers, which are connected by a network that transmits data from point to point. Many large websites were first run on large physical mainframe servers that could handle heavy traffic and large volumes of data. Over time, websites moved to tens to hundreds of commodity servers, which reduced cost, improved fault tolerance, and increased performance. The next shift was to virtual machines, in which one physical machine is split into multiple virtual machines that can be managed independently. However, virtual machines typically have a high resource cost. For instance, each virtual machine is typically allocated hundreds of megabytes of RAM and typically takes tens of seconds to boot. Containers can also provide isolation and are less resource intensive than virtual machines. However, web application code running in a container typically runs in its own OS-level process, consuming RAM and incurring context-switching overhead. Also, while native code can load quickly in a container, many server-oriented language environments are not optimized for startup time.
Some cloud computing platforms spin up a containerized process for a customer's code and auto-scale the number of such processes, which creates cold starts. A cold start occurs when a new copy of the code starts on a machine. Starting a new containerized process can take from hundreds of milliseconds to several seconds (e.g., between 500 ms and 10 seconds). As a result, a request may hang for as long as it takes to start the new containerized process (e.g., as long as ten seconds). Moreover, each containerized process can handle only a single request at a time, so a new containerized process must be cold-started each time an additional concurrent request is received. This means that slow requests can recur over and over. Also, if a containerized process does not receive a request within a certain amount of time, it automatically shuts down and must be cold-started again when the next request arrives. When new code is deployed, this entire cycle repeats, because each containerized process must be spun up anew.
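The cold-start behavior described above can be illustrated with a small simulation. This is a minimal sketch, not any particular platform's scheduler, and it assumes: each process serves exactly one request at a time, a request waits for a cold start only when no warm process is free, the cold-start delay is fixed at 500 ms (the low end of the example range), and the idle timeout is a hypothetical 60 seconds (the text only says "a certain amount of time"). The names `simulate`, `Process`, `cold_start_s`, and `idle_timeout_s` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Process:
    busy_until: float  # time (s) at which this process finishes its current request


def simulate(requests, cold_start_s=0.5, idle_timeout_s=60.0):
    """Return (cold_start_count, per-request latencies) under a one-request-per-process model.

    requests: list of (arrival_time_s, service_time_s) tuples, sorted by arrival time.
    cold_start_s and idle_timeout_s are assumed values, not platform-specified ones.
    """
    pool = []
    cold_starts = 0
    latencies = []
    for arrival, service in requests:
        # Shut down processes that have sat idle longer than the (assumed) timeout.
        pool = [p for p in pool if arrival - p.busy_until <= idle_timeout_s]
        # A warm process can be reused only if it is not handling another request.
        idle = [p for p in pool if p.busy_until <= arrival]
        if idle:
            proc = idle[0]
            start = arrival
        else:
            # No free warm process: this request pays for a cold start.
            cold_starts += 1
            proc = Process(busy_until=arrival)
            pool.append(proc)
            start = arrival + cold_start_s
        proc.busy_until = start + service
        latencies.append(proc.busy_until - arrival)
    return cold_starts, latencies


# Three concurrent requests, one follow-up, then one after a long idle gap.
cold, latencies = simulate([(0.0, 0.1), (0.0, 0.1), (0.0, 0.1), (1.0, 0.1), (100.0, 0.1)])
print(cold)  # 4
```

Under these assumptions, the three concurrent requests each trigger a cold start, the request at 1 s reuses a warm process, and the request at 100 s triggers another cold start because every process has exceeded the idle timeout.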
One of the key features of an operating system is the ability to run many processes at once. The operating system transparently switches between the various processes that want to run code at any given time. The operating system accomplishes this through a context switch, which moves the memory required for one process out and the memory required for the next process in. A context switch can take as much as 100 microseconds. Multiplied across all the processes running on a typical cloud computing platform server, this creates heavy overhead. As a result, not all of the CPU's power can be devoted to actually executing customer code; some of it is spent switching between processes.
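The overhead argument above reduces to simple arithmetic: the fraction of a core's time lost to switching is roughly the number of context switches per second multiplied by the cost of each switch. The sketch below assumes the 100-microsecond upper bound from the text and a hypothetical load of 1,000 switches per second on a single core; the function name `context_switch_overhead` is illustrative, and real switch costs and rates vary widely by workload and hardware.

```python
def context_switch_overhead(switches_per_second: float, switch_cost_s: float = 100e-6) -> float:
    """Approximate fraction of one CPU core's time spent on context switches.

    switch_cost_s defaults to the 100-microsecond upper bound; the result is capped at 1.0.
    """
    return min(1.0, switches_per_second * switch_cost_s)


# Hypothetical load: 1,000 switches per second on one core.
print(f"{context_switch_overhead(1_000):.0%}")  # 10%
```

At that assumed rate, roughly a tenth of the core's time would go to switching rather than to running customer code.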
Most computing platforms are meant to be run by individual customers on their own servers. They are not intended to be run in a multi-tenant environment that executes the code of many different customers. Memory is often the largest cost of running a customer's code (even larger than the CPU cost).
Building and maintaining applications that easily scale to support spikes in demand or a global user base has generally required a large amount of both upfront engineering work and ongoing operational support. Developers are forced to spend significant time writing supporting code rather than building the application itself. Many cloud computing platforms require the developer to specify where the code should run (e.g., at which nodes of the cloud computing platform), and often offer only a small number of nodes from which to choose.