Technical Field
The present disclosure generally relates to deployment and termination of different versions of code in application instances in cloud computing systems.
Description of Related Art
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Rather than relying on a single large software application to provide every facet of a modern software solution, many software solutions today are made up of a substantial number of different services that are designed to work together to provide the functionality of the overall system. For instance, rather than writing a single standalone application that provides an online content streaming service, such a service could be provided by tens or even hundreds of smaller software services, each designed to perform a specific set of tasks, which together provide the content streaming service. This approach has several pronounced advantages. For instance, it can be easier to compartmentalize the development of the software application, as each standalone service can be assigned to a small group of programmers for implementation. Additionally, it greatly improves the modularity of the software solution, allowing individual services to be easily removed and replaced with updated services that perform the same task. As yet another advantage, such a modularized design allows the software solution to be easily distributed and redistributed over multiple different compute nodes (either physical or virtual), based on how the different services are positioned and configured.
However, it can be difficult to pinpoint the root cause of a problem in a heavily distributed software solution. For example, consider a solution made up of several hundred interconnected services. In such an environment, a problem occurring in one of the services may adversely affect the performance and/or quality of several other services, which in turn may adversely affect the performance and/or quality of still other services. When this occurs, developers and engineers may have difficulty determining which of the many malfunctioning services originally caused the problem. As another example, when a particular service begins consuming a large amount of system resources, it may be difficult to determine whether an update to that service, or an update to another one of the services, is causing the heavy resource usage. In this context, "performance" refers to any aspect of a service that indicates the service's health and quality, including metrics such as the rate of errors that the service generates.
Additionally, the vast majority of software applications go through a number of different iterations during their lifespan. For instance, a newer version of a software application could add a new feature to the software application. As another example, the newer version could attempt to resolve a problem with the previous version of the software application. As a practical matter, newer versions of software typically include a multitude of different changes and new features. Furthermore, the newer version of software may frequently be developed by a substantial number of developers, with one or more developers working on each of the changes and new features, and then merging their individual contributions into a single release version of the software.
However, since software development is not a perfect science, the newer version of software may introduce new problems as well. Such problems could be caused by any number of factors, including incompatible code introduced during the merge process, mistakes made during the merge process, or simply errors in the code. While these problems could cause the new version of software to fail during execution, in other situations they could instead affect the performance and/or quality of the software application (e.g., resulting in higher memory and CPU usage during execution), and thus may be harder to detect during quality assurance testing. Administrators need to be able to identify regressions as they actually occur, because simulating performance in test environments is necessarily imperfect. In an environment in which a number of interrelated services are executing, and in which the performance and/or quality of a particular service can be influenced not only by that service's own workload but also by the performance, quality, and workload of other services, it can be significantly harder to detect minor differences in the performance and/or quality of a newer version of one of the services.