The microservice and DevOps approach to software design has resulted in new software features being delivered immediately to users, instead of waiting for long refresh cycles. It should be noted that while these approaches are primarily discussed herein with respect to cloud applications, both microservices and DevOps are more generally applicable to development of a wide variety of software applications. Under the conventional “waterfall” development methodology, improvements (e.g., new features, performance improvements, bug fixes, etc.) are periodically delivered as one big update. For each update, the cycle of Plan, Develop, Test, and Deploy can take months to years.
However, developers and system administrators are adopting new software practices which stress communication, collaboration, integration, automation, and measurement of cooperation between software developers and other information-technology (IT) professionals. Such practices have collectively come to be termed DevOps, which is a clipped compound of “development” and “operations.”
Application developers and operations personnel employ agile software practices such as DevOps to reduce the time to market for new features and improvements to the application. DevOps includes continuous integration and deployment of new features and incremental updates. The frequency of such deployments varies anywhere from a few times a week to fifty times a day. Therefore, in DevOps, the aforementioned cycle of Plan, Develop, Test, and Deploy spans only hours to days.
User feedback is constantly monitored and incorporated. Developers and operations personnel work closely to quickly test and deploy new features, monitor user experience, and finally incorporate changes based on monitoring into the next iteration. The tight feedback loop created by this process enables software to evolve faster in response to user needs.
In order to continuously incorporate user feedback, application owners now choose to release early and release often, forgoing rigorous testing in exchange for a shorter time to market for product features. This rapidly evolving application deployment can therefore increase the occurrence of software bugs and performance regressions, which in turn become an important cause of downtime.
There are many performance profiling tools to help developers identify software issues causing degradation in throughput and response times to the end user. However, they provide only monitoring capabilities. There are also design patterns like circuit breakers that enable distributed applications to prevent transient and component-level errors from cascading.
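By way of illustration only, the circuit breaker pattern mentioned above can be sketched as follows. This is a minimal sketch and not any particular library's implementation; the class name CircuitBreaker, the exception CircuitOpenError, and the parameters failure_threshold, reset_timeout, and clock are hypothetical names chosen for this example. After a configurable number of consecutive failures the breaker "opens" and rejects calls immediately, then allows a single trial call once a timeout has elapsed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative circuit breaker; all names here are hypothetical."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of calling the unhealthy component.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or reaching the threshold, (re)opens it.
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote invocation, for example breaker.call(fetch_profile, user_id), where fetch_profile is a hypothetical remote call. Such a breaker only stops a failing component from being invoked repeatedly; it does not repair the underlying defect.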
Persistent failures, those stemming from software bugs and under-tested features, continue to require human intervention. Application availability is impacted because debugging modern cloud-based distributed applications is often a time-consuming task. While the common web application has now become a distributed, resilient application, the complexity of troubleshooting issues has also spread from a single machine to a deployment spanning data centers. The mean time to recover (MTTR) an application from a performance bug depends on how quickly the human operator responds to the issue. If the software issue arises at an hour when the developer/administrator is not available (e.g., midnight), the time to repair (and hence recover) the application is high, resulting in loss of traffic and revenue.
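For concreteness, MTTR over a set of incidents can be computed as the average of the intervals between the onset of each failure and the corresponding recovery. The following is a minimal sketch under that assumption; the function name mean_time_to_recover and the representation of an incident as a (failure_start, recovered_at) pair are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_recover(incidents):
    """Average recovery interval over (failure_start, recovered_at) pairs.

    Both elements of each pair are datetime objects; the pair layout is an
    assumption made for this example.
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    # Sum the per-incident durations, starting from a zero timedelta.
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)
```

Under this definition, a single incident that waits several hours for an operator to become available raises the mean considerably, even when the other incidents are resolved within minutes.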