The cloud computing model has emerged as the de facto paradigm for providing a wide range of services in the IT industry, such as infrastructure, platform, and application services. As a result, various vendors offer cloud-based solutions to optimize the use of their data centers. Modern cloud-based applications, irrespective of scale, are distributed and heterogeneous, and can evolve rapidly in a matter of hours to respond to user feedback. This agility is enabled by the use of a fine-grained service-oriented architecture, referred to as a microservice architecture. A microservice is a web service that serves a single purpose and exposes a set of APIs to other microservices, which collectively implement a given application. Each microservice of a microservice-based application is developed, deployed, and managed independently of the other constituent microservices. New features and updates to a microservice are continuously delivered in a rapid, incremental fashion, wherein newer versions of microservices are continually integrated into a production deployment. Microservice-based applications developed in this manner are extremely dynamic, as they can be updated and deployed hundreds of times a day.
Microservice-based applications running in the cloud should be designed for, and tested against, failures. In the past, many popular, highly available Internet services (which are implemented as microservice-based applications) have experienced failures and outages (e.g., cascading failures due to message bus overload, database overload, or degradation of core internal services, as well as database failures). The post-mortem reports of such outages revealed missing or faulty failure handling logic, with an acknowledgment that unit and integration testing are insufficient to catch bugs in the failure recovery logic.
In this regard, microservice-based applications should be subjected to resiliency testing, which involves testing the application's ability to recover from failure scenarios such as those commonly encountered in the cloud. However, splitting a monolithic application into microservices creates a dynamic software development environment that poses key challenges to resiliency testing due to the runtime heterogeneity of the different microservices and the volatility of the code base. Indeed, microservice applications are typically polyglot, wherein application developers write individual microservices in the programming language with which they are most comfortable. Moreover, a frequent experimentation and incremental software update delivery model results in microservices being constantly updated and redeployed, leaving the code base in a constant state of flux. This runtime heterogeneity and high code churn make resiliency testing of a microservice-based application highly problematic and non-trivial.
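By way of illustration only, the kind of failure handling logic that resiliency testing is meant to exercise can be sketched in Python as follows. This is a minimal, single-process sketch under stated assumptions, not a description of any particular testing system: the names `fetch_recommendations`, `healthy_dependency`, `crashed_dependency`, `DependencyUnavailable`, and `run_resiliency_check` are hypothetical, and the downstream microservice is emulated by a plain function rather than a real network call.

```python
from typing import Callable, Dict, List


class DependencyUnavailable(Exception):
    """Hypothetical error raised when the emulated downstream service is down."""


def fetch_recommendations(call_dependency: Callable[[], List[str]]) -> List[str]:
    """Caller-side logic under test: fall back to a default when the dependency fails."""
    try:
        return call_dependency()
    except DependencyUnavailable:
        # Failure handling path that ordinary unit tests rarely exercise.
        return ["default-item"]


def healthy_dependency() -> List[str]:
    """Emulates a downstream microservice that responds normally."""
    return ["item-a", "item-b"]


def crashed_dependency() -> List[str]:
    """Injected fault: emulates the downstream microservice being unreachable."""
    raise DependencyUnavailable("injected outage")


def run_resiliency_check() -> Dict[str, List[str]]:
    """Runs the caller against both a healthy and a failed dependency."""
    return {
        "healthy": fetch_recommendations(healthy_dependency),
        "outage": fetch_recommendations(crashed_dependency),
    }


if __name__ == "__main__":
    for scenario, response in run_resiliency_check().items():
        print(scenario, response)
```

In this sketch the fault is injected by swapping a function inside one process and one language. In a polyglot deployment whose services are redeployed continually, no such shared hook exists across services, which is one way of seeing why the heterogeneity and code churn described above complicate resiliency testing.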