A hosted application is a software application in which the software resides on servers that are accessed through a wide-area network, such as the Internet, in contrast to more traditional on-premises software, which is installed on a local server computer or on individual client computers. Hosted applications might also be known as Internet applications, application service providers (“ASPs”), World Wide Web (“Web”)-based applications, software as a service (“SaaS”), or on-line applications. Hosted applications typically provide services over a network, commonly referred to as a “cloud”, and are often used concurrently by multiple customers called “tenants.” Consequently, such applications are referred to herein as “multi-tenant cloud services.”
Multi-tenant cloud services currently exist for providing electronic mail (“email”) services, calendaring services, task management services, communications services, file storage services, customer relationship management (“CRM”) services, and many others. Large-scale multi-tenant cloud services such as these are commonly implemented using many thousands, or even tens of thousands, of server computers operating in one or more geographically dispersed data centers. A large number of network services execute on the server computers to implement the multi-tenant cloud service. Additionally, such multi-tenant cloud services commonly require significant networking infrastructure (e.g., thousands of routers, switches, load balancers, etc.) to enable data communication between the servers and the network services that are executing thereupon.
During operation of a large-scale multi-tenant cloud service, such as those described above, it is commonly necessary to make changes to the hardware (e.g., servers or networking components) and the software (e.g., operating system updates, updates to the multi-tenant cloud service code, configuration changes, etc.) used to provide the service. These changes can impact the operation of the service, sometimes in unpredictable ways. For example, and without limitation, an update to the operating system on a server computer might render that server computer unable to process requests from clients of the service. Such changes can also impact the operation of upstream or downstream components, again sometimes in unpredictable ways. As a result, it can be difficult for an engineer to determine the source of an anomaly following a change to a software or hardware component in a large-scale multi-tenant cloud service that utilizes many thousands of servers, networking components, and software components.
It is with respect to these and other considerations that the disclosure made herein is presented.