Online services and/or web-based applications may be accessed by network-connected users, such as users connected to the Internet. Some services may be used by a large number of users and may utilize vast amounts of resources, such as hardware and software, to provide the services to those users. These services may comprise multiple data centers, each comprising many (e.g., thousands of) servers and many (e.g., hundreds of) hardware/network components. Operations conducted on this scale may often experience many small-scale, localized, partial outages, for a variety of reasons. Hardware outages may comprise storage disk failures, server crashes, network switch failures, and/or other hardware failures, for example. Further, small-scale, localized software failures may occur that can affect a portion of the users utilizing one or more of the services, for example.