In today's networked communication environments, many services that used to be provided by locally executed applications are provided through distributed services. For example, email services, calendar/scheduling services, and comparable services are provided through complex networked systems that involve a number of physical and virtual servers, storage facilities, and other components across geographical boundaries. Even organizational systems such as enterprise networks may be implemented through physically separate server farms and the like.
While distributed services make it easier to manage installation, update, and maintenance of applications (i.e., instead of installing, updating, and maintaining hundreds, if not thousands, of local applications, a centrally managed service may take care of these tasks), such services still involve a number of applications executed on multiple servers. When managing such large-scale distributed applications continuously, a variety of problems are to be expected. Hardware failures, software problems, and other unexpected glitches may occur regularly. Attempting to manage and recover from such problems manually may require a cost-prohibitive number of dedicated and domain-knowledgeable operations engineers.