Before the advent of cloud computing as a commercial service, distributed computing was almost entirely restricted to government agencies and scientific and educational institutions. Such organizations expected some amount of system or application downtime, and accepted that certain applications or components could or would routinely fail or require a reboot or restart.
In a commercial environment, however, such software and system outages can cost thousands or, in some cases, millions of dollars in lost revenue. It is therefore preferable to have a plan of action in place for dealing with expected or possible system or software failures before such failures occur. In large-scale settings such as cloud computing environments, however, it is not readily feasible to manage each computing resource and application individually.