Change Management is central to ensuring the availability, reliability, and quality of information technology (IT) services. It is the process by which IT systems are modified to accommodate considerations such as software fixes, hardware upgrades, and performance enhancements. Examples include changing the schema of a database table in a running application and installing a new release of a web application server in a multi-tiered eCommerce system. The importance of Change Management is underscored by recent studies showing that operator errors account for a large fraction of failures of Internet services.
Central to Change Management is the Change Plan, the step-by-step procedure whereby a proposed change is implemented (e.g., Chou et al., Software: Practice and Experience, vol. 30, no. 3, 2000, pp. 175-197) by modifying the various artifacts of the system. Examples of artifacts include programs, database tables, and initialization files. As an illustration, consider two services, Order Display and Buy Confirmation, that use the credit card transactions database table, CC_XACTS. When there is a Change Request to modify the schema of the database table (e.g., to accommodate new accounting procedures), the Change Request describes the end result of the change, such as having successfully modified the table schema. The Change Plan specifies the times at which various tasks execute to transition the system from its current state to its desired state. In this example, modifying the database schema requires installing a new version of the servlets Ordrdisp and Buyconf that implement these services.
FIG. 1 displays a Change Plan for the Change Request in this example. Artifacts (either a database table or a servlet in this example) transition between different stages in their lifecycle: installable, executable, and running. For a service to be operational, all of its artifacts must be in the running state. To be changed, however, an artifact must be transitioned first to the executable state and then to the installable state. Each task in the plan is labeled with the lifecycle transition that it causes, and the duration of the task is indicated by the length of its containing rectangle. For example, in the figure, Buyconf begins its transition from running to executable at time 0 and completes the transition at time 3.
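The lifecycle model just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the plan shown in FIG. 1: the `Task` class, the `full_cycle` helper, and the `outage_window` function are hypothetical names, and every task time other than the Buyconf transition from time 0 to time 3 is an assumed value. The sketch also assumes a service counts as down from the moment any of its artifacts begins to leave the running state until the last of its artifacts has returned to it.

```python
from dataclasses import dataclass

# Lifecycle stages in order; a task moves an artifact between adjacent stages.
STAGES = ("installable", "executable", "running")


@dataclass(frozen=True)
class Task:
    artifact: str
    from_stage: str
    to_stage: str
    start: int  # time units, as along the time axis of a plan chart
    end: int


def is_valid(task):
    """A task must connect adjacent lifecycle stages and take positive time."""
    step = abs(STAGES.index(task.from_stage) - STAGES.index(task.to_stage))
    return step == 1 and task.end > task.start


def outage_window(artifacts, plan):
    """Return (start, end) of the period in which the service is not operational.

    Simplifying assumption: the service is down from the first task that takes
    one of its artifacts out of "running" until the last task that brings one
    back. Returns None if the plan does not both take down and restore it.
    """
    leaving = [t.start for t in plan
               if t.artifact in artifacts and t.from_stage == "running"]
    returning = [t.end for t in plan
                 if t.artifact in artifacts and t.to_stage == "running"]
    if not leaving or not returning:
        return None
    return min(leaving), max(returning)


def full_cycle(artifact, down, up):
    """Four tasks: running -> executable -> installable -> executable -> running.

    `down` and `up` each give three boundary times for the two tasks of a leg.
    """
    d0, d1, d2 = down
    u0, u1, u2 = up
    return [
        Task(artifact, "running", "executable", d0, d1),
        Task(artifact, "executable", "installable", d1, d2),
        Task(artifact, "installable", "executable", u0, u1),
        Task(artifact, "executable", "running", u1, u2),
    ]


plan = (
    full_cycle("Buyconf", (0, 3, 4), (9, 10, 12))     # only 0..3 comes from the text
    + full_cycle("Ordrdisp", (0, 2, 3), (9, 10, 11))  # assumed times
    + full_cycle("CC_XACTS", (4, 5, 6), (6, 8, 9))    # assumed times
)

# Each service needs its servlet and the shared database table to be running.
services = {
    "Order Display": {"Ordrdisp", "CC_XACTS"},
    "Buy Confirmation": {"Buyconf", "CC_XACTS"},
}

outages = {name: outage_window(arts, plan) for name, arts in services.items()}
```

Under these assumed times, `outages` maps each service name to the interval during which at least one of its artifacts is outside the running state, which is the quantity a scheduler would try to shorten or shift.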
The discussion will now focus on two parts of Change Management. The first is construction of the Change Plan. The Change Plan is critical to effective Change Management since it determines the state of artifacts and hence the impact on service delivery. The second is impact analysis, whereby the effect of the Change Plan on services is determined. In particular, impact analysis indicates which artifacts and services are affected by a change.
Some kinds of automation for Change Management are considered in the current state-of-the-art. For example, (1) Change Plan execution may be automated using workflow (e.g., Maurer et al., IEEE Internet Computing, May-Jun. 2000, pp. 65-74), (2) the incorporation of software updates can be automated (e.g., U.S. Pat. No. 6,385,768, Ziebell, “System and method for incorporating changes as part of a software release”), and (3) versions and configurations can be managed for both persistent and transient objects (U.S. Pat. No. 5,862,386, Joseph et al., “Apparatus and method for providing a facility for managing versions and configurations of persistent and transient objects”). However, the construction of the Change Plan itself requires human intervention.
The current state of the art is also limited as to impact analysis. Today, the focus is on identifying the affected artifacts and services (e.g., U.S. Pat. No. 6,601,023, Deffler et al., “Method for impact analysis of a model”). It is much more valuable, however, to determine the duration of service outages caused by a change, or, more generally, the cost of a change. These considerations are typically a function of the start and end times of service outages or degradations. For example, having an order entry application offline for two hours may not be a problem during a time when few people are shopping (e.g., Christmas Day) but may have a major impact at other times (e.g., the day before Christmas).
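The dependence of cost on outage timing can be made concrete with a small sketch. The function names and the per-hour cost figures below are hypothetical values chosen only to mirror the Christmas example; they are not taken from any measurement.

```python
def outage_cost(start_hour, end_hour, hourly_rate):
    """Sum an hourly cost rate over the outage interval [start_hour, end_hour)."""
    return sum(hourly_rate(h) for h in range(start_hour, end_hour))


def hypothetical_rate(hour):
    """Assumed cost per hour offline; hour 0 is midnight at the start of December 24."""
    day = hour // 24
    # Assumed figures: a busy day before Christmas, a quiet Christmas Day.
    return 1000 if day == 0 else 50


busy = outage_cost(10, 12, hypothetical_rate)   # 10:00 to 12:00 on December 24
quiet = outage_cost(34, 36, hypothetical_rate)  # 10:00 to 12:00 on December 25
```

Both outages last two hours, yet under the assumed rates the first costs twenty times as much as the second, which is why a list of affected services alone is a weak basis for choosing when to execute a change.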