Modern data centres offer many distinct information technology (“IT”) services to many different groups of users with highly variable needs. If each service is deployed on a distinct set of IT resources, the average utilization of those resources tends to be low. Here, a “resource” may be any physical or logical entity used in the delivery of a service (e.g., CPU cycles, memory, storage, bandwidth, software licenses or data sets). An alternative is to deploy the requisite set of services on a shared fabric of resources. Indeed, several interrelated fields of computer science, namely service-oriented architectures (“SOA”), grid computing, utility computing and web services, all address strategies for delivering multiple services more effectively by sharing IT resources between them.
Common methods of providing an adaptive SOA can involve managing or “orchestrating” the allocation of a shared set of resources between different services. Yet such methods of “resource orchestration” frequently have serious practical shortcomings. For instance, a significant delay may be incurred before a resource reallocated from one service to another can actually be used. It is often necessary to download, install, configure and activate new software applications, a different operating system and requisite data sets before a reallocated compute node can be used. In cases where business continuance is a priority, the resulting delay can have serious consequences. Another shortcoming of conventional resource orchestration is that it can seriously undermine the stability and optimization of clustered systems, which frequently exhibit complex and subtle resource dependencies. Further, resource orchestration may present serious security challenges that would not otherwise arise. Resources subject to reallocation may transfer between different trust domains, and the services to which they are assigned may have different access privileges (e.g., to protected data sets or network zones).
To better understand the challenges associated with conventional methods of resource orchestration, consider a specific example. Suppose that an e-mail service and a clustered file system service are both deployed on a shared SOA fabric. Suppose that under normal circumstances the e-mail service deploys two physical hosts as e-mail servers, one active and the other a hot stand-by. Meanwhile, the clustered file system is normally deployed on three active physical hosts. There are no other servers available in the SOA fabric.
Now suppose that one of the hosts for the clustered file system fails and the resource orchestration policy dictates that the stand-by e-mail server be reallocated to the clustered file system service. The reallocation triggers a complex sequence of events.
First, the stand-by server must be decommissioned as part of the e-mail service. This may provoke changes in the configuration of the e-mail server cluster, deactivation of a higher-level cluster failover service, changes in the configurations of network zones and perhaps several other changes as well. Next, the reassigned server must be commissioned, configured and activated as part of the file system cluster. This may necessitate: provisioning a new operating system; downloading and installing file system software; registering the server as part of the file system cluster; mounting new volumes associated with data and metadata for the file system; setting new policies for cluster failover, quorum, network partition management and nomination of stand-by master servers for the cluster; reconfiguring host bus adapter multi-path and failover policies; and reconfiguring network zones, port bindings, LUN masking settings and routing tables.
Clearly, this complex sequence can leave the SOA vulnerable to a host of system-level challenges. Further, throughout the time that the host is being reassigned from one service to the other, the entire SOA is in an even more severely degraded mode of operation than that which resulted from the original failure condition. Specifically, during the decommissioning and recommissioning process, two servers are out of active duty rather than just one.
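By way of illustration only, the host accounting in the foregoing example can be sketched in a few lines of Python. The host names, state labels and the function `simulate` are hypothetical and are introduced solely for this sketch; they do not describe any particular orchestration product. The sketch simply counts how many of the five hosts are neither active nor on stand-by at each stage of the reallocation.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str     # hypothetical host label
    service: str  # "email" or "fs" (clustered file system)
    state: str    # "active", "standby", "failed" or "transitioning"


def out_of_duty(hosts):
    """Count hosts that are neither serving nor ready to serve."""
    return sum(h.state not in ("active", "standby") for h in hosts)


def simulate():
    """Return the out-of-duty host count at each stage of the example."""
    fabric = [
        Host("mail-1", "email", "active"),
        Host("mail-2", "email", "standby"),
        Host("fs-1", "fs", "active"),
        Host("fs-2", "fs", "active"),
        Host("fs-3", "fs", "active"),
    ]
    timeline = [out_of_duty(fabric)]  # normal operation

    # One file system host fails.
    fabric[4].state = "failed"
    timeline.append(out_of_duty(fabric))

    # The stand-by e-mail server is decommissioned and is being
    # recommissioned for the file system cluster; it serves neither.
    fabric[1].state = "transitioning"
    timeline.append(out_of_duty(fabric))

    # Reallocation completes: the host joins the file system cluster.
    fabric[1].service = "fs"
    fabric[1].state = "active"
    timeline.append(out_of_duty(fabric))
    return timeline


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions, the count rises from one host to two for the whole duration of the transition, and the e-mail service is left without a stand-by even after the reallocation completes.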
Often the costs associated with dynamic resource orchestration outweigh the benefits. Consequently, many organizations continue to offer information services on static “silos” of physically or logically isolated IT systems. Such static architectures cannot respond to the changing conditions of a modern data centre and, hence, are often sub-optimal.
It is, therefore, desirable to provide an architecture and methodology for manipulating pre-configured virtualized system containers in such a way that an SOA can adapt to changing environmental conditions without any need for reallocating resources across container boundaries.