Computing system hardware and software are both prone to error. These errors may arise from various causes, for example data corruption, hardware malfunction, or software faults. Such errors may be correctable, meaning that an operating system or executing software can recover from them and continue operation, or uncorrectable, in which case the computing system itself is incapable of continuing operation.
To address such errors, owners of computing systems who require system reliability typically obtain that reliability through hardware redundancy. Redundancy keeps a computing system available if one of the redundant systems malfunctions, because concurrent errors across all of the redundant hardware systems are unlikely. However, as third-party server computing systems (e.g., cloud computing arrangements) are increasingly relied upon, it becomes more important to be able to rely on such computing resources being available, even though the customer does not necessarily control the extent of hardware redundancy (which is selected and implemented by the owner of the computing resources).
In addition to error concerns, server systems often become overloaded with workloads over time as the computing needs of an organization change. Again, because third-party providers increasingly deliver computing services, customers of such services have less control over the extent to which, and the manner in which, workloads can be transferred among computing systems. Furthermore, even those third-party computing providers may implement computing system availability on a system-by-system or platform-by-platform basis, which limits their flexibility to allocate tasks across computing systems.
Existing systems that employ distributed and continuous computing concepts, and which do not rely solely on hardware redundancy, use computer system virtualization to improve computing flexibility. Computer system virtualization allows multiple operating systems and processes to share the hardware resources of a host computer. Ideally, system virtualization provides resource isolation, so that each operating system does not realize that it is sharing resources with another operating system and does not adversely affect the execution of the other operating system. Such system virtualization enables applications including server consolidation, co-located hosting facilities, distributed web services, application mobility, secure computing platforms, and other applications that make efficient use of underlying hardware resources.
However, existing virtualization systems have drawbacks. Generally, many such systems virtualize an entire operating environment within a specific, allocated partition and give external software little or no access to that operating environment. Accordingly, it can be difficult to migrate workloads to or from such operating environments. Furthermore, existing virtualization systems are typically constructed to provide a substantial disconnection between the structure of the underlying hardware system and the hardware seen by the virtualized software. That is, a virtualization system may host a partition that includes an operating system that sees a processor and a predetermined amount of memory. In such a scenario, that processor or memory may be shared with other partitions, such that the partition receives only a time-divided portion of the overall processing time or access time of that resource. For critical software workloads, this is substantially sub-optimal, since the partition hosting the critical workload cannot indicate that the workload is critical or otherwise requires special attention. Furthermore, where a workload is not initially critical but becomes critical during operation, it may be difficult to offload other workloads from the partition hosting it.
Moving workloads among computing systems introduces numerous challenges, regardless of whether physical or virtualized systems are used. For example, in the case of data storage, a workload may originally be located on the same system where its associated files or other data are stored; however, if the workload is migrated to another system and the data is not, the manner in which the data is accessed typically changes. Local data may be accessed via a data bus of an I/O subsystem, for instance, while remote data may require access via a communication interface. As such, the operating system of the physical or virtualized system would typically need to handle I/O operations irrespective of where the data being retrieved is located. Furthermore, in cases where only a portion of a workload is offloaded to a different computing system, the workload itself cannot easily be modified to address both local and remote memory accesses or both local and remote I/O operations. Accordingly, the underlying system, such as the operating system or virtualization system, would need insight into the portability of the hosted workload. Because of these complexities, porting individual workloads is not easily attempted or implemented, particularly in virtualization systems, where the virtualized environment as a whole may instead be ported to a different system.
For these and other reasons, improvements are desirable.