A big data framework is a distributed data processing architecture that provides distributed storage for large amounts of data and supports operations on that data by bringing the program to the location where the data resides, as opposed to the more traditional model of bringing the data to the program. As working data sets become ever larger, this so-called “big data” model makes more and more sense, and is becoming ever more widely used. A big data framework can include an entire collection of components such as a distributed, location-aware file system, a job scheduler, a resource management platform, a coordination service for distributed applications, a scalable, distributed database with support for large tables, etc.
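The idea of bringing the program to the data can be illustrated with a minimal sketch of locality-aware task placement. This is not the scheduler of any particular framework; the `Block` class, the `place_tasks` function, and the node names are hypothetical, and the greedy policy is an assumption chosen only for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A chunk of a large file, replicated on several nodes (hypothetical model)."""
    block_id: str
    replicas: frozenset[str]  # names of the nodes that hold a copy of this block


def place_tasks(blocks: list[Block], free_slots: dict[str, int]) -> dict[str, str | None]:
    """Assign one task per block, preferring a node that already stores the block.

    Returns a mapping of block_id to node name, or None if no slot is left anywhere.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    placement: dict[str, str | None] = {}
    for block in blocks:
        # Data-local candidates: replica holders that still have a free slot.
        local = sorted(n for n in block.replicas if slots.get(n, 0) > 0)
        if local:
            node = local[0]
        else:
            # Fall back to any node with capacity; the block must then be
            # read over the network, which is the traditional model.
            remote = sorted(n for n, s in slots.items() if s > 0)
            node = remote[0] if remote else None
        if node is not None:
            slots[node] -= 1
        placement[block.block_id] = node
    return placement


if __name__ == "__main__":
    blocks = [
        Block("b1", frozenset({"node-a", "node-b"})),
        Block("b2", frozenset({"node-a"})),
        Block("b3", frozenset({"node-c"})),
    ]
    print(place_tasks(blocks, {"node-a": 1, "node-b": 1, "node-c": 0, "node-d": 1}))
```

A task is sent to a node that already stores the block whenever such a node has capacity, and the block is moved to the program only as a fallback. A production scheduler would also weigh rack topology, load, and fairness, none of which this sketch attempts.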
The current canonical example of a big data framework is the Apache™ Hadoop® software library. Hadoop allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. Although Hadoop is widely deployed today, there are also other big data architectures in use, such as FICO Blaze Advisor, HP Vertica, Splunk, etc.
In the virtualization of computing devices, one or more virtual machines (VMs or guests) can be instantiated at a software level on physical computers (host computers or hosts), such that each VM runs its own operating system instance. Just as software applications can be run on physical computers, so too can applications be run on virtual machines. In some virtualization scenarios, a software component often called a hypervisor can act as an interface between the VMs and the host operating system for some or all of the functions of the VMs. In other virtualization scenarios, there is no underlying host operating system running on the physical host computer. In those situations, the hypervisor acts as an interface between the VMs and the hardware of the host computer, in effect functioning as the host operating system, on top of which the VMs run. Even where a host operating system is present, the hypervisor sometimes interfaces directly with the hardware for certain services. Different virtualization platforms run on different host hardware and provide virtualization utilizing different hypervisors configured to support varying virtualization capabilities for VM images in different formats. Under a given virtualization platform (e.g., VMware, Microsoft Hyper-V, VirtualBox, etc.), a VM in a format supported by the platform can be brought up on an underlying host running the platform's hypervisor.