An increasing number of data-intensive distributed applications are being developed to serve various needs, such as processing large data sets that generally cannot be handled by a single computer. Instead, clusters of computers are employed, and tasks or jobs, such as organizing and accessing the data and performing related operations on the data, are distributed across the cluster. Various applications and frameworks have been developed to interact with such large data sets, including Hive, HBase, Hadoop, Amazon S3, and CloudStore, among others.
At the same time, virtualization techniques have gained popularity and are now commonplace in data centers and other environments in which it is useful to increase the efficiency with which computing resources are used. In some virtualized environments, one or more virtual machines are instantiated on an underlying computer (or another virtual machine) and share the resources of the underlying computer. Each of these virtual machines includes an operating system and one or more applications and processes to provide a particular operation, such as large-scale data processing. However, although virtual machines may share the resources of the underlying computer more efficiently, they often incur substantial overhead and consume memory that could otherwise be provided to the desired applications and processes executing thereon.