Big data engines such as Hadoop, Spark, and Presto may generally work well when the machines that make up a cluster are similar in processing power. Such homogeneity may allow a central resource manager to place workers on the nodes optimally or advantageously, and may allow users to tune jobs according to the configuration of a single machine. This model may generally work well when a user has complete control over the resources that the user can purchase and use. However, the model may become disadvantageous when the user shares resources with thousands of other users in the cloud. For example, a user may often find that the desired type of machine is not available. This problem may be exacerbated when a user has multiple purchase options, such as in Amazon Web Services (AWS) or similar systems, where the user can purchase spot instances at a tenth of the normal price of a machine. In such a case, a user may want to utilize the cheapest comparable instance rather than a more expensive instance.
Accordingly, there is a need to solve problems associated with provisioning and using heterogeneous clusters, composed of varying types of instances, in cloud-based big data services.