The present invention relates to data analytics platforms, and more particularly, this invention relates to vertical tuning of distributed analytics clusters in cloud storage systems and networks.
Traditionally, to run a data analytics application, one or more virtual machines or containers are first provisioned in a cloud. Subsequently, a data analytic platform, such as, for example, Hadoop or Spark, is deployed on the virtual machines or containers. The data analytics application is then run on top of the platform. Parameters configured for the data analytics platform, as well as parameters configured for the virtual machines or containers, impact performance of the data analytics application.
Currently, parameter tuning of data analytics platforms suffers from many drawbacks. For example, resource provisioning tends to be a coarse-grained approach that first allocates a number of virtual machines, and then runs a data analytic platform on top of those virtual machines, without considering, when provisioning the virtual machines, characteristics of the workload running on the data analytics platform. In other words, there is no joint consideration of the various layers during configuration.