1. Field
The present disclosure relates generally to computer clusters and to data analytics systems and methods for computer clusters. More particularly, the present disclosure relates to predicting the time for performing a data processing job on a computer cluster in a manner that takes into account the stages for performing the job, and to using the predicted processing time to configure the computer cluster.
2. Description of the Related Art
A computer cluster consists of a group of loosely or tightly connected computers that work together so that, in many respects, the computer cluster may be viewed as a single system. A computer cluster may be employed to improve performance and availability over that of a single computer. A computer cluster typically is much more cost-effective than a single computer of comparable speed or availability.
The components of a computer cluster are usually connected to each other through fast local area networks, with each node running its own instance of an operating system. In most circumstances, all of the nodes use the same hardware and the same operating system. However, in some setups, different operating systems, hardware, or both may be used for various computers in a computer cluster.
Computer clustering relies on a centralized management approach that makes the nodes available as orchestrated shared servers. The activities of the computing nodes may be orchestrated by a software layer that sits atop the nodes and allows users to treat the cluster as one cohesive computing unit.
A data analytics platform is an integrated platform providing the management of data as well as the ability to generate programmable analytics from the data. This platform may be made available as software only, packaged hardware and software, a virtual image, or in a cloud-based software-as-a-service form. Analytics that may be performed may include statistics, predictive analytics, data mining, linear algebra, optimization, graphing, and others involving complex mathematical operations, data transformation, or both.
It may be desirable to use data analytics to predict the time to perform a data processing job on a computer cluster. It also may be desirable to use the predicted time for performing a data processing job to configure the computer cluster to perform the data processing job in a more economically efficient manner.
Accordingly, it would be beneficial to have a method and apparatus that take into account one or more of the issues discussed above, as well as possible other issues.