1. Field of the Invention
The present invention relates generally to an improved data processing system, and in particular, to a computer-implemented method, system, and computer program product for improved clustering of analytic functions in the performance of data analysis.
2. Description of the Related Art
Present data processing environments include a collection of hardware, software, firmware, and communication pathways. Management, administration, operation, repair, update, expansion, or replacement of elements in a data processing environment relies on data collected from various points in the data processing environment.
Furthermore, the various elements of a data processing environment often include components of their own. Various systems, applications, or functions may collect data at or about the various components. For example, a management system may collect data from components to gain insight into the operation, control, performance, troubles, and many other aspects of the data processing environment.
Each element or component can be a source of data that is usable in this manner. To give a sense of scale, the number of data sources in some data processing environments can be in the thousands or millions.
Furthermore, not only is the data collected from a vast number of data sources, but a variety of data analyses often must also be performed on combinations of such data using various analytic functions. A software component or another element of the data processing environment may implement an analytic function to perform a particular analysis. Many instances of similar functions may execute simultaneously to analyze similar data from different sources or similar data pertaining to different resources. In some data processing environments, the number of analytic function instances can be in the millions.
Additionally, a particular analysis may be relevant to a particular part of the data processing environment, or may use data sources situated in a particular set of data processing environment elements. Consequently, the various functions performing the analyses may be distributed across the data processing environment, for example, so as to be close to their respective data sources. Analytic functions may further communicate and interact with each other to provide certain analyses or information.