Big data analytics systems utilize a multiplicity of models, resulting in substantial computational and maintenance costs. Few users are able to afford the cost of deploying and maintaining a complete set of targeted models using existing approaches. Some illustrative approaches include model clustering on a model parameter space, data clustering, and prediction by clustering. Model clustering trains a plurality of models to estimate one or more parameters for each of the models, and then performs clustering on the estimated parameters. However, a large number of models must be trained before any clustering can be performed, and the training process is computationally expensive.
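As a minimal sketch of the model clustering approach described above, the following Python example fits one model per entity and then clusters the fitted parameters. The linear model form, the synthetic data, the k-means routine with farthest-first initialization, and all function names are assumptions chosen for illustration; none of them is specified by the approaches discussed here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (assumed model form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return (slope, mean_y - slope * mean_x)


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, iterations=10):
    """Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-first initialization (an illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical data: six entities, each with its own (slope, intercept).
xs = [0.0, 1.0, 2.0, 3.0]
true_params = [(2.0, 1.0), (2.1, 0.9), (1.9, 1.1), (-1.0, 5.0), (-1.1, 5.2), (-0.9, 4.8)]

# Step 1: train one model per entity (this is the expensive part).
params = [fit_line(xs, [a * x + b for x in xs]) for a, b in true_params]

# Step 2: cluster the entities in the model parameter space.
labels = kmeans_labels(params, k=2)
print(labels)
```

Note that every entity requires its own trained model before the clustering step can begin, which is the cost identified above.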
Conventional data clustering methods are geared to performing data clustering on a data vector space, and are not configured for solving forecasting problems. Another conventional approach, prediction by clustering, performs clustering of data on a data vector space, and then builds a predictive model for each cluster. In some cases, prediction by clustering provides improved accuracy relative to other approaches. However, this approach requires a high-dimensional data vector space. Data in this vector space is sparse and includes many irrelevant and noisy features. Moreover, high dimensionality may yield clusters that are not meaningful. Thus, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.
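For illustration, the prediction by clustering approach described above can be sketched in Python as follows. The two-dimensional data, the k-means routine with farthest-first initialization, the use of a per-cluster mean as the predictive model, and all function names are assumptions made to keep the example small; a practical system would operate on a much higher-dimensional vector space and use a richer model per cluster.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(x, centers):
    """Index of the cluster center closest to x."""
    return min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))


def kmeans_centers(points, k, iterations=10):
    """Lloyd's k-means on the data vector space; returns the cluster centers."""
    # Deterministic farthest-first initialization (an illustrative choice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers


def fit_cluster_models(features, targets, centers):
    """Build one model per cluster; here each model is the cluster's mean target."""
    sums = {}
    for x, y in zip(features, targets):
        c = nearest(x, centers)
        total, count = sums.get(c, (0.0, 0))
        sums[c] = (total + y, count + 1)
    return {c: total / count for c, (total, count) in sums.items()}


def predict(x, centers, models):
    """Route x to its nearest cluster and apply that cluster's model."""
    return models[nearest(x, centers)]


# Hypothetical data vectors and target values.
features = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
targets = [10.0, 11.0, 9.0, 50.0, 52.0, 48.0]

centers = kmeans_centers(features, k=2)                   # step 1: cluster the data
models = fit_cluster_models(features, targets, centers)   # step 2: one model per cluster
print(predict((1.1, 1.0), centers, models))
print(predict((8.0, 8.2), centers, models))
```

Because the clusters are formed purely from distances in the data vector space, irrelevant or noisy features contribute to those distances just as much as informative ones, which is one way the dimensionality problem noted above can arise.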