Due to the large volume of data involved, the big data paradigm has had an impact on the performance of applications. The performance of an application is evaluated based upon a variety of parameters. One of the critical parameters used to evaluate the performance is the response time of a query. For a structured database application, the response time of the query is tested against a subset of the large volume of data deployed in a production database. It is generally observed that the response time in such a database application may increase non-linearly with an increase in the size of the data over a period of time. The non-linear increase in the response time may in turn lead to a violation of the performance guarantee provided to users of the application.
This non-linear increase in the response time is attributable to the conventional testing techniques adopted for testing the queries, which test the queries on only a subset of the large volume of data. To evaluate the performance against the full volume of data using the conventional testing techniques, the application may require various resources, such as storage servers capable of storing trillions of records. However, deploying such resources for evaluating the performance may incur a huge cost. Moreover, even if the resources are arranged, testing the queries against the large volume of data may increase the evaluation time, thereby delaying deployment of the application.
In order to overcome the aforementioned lacunae, various statistical machine learning based models have been proposed in the art. Such models build a learning base using past queries and then predict the response time of a query based on a nearest neighbor approach. However, such models fail to accurately predict the response time of the query with a linear increase in the size of the data over the period of time.
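By way of illustration only, a nearest neighbor prediction of the kind referred to above may be sketched as follows. The sketch is not taken from any particular prior art model. The feature choice (table size in millions of rows and number of joins), the unscaled Euclidean distance, the value of k, the function name predict_response_time, and all data values are assumptions made for the example.

```python
import math


def predict_response_time(history, query_features, k=3):
    """Average the response times of the k past queries nearest to query_features.

    history is a list of (features, response_time_seconds) pairs, where
    features is a tuple of numbers. This layout is a hypothetical one.
    """
    if not history:
        raise ValueError("history must contain at least one past query")
    # Rank past queries by Euclidean distance in the (unscaled) feature space.
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], query_features))
    nearest = ranked[:k]
    return sum(rt for _, rt in nearest) / len(nearest)


# Hypothetical learning base:
# ((table size in millions of rows, number of joins), response time in seconds)
history = [
    ((1.0, 1), 0.5),
    ((2.0, 1), 1.0),
    ((4.0, 2), 3.0),
    ((8.0, 2), 9.0),
]

# A query whose data size lies inside the range covered by the learning base.
predicted = predict_response_time(history, (3.0, 1), k=2)

# A query whose data size has grown beyond every past query.
extrapolated = predict_response_time(history, (16.0, 2), k=2)

print(predicted, extrapolated)
```

In this sketch, the prediction for the larger data size is an average of response times already present in the learning base, so it cannot exceed the largest response time previously observed. This is one plausible reason why such an approach may lose accuracy as the size of the data grows.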