The present disclosure relates generally to query-processing computer systems and in particular to systems for automatically selecting among different query systems to process queries.
Large-scale data analysis can help businesses achieve various goals. For example, online services continue to proliferate, including social networking services, online content management services, file-sharing services, and the like. Many of these services are supported by large-scale computing systems, including server farms, network storage systems, and the like. Matching computing resources to current and future user demand is an important part of efficiently managing an online service, and analyzing large quantities of data can help the service provider understand and anticipate trends in demand. As another example, sales or advertising businesses may amass customer data such as who visits which websites, for how long, and whether and what they purchase. Making use of this data, e.g., to develop or revise a marketing strategy, requires processing large quantities of data.
Analytics systems typically implement a general-purpose query system that can perform basic searches and complex data manipulation on data stored in an underlying database. Analyzing large quantities of data is an iterative process. An analyst may be presented with a problem or question and write a query to analyze a data set. The analyst can submit the query to the query system, which interfaces with the database and returns a result of the query to the analyst. Based on the result, the analyst may choose to modify the query or write a different query to address the problem. This iterative query development process resembles the process a software developer goes through when developing code. However, queries across large data sets can take a long time to execute, forcing analysts to wait before they can determine whether the query needs to be changed. Short waits allow analysts to stay on task and complete the project over the course of several iterations. Long waits, by contrast, leave analysts idle or force them to work on other projects; switching between projects requires analysts to remember where they left off, which makes query development, and data analysis projects as a whole, less efficient.