1. Field of the Invention
The present invention is directed toward scheduling the execution of queries in a database query processing system, and more particularly toward scheduling batches of queries when the queries share aspects with one another.
2. Art Background
The Internet can be considered a massive database. When analyzing content derived from the Internet and from the behavior of users on the Internet, it may be useful to analyze the results of queries, filters, joins, projections, and similar operations. In some cases, producing these results requires queries against large datasets as well as operations on large datasets. Executing such queries may require very large computing resources and, even then, many hours of wall-clock time to obtain results. Conventional scheduling techniques such as shortest-job-first do not perform well in many practical situations. Moreover, as measured by total throughput, conventional scheduling techniques perform particularly poorly when multiple queries share intermediate results. Hence, it is desirable to find ways to best schedule such queries.