The relentless advances in computing power and storage capacity whose underpinnings are recognized as conforming to Moore's Law ensures that ever more data (both kinds and quantities) are being collected and made available for analysis. Hardware and software improvements raise the practical limits on data set sizes that can be examined and manipulated, but there is still a large and growing gap between “big data” and “interactively explorable data.” That is, while it is possible to execute queries and compute aggregate values over petabytes of data, the queries often take hours or even clays to run—one can obtain answers, but they are only useful if one knows the right questions before beginning. For exploring and investigating datasets—for learning the right questions to ask—faster query turnaround is essential.
Techniques for improving query performance on “medium-sized” datasets can increase the set sizes that can be explored interactively, reduce the hardware requirements for conducting data investigations, and/or answer more-complicated questions quickly. Such techniques may be of significant value in this field.