A wide variety of data storage systems are known, including, by way of example, tiered storage systems, cloud storage systems, and storage systems of virtual data centers. These and other data storage systems generally comprise one or more storage arrays, each comprising multiple hard disk drives, solid state drives, or other storage devices. Such data storage systems often further comprise additional related entities such as database management systems or data stores.
In conventional practice, an external database management system or data store is commonly utilized to process queries relating to data stored in an underlying storage array. For example, analytic queries may be executed against an analytic data store that periodically harvests data from an underlying storage array.
Accordingly, in these and other conventional arrangements, query processing is typically performed entirely outside of the storage array. Such query processing can involve computations that require multiple round trips between an external query processing engine and the storage array, as well as considerable data movement before predicates, joins, and other query processing operations can be applied to the data. As a result, under current practice, analytic queries against medium or large amounts of data stored in a storage array tend to execute slowly and consume excessive input-output (IO) capacity.
This deficiency of the prior art has generally precluded the execution of analytic queries against production online transaction processing (OLTP) systems, for fear of degrading production performance. Instead, data is laboriously and repeatedly extracted to data warehouses, data marts, and other destinations for analysis. This introduces latency, raises data consistency and freshness issues, and increases costs. Conventional processing of analytic queries is similarly deficient in numerous other data storage system contexts.