Solid state storage, in particular flash-based devices either in solid state disks (SSDs) or on flash cards, is quickly emerging as a credible tool for use in enterprise storage solutions. Ongoing technology developments have vastly improved performance and provided for advances in enterprise-class solid state reliability and endurance. As a result, solid state storage, specifically flash storage deployed in SSDs, is becoming vital for delivering higher performance to servers and storage systems, such as the data warehouse system illustrated in FIG. 1. The system illustrated, a product of Teradata Corporation, is a hybrid data warehousing platform that provides the capacity and cost benefits of hard disk drives (HDDs) while leveraging the performance advantage of SSDs. As shown, the system includes multiple physical processing nodes 101 connected together through a communication network 105. Each processing node may host one or more physical or virtual processing modules, such as one or more access module processors (AMPs). Each of the processing nodes 101 manages a portion of a database that is stored in a corresponding data storage facility including SSDs 120, providing fast storage and retrieval of high demand “hot” data, and HDDs 110, providing economical storage of less frequently used “cold” data.
Teradata Virtual Storage (TVS) software 130 manages the different storage devices within the data warehouse, automatically migrating data to the appropriate device to match its temperature. TVS replaces traditional fixed assignment disk storage with a virtual connection of storage to data warehouse work units, referred to as AMPs within the Teradata data warehouse. FIG. 2 provides an illustration of allocation of data storage in a traditional Teradata Corporation data warehouse system, wherein each AMP owns the same number of specific disk drives and places its data on those drives without consideration of data characteristics or usage.
FIG. 3 provides an illustration of allocation of data storage in a Teradata Corporation data warehouse system utilizing Teradata Virtual Storage (TVS). Storage is owned by Teradata Virtual Storage and is allocated to AMPs in small pieces from a shared pool of disks. Data is automatically and transparently migrated within storage based on data temperature. Frequently used hot data is automatically migrated to the fastest storage resource. Cold data, on the other hand, is migrated to slower storage resources.
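By way of illustration only, temperature-based placement of the kind described above can be sketched as follows. This is a minimal sketch and not the TVS implementation: the `Extent` class, the use of a raw access count as the temperature measure, the `place_by_temperature` function, and the two tier names are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    extent_id: int
    access_count: int  # hypothetical stand-in for data "temperature"


def place_by_temperature(extents, ssd_slots):
    """Assign the hottest extents to SSD and the remainder to HDD.

    ssd_slots is an assumed fixed number of extents the fast tier can hold.
    """
    # Rank extents from hottest (most accesses) to coldest.
    ranked = sorted(extents, key=lambda e: e.access_count, reverse=True)
    placement = {}
    for rank, extent in enumerate(ranked):
        placement[extent.extent_id] = "SSD" if rank < ssd_slots else "HDD"
    return placement


# Illustrative data: four extents, room for two on the fast tier.
extents = [Extent(1, 900), Extent(2, 15), Extent(3, 430), Extent(4, 2)]
placement = place_by_temperature(extents, ssd_slots=2)
```

In this sketch the two most frequently accessed extents are placed on the fast tier and the other two on the slow tier. An actual system would migrate data incrementally as temperatures change rather than recompute the whole placement at once.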
Teradata Virtual Storage allows a mixture of different storage mechanisms and capacities to be configured in an active data warehouse system. TVS blends the performance-oriented storage of small capacity drives with the low cost-per-unit of large capacity storage drives so that the data warehouse can transparently manage the placement of data on the storage resources according to how that data is used.
Systems for managing the different storage devices within the data warehouse, such as TVS, are described in U.S. Pat. No. 7,562,195 and United States Patent Application Publication Number 2010-0306493, both of which are incorporated by reference herein.
To achieve optimal performance, TVS steadfastly attempts to place data with the highest number of accesses (hot data) in locations with the fastest average response times. This works optimally in systems with steady-state allocation/de-allocation patterns since, on average, de-allocations are expected to occur more frequently in locations ideal for future allocations. However, many systems do not exhibit this behavior. In these systems, data is rarely de-allocated, leaving few ideal locations for future allocations.
One of the most important, and most frequently accessed, data structures employed by a Teradata database system is spool space, also referred to as a spool. All users who run queries need workspace at some point in time. This spool space is workspace used for the temporary storage of rows during the execution of user SQL statements.
A spool is very short-lived, but critical for performance. By its very nature, a spool is allocated when needed and subsequently freed. Because TVS migrates hot data to the fastest locations, those locations are generally already populated by migrated extents. As a result, there is little to no free space in the ideal locations, which prevents TVS from allocating spools to the ideal (fastest) locations.
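The allocation problem described above can be illustrated with a small sketch. It is an assumption-laden illustration rather than actual TVS behavior: the `allocate_extent` function, the two tier names, and the free-slot counts are hypothetical, and the allocator simply takes the fastest tier that still has free space.

```python
def allocate_extent(free_slots):
    """Take one free slot from the fastest tier that has any; return its name."""
    for tier in ("SSD", "HDD"):  # ordered fastest to slowest (assumed tiers)
        if free_slots.get(tier, 0) > 0:
            free_slots[tier] -= 1
            return tier
    return None  # no free space on any tier


# Assumed state after migration has filled the fast tier with hot extents.
free_slots = {"SSD": 0, "HDD": 5}

# A new spool is requested; the only free space is on the slow tier.
spool_tier = allocate_extent(free_slots)
```

Under these assumptions the spool lands on the slow tier even though it is short-lived and performance-critical, because migration has already consumed every free slot on the fast tier.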