A conventional data storage device contains an array of disk drives for data storage, a controller for controlling access to the disk array, and a cache memory. The cache memory stores recently accessed data so that data likely to be accessed in the near term can be served quickly, without accessing the disk every time. When a data access request is received, the storage device first attempts to satisfy the request from the cache before using the disk array. For example, when a READ operation references data that is already in the cache, the data is returned directly from the cache. For WRITE operations, the data is written into the data cache, replacing any previous version of the same data within the cache. Since a particular file or block of data may be located on the disk or in the cache, the storage device typically includes metadata (MD) that registers all data blocks currently in the cache and therefore indicates whether a data block is on the disk or in the cache. If the data block is in the cache, the MD indicates where in the cache it is stored. The MD also indicates the current state of the data block (i.e., whether or not it has been “flushed” to disk).
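The read and write paths described above can be sketched as a small write-back cache. This is a minimal illustration under stated assumptions, not the device's actual implementation: the class `CachedStore`, the `MDEntry` record, and the use of a Python dict to stand in for the disk array are all hypothetical, and eviction is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class MDEntry:
    slot: int    # index of the data cache slot holding the block
    dirty: bool  # True if the cached copy has not yet been flushed to disk


class CachedStore:
    """Hypothetical write-back cache in front of a disk, tracked by metadata."""

    def __init__(self, num_slots, disk):
        self.disk = disk                  # dict: block address -> data (stands in for the disk array)
        self.slots = [None] * num_slots   # the data cache
        self.free = list(range(num_slots))
        self.md = {}                      # metadata: block address -> MDEntry

    def _allocate(self, addr):
        if not self.free:
            raise RuntimeError("cache full; eviction is not modeled in this sketch")
        entry = MDEntry(slot=self.free.pop(), dirty=False)
        self.md[addr] = entry
        return entry

    def read(self, addr):
        entry = self.md.get(addr)
        if entry is not None:             # cache hit: serve directly from the cache
            return self.slots[entry.slot]
        data = self.disk[addr]            # cache miss: fetch from disk and register it in the MD
        entry = self._allocate(addr)
        self.slots[entry.slot] = data
        return data

    def write(self, addr, data):
        entry = self.md.get(addr)
        if entry is None:
            entry = self._allocate(addr)
        self.slots[entry.slot] = data     # replaces any previous cached version
        entry.dirty = True                # the MD records that the block is not yet on disk

    def flush(self):
        for addr, entry in self.md.items():
            if entry.dirty:
                self.disk[addr] = self.slots[entry.slot]
                entry.dirty = False
```

After `store.write(1, b"B")`, the disk still holds the old value until `flush()` runs; the `dirty` flag in the metadata is what lets the controller know which cached blocks still have to be written back.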
Since fast access is required to both the data cache and the MD store, both are typically stored in random access memory (RAM). Because it is important that the data cache and the MD store not be lost in the event of an unexpected power failure, the RAM is typically non-volatile RAM (NVRAM). Because NVRAM is expensive, only a limited amount is available in a storage device. This means that the more NVRAM is used to store MD, the less is available for actual data.
Typically, the data cache is divided into fixed-size ‘slots’, and the MD store is divided into fixed-size ‘entries’. In a conventional design, there is a one-to-one correspondence between slots and entries. The MD may be organized as a table with an implicit association (direct mapping) between the MD entries and the data cache slots. That is, each MD entry is statically associated with a particular data cache slot, and the data block described by an MD entry is implicitly contained in the slot associated with that entry. Alternatively, the MD may be organized in a fully associative manner, in which each MD entry in the table also includes a pointer to an arbitrary data cache slot. When a data access request for a particular data block is received at the storage device, the array controller looks in the MD structure for an entry that contains the block address. In the fully associative case, that entry contains the pointer to the data cache slot holding the corresponding data block.
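The difference between the two organizations lies in how an entry leads to its slot. The sketch below is illustrative only: the entry types and lookup functions are hypothetical names, and the lookups use a simple linear scan to keep the focus on the entry-to-slot association rather than on how the entry is found.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DirectEntry:
    block_addr: Optional[int] = None  # the slot is implicit: entry i describes slot i


@dataclass
class AssocEntry:
    block_addr: Optional[int] = None
    slot: Optional[int] = None        # explicit pointer to an arbitrary data cache slot


def find_slot_direct(table: List[DirectEntry], addr: int) -> Optional[int]:
    for i, entry in enumerate(table):
        if entry.block_addr == addr:
            return i                  # the entry's index *is* the slot index
    return None


def find_slot_assoc(table: List[AssocEntry], addr: int) -> Optional[int]:
    for entry in table:
        if entry.block_addr == addr:
            return entry.slot         # follow the pointer stored in the entry
    return None
```

For example, with `direct = [DirectEntry(), DirectEntry(42), DirectEntry()]`, block 42 is found in slot 1 simply because it is described by entry 1, whereas with `AssocEntry(42, 7)` the same block lives in whatever slot the entry points to (here slot 7).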
A substantial drawback of such an organization is that the MD has a fixed size (one entry for each data cache slot), so the array controller cannot dynamically divide the NVRAM between the data cache and the metadata according to application need.
Locating a given block address in the MD structure is typically done in one of two ways. In the first, the controller simply searches through the MD table entries until it finds a match. This method may present performance problems because it may require searching a large number of MD entries. The second method employs a hash function to map groups of block addresses to particular MD entries. Each block address maps to exactly one entry, but multiple addresses can map to the same entry; a block address field within the entry identifies the data block that the entry currently represents. If the hash function maps every block address to a different entry, the result is a direct mapping. The hash function approach can cause conflicts: multiple heavily used block addresses that happen to map to the same MD entry keep forcing one another's data blocks out of the data cache (because an MD entry can describe only one data block at a time), even when plenty of free space remains in the data cache. A direct-mapped hash function eliminates such conflicts, but can waste a great deal of the metadata store, since an entry must be reserved for every possible block address at all times, whether or not that address is ever used.
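The conflict problem of the hashed approach can be reproduced in a few lines. This is a minimal sketch under assumptions: `HashedMD` is a hypothetical class, and `addr % num_entries` is an arbitrary stand-in for whatever hash function a real controller would use. Each entry can describe only one block, so two hot addresses that hash to the same entry evict each other on every access.

```python
class HashedMD:
    """Hypothetical hashed MD table: each entry describes at most one block."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = [None] * num_entries  # block address each entry currently describes
        self.evictions = 0

    def index(self, addr):
        return addr % self.num_entries       # stand-in hash function (assumption)

    def access(self, addr):
        """Return True on a hit; on a miss, take over the entry, evicting any occupant."""
        i = self.index(addr)
        if self.entries[i] == addr:          # the block address field confirms the match
            return True
        if self.entries[i] is not None:
            self.evictions += 1              # the occupant's data block must leave the cache
        self.entries[i] = addr
        return False


md = HashedMD(8)
for _ in range(3):
    md.access(3)   # 3 % 8 == 3
    md.access(11)  # 11 % 8 == 3: same entry, so each access evicts the other block
```

After this loop, five evictions have occurred while seven of the eight entries are still unused. A direct-mapped variant would avoid this by sizing `num_entries` to cover the whole block address space, which is exactly the metadata waste the text describes.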
Typical storage devices divide a disk into a number of discrete storage areas known as virtual logical units (VLUs), each of which supports an independent virtual block address (VBA) space. Every user data block in the array is therefore uniquely identified by reference to a particular VLU and a VBA, and the MD structure must include VLU information in order to support such multi-VLU configurations.
This requirement may be met by logically dividing the MD store into separate tables, one for each VLU. Given a block address (VLU#, VBA), the array controller performs the MD lookup in the corresponding partition of the MD store. However, partitioning the MD store can result in inefficient use of the NVRAM. For example, a busy VLU cannot make use of the MD entries (and, consequently, the one-to-one-matched data cache slots) allocated to idle VLUs.
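The inefficiency of a partitioned MD store can be shown with a toy model. The class `PartitionedMD` and its fixed per-VLU capacity are hypothetical; the point is only that an insert is confined to the requesting VLU's own table, so a full partition rejects new blocks even while another partition sits empty.

```python
class PartitionedMD:
    """Hypothetical MD store split into fixed-size per-VLU partitions."""

    def __init__(self, entries_per_vlu, num_vlus):
        self.capacity = entries_per_vlu
        self.tables = {vlu: {} for vlu in range(num_vlus)}  # VLU# -> {VBA: slot}

    def insert(self, vlu, vba, slot):
        table = self.tables[vlu]  # only this VLU's partition is consulted
        if vba not in table and len(table) >= self.capacity:
            return False          # partition full, even if other partitions are idle
        table[vba] = slot
        return True

    def lookup(self, vlu, vba):
        return self.tables[vlu].get(vba)
```

With two partitions of two entries each, a busy VLU 0 can register VBAs 10 and 11, but its third block is refused even though VLU 1's two entries (and their matched cache slots) remain unused.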
Alternatively, the MD store need not be partitioned; instead, the entire MD store may be a single table in which each entry may represent any user data block from any VLU. In that case, the lookup function may be based on a combination of the VLU# and the VBA. If such an implementation employs a hash function, it may suffer from another kind of conflict, in which blocks with the same VBA in different VLUs force one another out of the cache.
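One hypothetical way the cross-VLU conflict can arise is a combined key whose VLU bits are lost when the key is reduced to a table index. In the sketch below, the table size, the 20-bit VBA width, and both index functions are assumptions chosen for illustration. `naive_index` places the VLU# above the VBA bits, but because 2**20 is a multiple of the table size, the modulo discards those bits entirely; `mixed_index` multiplies the VLU# by an odd constant so that it perturbs the low-order bits.

```python
NUM_ENTRIES = 1024  # hypothetical size of the single, shared MD table
VBA_BITS = 20       # assumption: every VBA fits in 20 bits


def naive_index(vlu, vba):
    # Concatenate VLU# and VBA, then reduce modulo the table size.
    # Since 2**VBA_BITS is a multiple of NUM_ENTRIES, the VLU bits vanish,
    # so the same VBA in every VLU lands in the same entry.
    return ((vlu << VBA_BITS) | vba) % NUM_ENTRIES


def mixed_index(vlu, vba):
    # Fold the VLU# into the low-order bits with an odd multiplier; because
    # the multiplier is odd, distinct VLU#s below NUM_ENTRIES give distinct
    # offsets, so the same VBA in different VLUs maps to different entries.
    return (vba + vlu * 2654435761) % NUM_ENTRIES
```

Mixing the VLU# into the index removes this particular systematic conflict, but it is still a hash: unrelated (VLU#, VBA) pairs can collide, so the ordinary conflicts of the hashed approach remain.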