The present disclosure relates to the organization of storage systems and to improving storage performance characteristics.
Modern storage systems often use a storage volume to organize and manage information. The storage volume is a logical entity representing a virtual container for data or an amount of space reserved for data. While a storage volume can be stored on a device, it does not necessarily represent a single device. Typically, one or more portions of a storage volume are mapped to one or more physical devices. In many cases, these mappings can be fairly arbitrary, and a device may contain parts, if not all, of several storage volumes. Likewise, a storage volume may be mapped to several devices. The Logical Volume Manager (LVM) is a tool used to manage the storage volumes on a system.
Increasing the homogeneity of information being stored on the storage volume can have many benefits. Computer systems accessing data frequently may experience better performance when the “hot data” is stored in a storage volume having faster access characteristics. Similarly, there may be little performance impact from grouping less frequently accessed data or “cold data” on slower storage volumes that do not require the more expensive fast access features. Increasing the homogeneity of data also improves the ability to predict performance and to deploy storage devices and hardware that better match the required performance needs.
Conventional systems rely on administrators to identify homogeneous information and then distribute it on one or more storage volumes. Upon system setup, a person can initially allocate one drive or storage volume to hold frequently accessed information and a second drive or storage volume to hold less frequently accessed information. Unfortunately, it is difficult and cumbersome to maintain this type of arrangement manually and keep the homogeneous data together. Not only is it difficult for administrators to identify data that is “similar”, but over time, initial allocations of data may also grow too large for the particular storage volume, and different access patterns may emerge as usage behavior changes or is modified. This makes it impossible to estimate the mapping statically. Dividing groups of homogeneous information onto distinct physical storage volumes manually is inflexible and not likely to be effective over any useful period of time.
Another approach is to automatically divide a storage volume into shards of homogeneous data. The shard is a portion of the storage volume used to logically group together information having homogeneous access characteristics. Different sharding combinations are used depending on the storage volume and access characteristics of the underlying data. Over time, different sharding combinations can be used to accommodate changing access patterns and make the data stored on a storage volume more homogeneous.
Identifying access patterns and homogeneous data for the shard is difficult because two or more underlying data sets are rarely identical. In most cases, the access pattern for one block of data may be similar but not identical to that of another block of data. A pairwise similarity metric provides one method of grouping blocks of data together in a homogeneous shard. There are many other ways of detecting whether one or more blocks are similar or dissimilar to each other.
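By way of illustration only, a pairwise similarity metric of the kind mentioned above might be sketched as follows. The disclosure does not specify a particular metric. This sketch assumes that each block is described by a vector of access counts per time window and that similarity is derived from the Euclidean distance between two such vectors. The names `access_similarity` and `pairwise_similarity` and the sample data are hypothetical.

```python
import math
from itertools import combinations


def access_similarity(a, b):
    """Similarity in (0, 1] between two access-count vectors.

    Assumed metric: 1 / (1 + Euclidean distance). Identical vectors
    score 1.0; the score falls toward 0 as the vectors diverge.
    """
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def pairwise_similarity(access_counts):
    """Return {(i, j): similarity} for every unordered pair of blocks."""
    return {
        (i, j): access_similarity(access_counts[i], access_counts[j])
        for i, j in combinations(range(len(access_counts)), 2)
    }


# Hypothetical sample: accesses per time window for four blocks.
access_counts = [
    [90, 85, 95, 88],  # block 0: frequently accessed ("hot")
    [80, 90, 85, 92],  # block 1: frequently accessed ("hot")
    [1, 0, 2, 1],      # block 2: rarely accessed ("cold")
    [0, 1, 1, 0],      # block 3: rarely accessed ("cold")
]
similarity = pairwise_similarity(access_counts)
```

Under these assumptions, the two hot blocks score higher against each other than either does against a cold block, so a grouping step could place blocks 0 and 1 in one shard and blocks 2 and 3 in another.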
Unfortunately, there are many different possible divisions of a storage volume into shards. Some divisions produce shards with more homogeneous access patterns than others. Currently, there is no effective and efficient method for determining an optimal sharding that provides the set of shards with the most homogeneous access patterns. It is both time-consuming and processor-intensive to consider every possible sharding of a storage volume.
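The cost of considering every possible sharding can be made concrete with a small sketch. It assumes, purely for illustration, that a shard is a contiguous range of blocks, which the disclosure does not require. Under that assumption, a volume of n blocks has n - 1 interior boundaries, each of which is either a cut or not, giving 2^(n-1) possible shardings. The function names are hypothetical.

```python
from itertools import combinations


def contiguous_shardings(num_blocks):
    """Yield every division of blocks 0..num_blocks-1 into contiguous shards.

    Each sharding is a list of (start, end) pairs with end exclusive.
    Assumes num_blocks >= 1.
    """
    interior = range(1, num_blocks)  # positions where a cut may be placed
    for num_cuts in range(num_blocks):
        for cuts in combinations(interior, num_cuts):
            bounds = (0, *cuts, num_blocks)
            yield [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def count_contiguous_shardings(num_blocks):
    """Closed-form count: each of the num_blocks - 1 boundaries is cut or not."""
    return 2 ** (num_blocks - 1)


# A four-block volume already admits eight contiguous shardings.
all_shardings = list(contiguous_shardings(4))
```

Because the count doubles with every additional block, exhaustive enumeration becomes impractical even for modest volumes: under the same assumption, a volume of only 64 blocks admits 2^63 contiguous shardings, and allowing non-contiguous shards enlarges the search space further.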
Like reference numbers and designations in the various drawings indicate like elements.