1. Field of the Invention
Embodiments of the present invention relate, in general, to storage systems and particularly to dynamic classification of data maintained within a plurality of storage systems.
2. Relevant Background
The storage of data occurs on many mediums including flash drives, magnetic disks, magnetic tape, optical disks and the like. Each medium is associated with an initial cost to procure the medium and operational costs to store and retrieve data. These expenses, combined with differing performance characteristics such as access speed, have driven the industry to adopt a tiered storage system.
A tiered storage system, as is currently typical in the art, places new data or data which is likely to be in high demand on a first tier. As data ages or becomes less important, it is shifted to a second, third or lower tier as appropriate. Each lower tier is typified by slower access time and lower cost associated with storing data. Thus a typical three-tier storage system may have as a first tier a certain amount of flash memory. Flash memory is, in comparative terms, expensive per byte of storage capacity. Flash memory also offers extremely fast access to the data. Thus tier 1 is characterized by a limited capacity of quickly accessible but expensive storage. Eventually data that resides on the flash will be replaced by other, more important data. The replaced data is then likely moved to a lower tier in the storage architecture.
The second tier generally has a larger storage capacity than the first tier, is somewhat slower to access and is cheaper. In this example, the second tier is comprised of magnetic disks. According to the storage system of the prior art, once data is identified as having a higher priority for tier 1 space than the data currently residing in the flash memory, the data currently on tier 1 is moved into tier 2, thus providing space for the new data. Assuming that there is space in tier 2, no other data needs to be removed to make room for the new arrival. However, there remains a cost for keeping data available in tier 2 storage. The disks and the facilities must be maintained, and in many circumstances this overhead is significant.
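By way of illustration only, the displacement of tier 1 data described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular prior art system: the `Tier` class, the `place_in_tier1` function, the integer priority values and the item-count capacity are all hypothetical names and units introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    """Hypothetical storage tier holding a bounded number of items."""
    name: str
    capacity: int                               # assumed unit: number of items
    items: dict = field(default_factory=dict)   # key -> priority

    def has_space(self) -> bool:
        return len(self.items) < self.capacity


def place_in_tier1(key, priority, tier1, tier2):
    """Place new data in tier 1, displacing lower-priority data if needed.

    Returns the key moved from tier 1 to tier 2, or None if nothing moved.
    """
    if tier1.has_space():
        tier1.items[key] = priority
        return None

    # Tier 1 is full: find the resident item with the lowest priority.
    victim = min(tier1.items, key=tier1.items.get)

    if not tier2.has_space():
        # A real system would first demote data from tier 2; not modeled here.
        raise RuntimeError("tier 2 is full")

    if tier1.items[victim] >= priority:
        # The new data does not outrank anything in tier 1, so it goes to tier 2.
        tier2.items[key] = priority
        return None

    # The new data has higher priority: move the victim down to make room.
    tier2.items[victim] = tier1.items.pop(victim)
    tier1.items[key] = priority
    return victim
```

In this sketch, tier 2 is assumed to have free space in the ordinary case, matching the assumption stated in the paragraph above; the full-tier case is only signaled, not resolved.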
Thus a third tier of storage exists in which data that is unlikely to require immediate or even quick access can be placed. Generally tier 3 is comprised of magnetic tape. Magnetic tape requires a low initial investment but possesses considerable latency with respect to data access. In many circumstances, however, a business may wish to archive data. The decreased cost of this storage makes high access latency an affordable tradeoff for such data.
In a tiered storage system as described above, data is constantly moving. Data that is no longer worthy of tier 1 storage is moved to tier 2. Data in tier 2 that has not been accessed for a prescribed period of time is moved to tier 3. Data that is required for analysis is retrieved from tier 3 and placed in tier 2 or tier 1. This is compounded by the fact that within each tier there may be additional classifications. For example, in tier 2 of the previous example using magnetic disks, data stored near the outer edge of the disk inherently possesses better access time than data stored near the spindle. Thus that data may be at tier 2.1 while other data may be designated 2.x.
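The movement of data between tier 2 and tier 3 described above can be sketched, for illustration only, as an idle-time demotion and a promotion on retrieval. The function names `demote_idle` and `retrieve`, the `max_idle` threshold, the numeric timestamps and the representation of tiers as dictionaries keyed by tier number are assumptions made for this example and do not describe any specific prior art implementation.

```python
def demote_idle(tiers, last_access, now, max_idle):
    """Move tier 2 items not accessed within max_idle time units to tier 3.

    tiers: dict mapping tier number (1, 2, 3) to a dict of key -> data.
    Returns the sorted list of keys that were moved.
    """
    idle = sorted(k for k in tiers[2] if now - last_access[k] > max_idle)
    for key in idle:
        tiers[3][key] = tiers[2].pop(key)
    return idle


def retrieve(tiers, last_access, key, now, promote_to=2):
    """Return the data for key, promoting it out of tier 3 if found there."""
    for level in (1, 2, 3):
        if key in tiers[level]:
            value = tiers[level][key]
            if level == 3:
                # Data needed again is placed back on a faster tier.
                del tiers[3][key]
                tiers[promote_to][key] = value
            last_access[key] = now
            return value
    raise KeyError(key)
```

Sub-tier designations such as 2.1 or 2.x are not modeled here; each tier is treated as a single pool.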
Finally, associated with each storage tier is a bandwidth cost. To move or access data, a certain amount of bandwidth must be utilized. Assuming there is a finite amount of bandwidth for a particular system, the bandwidth used to transfer data between tiers cuts into the bandwidth needed to access and use the data. Typically storage mediums operate at a maximum setting. When a piece of data is accessed, it is accessed and transported at the maximum rate at which the device can physically operate. However, as systems have evolved, such a maximum effort is not always necessary. A challenge therefore exists to balance the cost of storing data with that of accessing the data.