Data management systems automatically control the storage of data by placing data according to various criteria. Data placement in such systems comprises a decision on which one of a plurality of storage units the data should be stored in order to minimize costs without significantly degrading the quality of data provisioning services based on said data. It must be ensured that the data can be provided reliably and quickly to one or more clients, wherein the speed and reliability of the data transfer and other service criteria may be specified in service level objectives (SLOs). Several existing data management systems, as disclosed, for example, in U.S. Pat. No. 7,949,847 B2, use virtualization technology to give, in a process known as ‘thin provisioning’, the appearance of more physical resources than are actually available in order to reduce costs.
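The thin-provisioning principle mentioned above can be illustrated with a toy sketch. This is a generic illustration, not the mechanism of the cited patent: the class name `ThinVolume` and its methods are hypothetical, chosen only to show a volume that advertises a large logical size while consuming physical space only for blocks that are actually written.

```python
class ThinVolume:
    """Toy model of a thin-provisioned volume (illustrative names)."""

    def __init__(self, logical_blocks: int):
        self.logical_blocks = logical_blocks  # size advertised to clients
        self._blocks = {}                     # physical space, allocated lazily

    def write(self, block_no: int, data: bytes) -> None:
        if not 0 <= block_no < self.logical_blocks:
            raise IndexError("block outside advertised logical size")
        # Physical space is consumed only at this point, on first write.
        self._blocks[block_no] = data

    def physical_blocks_used(self) -> int:
        return len(self._blocks)
```

For example, a `ThinVolume(1_000_000)` advertises a million blocks to its clients, yet after a single write it occupies only one block of physical storage.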
Information Lifecycle Management (ILM) systems apply ‘rules’ or ‘policies’ to automatically manage data, associate it with various kinds of metadata and provide it to different users or user groups.
Hierarchical storage management (HSM) systems automatically move data, in particular files, between high-cost storage devices, which typically have short access times, and low-cost storage media, which typically have longer access times, to provide an optimal compromise between data provisioning speed and cost.
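A common HSM-style policy moves files that have been idle for some period from fast storage to cheap storage. The following is a minimal sketch of such an aging rule under assumed parameters (the function name and the 30-day cutoff are hypothetical, not taken from any specific product):

```python
import time
from typing import Optional

def is_migration_candidate(last_access_epoch: float,
                           max_idle_days: float = 30.0,
                           now: Optional[float] = None) -> bool:
    """Return True if a file has been idle long enough to move to slow storage."""
    now = time.time() if now is None else now
    idle_days = (now - last_access_epoch) / 86_400  # seconds per day
    return idle_days > max_idle_days
```

An HSM daemon would evaluate such a predicate periodically over the managed files and migrate the candidates, while hiding the resulting physical location change from the user, which is precisely why file system support is needed, as discussed next.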
One drawback of today's HSM implementations is that they often rely on a single type of file system, as they require file system support for hiding the actual physical location of the stored data from the user. Not every piece of data is, however, organized as an individual file system node, and different storage devices may comprise file systems of different types (ext2, ext3, FAT32, NTFS or the like) which may be incompatible with each other.
Data management systems based on tiered storage (as provided, e.g., by IBM's EasyTier storage system) classify the available disk drives into two or more kinds of storage devices depending on attributes such as price, access time, storage capacity and function (e.g. different RAID levels or replication). Storage devices of similar type may be assigned to a common storage tier. Thus, data which is only rarely accessed may be stored in a storage tier consisting of cheap storage devices with slow access times, while heavily used data may be stored in another storage tier comprising more expensive hard drives.
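The tiering scheme just described can be sketched as two small rules: one assigning devices to tiers by their attributes, and one placing data by its access frequency. All names and thresholds below are illustrative assumptions, not the behavior of any particular product:

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    cost_per_gb: float      # assumed attribute used for tiering
    access_time_ms: float   # assumed attribute used for tiering

def assign_tier(device: Device) -> str:
    # Hypothetical rule: devices with sub-millisecond access form the fast tier.
    return "fast" if device.access_time_ms < 1.0 else "cheap"

def place_data(access_count: int, hot_threshold: int = 100) -> str:
    # Rarely accessed data goes to the cheap, slow tier; hot data to the fast tier.
    return "fast" if access_count >= hot_threshold else "cheap"
```

The placement rule is deliberately simplistic, keying only on access frequency; the deficiency discussed below arises exactly because such frequency-based input says nothing about the actual relevance of the data.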
One major deficiency of prior art data management systems based on tiered storage is that, when deciding whether to migrate a volume's data from a storage volume in one storage pool to a volume of another storage pool (or, analogously, when deciding whether one or more physical storage volumes of another pool should be assigned to a particular logical volume) in order to optimize disk usage with regard to price/performance, said systems are often unable to correctly determine whether the data of a particular storage unit is relevant to the user and/or operator of the data management system. The information used as input for said decision is often insufficient for determining the real relevance and future usage frequency of a particular piece of data. Thus, the information gathered by prior art systems is often unsuited for determining the optimum destination storage unit for storing data in a given use case scenario. For example, data causing a very high performance load (such as video stream data) may in fact be very unimportant for a given industrial control workflow, and allocating such data to highly efficient, expensive storage devices may actually be counterproductive.
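The video-stream example above can be made concrete with a hypothetical load-based migration heuristic of the kind attributed to the prior art. The function name and the threshold are assumptions for illustration only:

```python
def naive_migration_decision(io_ops_per_hour: int,
                             hot_threshold: int = 10_000) -> str:
    # Purely load-based heuristic: whatever generates the most I/O is promoted
    # to the fast tier, regardless of its actual relevance to the workflow.
    return "migrate-to-fast-tier" if io_ops_per_hour > hot_threshold else "keep"

# A video stream producing heavy I/O is promoted to expensive storage,
# while a rarely read but business-critical control log is not.
video_stream_decision = naive_migration_decision(50_000)  # "migrate-to-fast-tier"
control_log_decision = naive_migration_decision(200)      # "keep"
```

Because the heuristic sees only the I/O load, the unimportant video stream occupies the expensive tier while the relevant control data remains on slow storage, which is the counterproductive outcome described above.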