1.1. Field of the Invention
The present invention relates to the field of electronic computing, and in particular to electronic storage management. More particularly, it relates to a method and respective system for migrating electronic data in a networked environment from a source storage to a target storage, wherein the migration of data is done in order to keep the source storage running below a predetermined exhaustion threshold.
1.2. Description and Disadvantages of Prior Art
Prior art Hierarchical Storage Management (HSM) manages n-tier storage hierarchies (n=2 in the example used here) by migrating data within the hierarchy. For this purpose, single data objects (files) are selected based on an eligibility criterion. Data migrations are triggered when a high threshold of used capacity is reached. When the trigger event occurs, a migration task migrates files as long as the used capacity within a file system is above a low threshold. When this low threshold is reached, no more files are migrated. Because the HSM system migrates the data without user or administrator interaction, the process is called automigration. To reduce the complexity of the setup, a 2-tier storage hierarchy is used as an example. Nevertheless, the same scheme can be applied in multi-tier environments with migration of data from tier to tier in the hierarchy. The first tier is called “online storage”, which is a locally attached or shared hard disk managed as a file system. The second tier is called “nearline storage”. Typically, cheap disk or tape storage is used as capacity in this tier.
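The threshold-driven behaviour described above can be illustrated by the following minimal sketch. It is not taken from any HSM product: the names File and automigrate, the byte-based capacity figures, and the assumption that the candidate list is already ordered by the eligibility criterion are all hypothetical and serve only to show the start and stop conditions.

```python
from dataclasses import dataclass


@dataclass
class File:
    name: str
    size: int  # size in bytes


def automigrate(candidates, used, capacity, high, low):
    """Return the files migrated by one automigration run.

    candidates -- eligible files, already ordered by the eligibility criterion
    used       -- currently used capacity of the online file system
    capacity   -- total capacity of the online file system
    high, low  -- thresholds as fractions of capacity (e.g. 0.9 and 0.8)
    """
    migrated = []
    if used < high * capacity:
        return migrated  # high threshold not reached: no trigger event
    for f in candidates:
        if used <= low * capacity:
            break  # low threshold reached: stop migrating
        migrated.append(f)
        used -= f.size  # migrated file contents are released from online storage
    return migrated


files = [File("a", 50), File("b", 30), File("c", 15)]
print([f.name for f in automigrate(files, used=95, capacity=100, high=0.9, low=0.8)])
# ['a']
```

In this sketch a single large file is enough to bring the used capacity below the low threshold, so the run ends after one migration.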
An exemplary prior art storage management system is an IBM product called IBM Tivoli Storage Manager (TSM) for Space Management (TSM HSM). This prior art product implements the concept of so-called Hierarchical Space Management (HSM) as part of the TSM product family. FIG. 1 illustrates the basic architectural components of a prior art file server 10 which implements HSM in the form of automigration processes. The file server 10 comprises a file system 14 acting as online storage by means of a plurality of attached hard disks, a capacity monitoring unit 16 having a functional interface to the file system 14 in order to monitor the occupied capacity of the file system 14, a control unit 18 controlling the automigration processes triggered by a control signal generated in the capacity monitoring unit 16, and a nearline storage 22 connected to the file server.
An input data stream 12 labeled “new data” enters the file server 10 and is stored within the file system 14. A data output stream 20 is defined for data which is migrated from the file system 14 to the nearline storage system 22.
TSM uses a client-server architecture. The TSM server manages all nearline storage devices, such as disk areas or tape libraries, where data is stored for backup, archive, or HSM purposes. While the server is one central instance, multiple clients send data to the TSM server. Various types of clients implement backup, archival, or HSM for data such as files or databases. TSM HSM manages local file systems with direct attached storage devices (DASD) or SAN-attached storage devices and their capacity by migrating file contents to the TSM server, so that the storage used by these files can be released. A placeholder called a stub remains in the file system and points to the data stored on nearline storage.
The prior art automigration process is started periodically as more storage capacity is used.
FIG. 2 shows a sequence of automigration processes carried out as new data enters the storage system along input data stream 12. A high threshold TH is set to 90% while a low threshold TL is set to 80%. In this example, 10% of the online storage capacity COnline is migrated in each automigration. This amount of capacity, CDelta=(TH−TL)*COnline, varies only slightly, depending on the size of the files being migrated at the point in time when the low threshold is reached. Depending on the distribution of file sizes, however, the number of data objects being migrated may vary significantly. FIG. 2 shows that continuous data growth leads to periodic automigrations, each of which migrates a fixed amount of storage capacity to the next storage tier.
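As a worked illustration of the relation CDelta=(TH−TL)*COnline, the following sketch computes the capacity migrated per run for the 90%/80% thresholds of the example. The capacity of 1000 GB, the growth rate of 5 GB per day, and the helper days_between_runs are hypothetical; in particular, a constant growth rate is an assumption made only to show why continuous growth produces periodic automigrations.

```python
def migrated_capacity(c_online, th_percent, tl_percent):
    """Capacity migrated per automigration run: C_Delta = (TH - TL) * C_Online."""
    return (th_percent - tl_percent) * c_online / 100


def days_between_runs(c_delta, growth_per_day):
    """Assumption: data grows at a constant rate, so each run frees enough
    capacity for c_delta / growth_per_day days of new data."""
    return c_delta / growth_per_day


c_delta = migrated_capacity(c_online=1000, th_percent=90, tl_percent=80)  # in GB
print(c_delta)                        # 100.0
print(days_between_runs(c_delta, 5))  # 20.0
```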
FIG. 3 shows a file size distribution often found on file servers. File sizes can vary over a wide range. One distribution typically found on file servers is a logarithmic dependency between file size and the number of files of that size. Such file servers contain only a small number of large files, while the majority of files are small. If only large files are selected, migrating a small number of files is sufficient to reach the low threshold. In another scenario, where smaller files are also marked by the migration policy as eligible for migration and are selected, the number of migrated objects can become significantly higher than in the first scenario.
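The effect of the file size distribution on the number of migrated objects can be made concrete with a small sketch. The file population below (2 large and 500 small files, sizes in MB), the target of 1500 MB, and the function files_needed are hypothetical and are not taken from FIG. 3; they merely contrast a policy that selects large files first with one that selects small files first.

```python
def files_needed(sizes, target):
    """Count how many files, taken in the given order, must be migrated
    to free at least `target` units of capacity.

    Returns None if all files together do not free enough capacity.
    """
    freed = 0
    for count, size in enumerate(sizes, start=1):
        freed += size
        if freed >= target:
            return count
    return None


# Hypothetical population: few large files, many small ones (sizes in MB).
sizes = [1000] * 2 + [10] * 500
target = 1500

print(files_needed(sorted(sizes, reverse=True), target))  # 2   (largest first)
print(files_needed(sorted(sizes), target))                # 150 (smallest first)
```

The same amount of capacity is freed in both cases, but the number of objects that have to be handled differs by almost two orders of magnitude.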
The prior art implementation of the automigration process of the prior art IBM Tivoli Storage Manager for Space Management (TSM/HSM) configures the I/O bandwidth used between online storage 14 and nearline storage 22 statically, i.e. independently of the current file system workload. The number of parallel processes carrying out data migrations from storage 14 is defined by a parameter MAXMIGRATORS. Each migration process opens its own session to the TSM server, which manages the nearline storage. A session is used for transferring data in one stream, so only one file per session is migrated at a time. With multiple migration processes and an equal number of open sessions, the same number of files is migrated in parallel. Fewer files are migrated in parallel only if the data is to be written directly to tape and not enough tape drives are available. If the sessions have to share I/O path resources, the throughput for a single file migration is reduced. Prior art automigration uses all available resources assigned to it. Thus, during an automigration all resources are occupied, while in the time between two automigrations no data is transferred. This can disadvantageously lead to performance degradation of the system if the value of MAXMIGRATORS is chosen too high. If all the data migrations carried out in parallel have to share the same I/O path, the bandwidth available for a single session decreases. Tape drives, especially those using Linear Tape-Open (LTO) as media technology, require a minimum data rate. If too little data is sent to the tape drive, it has to stop writing and buffer data until the next chunk can be written to tape. If the tape stops, the drive has to rewind the tape media to synchronize continued tape writes with the format written in the previous operation. This behaviour disadvantageously leads to significant write performance degradation if the throughput falls below a certain threshold. This non-linear relationship between I/O throughput and write performance is specific to tape drives.
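The interaction between the number of parallel sessions and the minimum data rate of a tape drive can be sketched as follows. The sketch assumes that the sessions share one I/O path equally; the path bandwidth of 400 MB/s and the minimum streaming rate of 50 MB/s are hypothetical figures, not specifications of any LTO drive, and the function names are illustrative only.

```python
def per_session_rate(path_bandwidth, sessions):
    """Throughput of one session, assuming the shared I/O path is split equally."""
    return path_bandwidth / sessions


def max_streaming_sessions(path_bandwidth, min_drive_rate):
    """Largest number of sessions for which each session still receives
    at least the minimum data rate required by the tape drive."""
    return int(path_bandwidth // min_drive_rate)


PATH_BANDWIDTH = 400  # MB/s, hypothetical shared I/O path
MIN_DRIVE_RATE = 50   # MB/s, hypothetical minimum streaming rate of a drive

for migrators in (4, 8, 16):
    rate = per_session_rate(PATH_BANDWIDTH, migrators)
    status = "streams" if rate >= MIN_DRIVE_RATE else "stops and repositions"
    print(migrators, rate, status)

print(max_streaming_sessions(PATH_BANDWIDTH, MIN_DRIVE_RATE))  # 8
```

Under these assumptions, a static setting of 16 parallel migrations would leave each session with half the rate the drive needs, which corresponds to the degradation described above.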
Prior art automigration itself is driven by thresholds defined beforehand based on the capacity of the online storage 14. A high threshold TH defines the trigger for starting automigration while a low threshold TL is the trigger to stop automigration.
This concept of static thresholds disadvantageously does not allow changing these settings based on the current status of the storage system. So, independently of a low or high volume of new data stored within the storage system the automigration uses the same I/O bandwidth to the nearline storage 22.
On the other hand, it is very difficult to set up a migration control method based on more than the consumed capacity of storage system 14, as the influence of each input variable needs to be evaluated in the overall context. This, however, is difficult to model in sufficient detail due to the complexity of storage solutions. For example, a hard disk drive is a complex rotary system with very complex system properties and time behaviour. A bus connecting the CPU or memory with hard disk storage is also very difficult to model, as the bus load is quite volatile over time.
1.3. Objectives of the Invention
The objective of the present invention is to provide an improved migration control method and system.