Mass storage systems are generally used for managing, storing, and retrieving large numbers of files, which are typically organized in a single file system. A mass storage system generally comprises a hierarchical storage management (HSM) system together with high speed and slower storage devices on which the files are physically stored. An application that requests the storage or the retrieval of a file from the file system therefore does not have to know where the file is physically stored. The hierarchical storage management system migrates files that fulfill a certain criterion, for example files that are older than 100 days, from the high speed storage devices, such as hard disc devices, to slower storage devices, such as tape drives. If a file has been migrated to a slower storage device and a user wants to access the file, it is first copied back to the high speed storage device and then made available to the user. Thus, it takes longer to access files that are stored on the slower storage devices. Hence, a hierarchical storage management system should distribute the files between the high speed storage devices and the slower storage devices in an intelligent way, so that files that are often requested by a user are kept on the high speed storage devices.
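The migration and recall behavior described above can be illustrated with a minimal sketch. It is not taken from any particular HSM product. The names `FileEntry`, `migrate_old_files`, and `open_file` are hypothetical, and measuring a file's age from its last access is an assumption, since the text only speaks of files "older than 100 days".

```python
from dataclasses import dataclass

DAY = 86_400  # seconds per day


@dataclass
class FileEntry:
    # Hypothetical per-file record; field names are illustrative only.
    name: str
    size: int
    last_access: float  # seconds since some epoch (assumed age reference)
    tier: str = "fast"  # "fast" = high speed device, "slow" = e.g. tape


def migrate_old_files(files, now, max_age_days=100):
    """Move every fast-tier file older than max_age_days to the slow tier."""
    migrated = []
    for f in files:
        if f.tier == "fast" and now - f.last_access > max_age_days * DAY:
            f.tier = "slow"
            migrated.append(f.name)
    return migrated


def open_file(f, now):
    """Recall the file to the fast tier if needed.

    Returns True if a (slow) recall was necessary, False otherwise.
    """
    recalled = f.tier == "slow"
    if recalled:
        f.tier = "fast"  # stands in for copying the data back
    f.last_access = now
    return recalled
```

The caller of `open_file` never specifies a storage location. The only visible difference between the two tiers is the additional recall step, which is what makes access to migrated files slower.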
In principle, it would be ideal to store all files on high speed storage devices all the time. However, high speed storage devices are generally more costly than slower storage devices, so the total cost of a mass storage system can be reduced by utilizing slower storage devices.
If large numbers of files have to be managed by the hierarchical storage management system, problems arise with respect to the selection of the appropriate files for migration. A threshold based auto-migration might start migrating files when a high threshold of the storage usage of the high speed storage device, which can for example be a tier 1 storage device, is reached. Typically, eligible files are determined up front. If the number of files is very large, for example larger than 10^8 files, the query over all files that has to be performed in order to determine the files for migration requires a significant amount of time. Furthermore, the most eligible files are hard to find, as all files stored on the high speed storage device first need to be scanned in order to determine the criteria that distinguish more eligible from less eligible files. A second query is then required to search for files based on the criteria obtained from the first query. Thus, it is hard to determine criteria for eligible candidates in a timely manner. Eligible candidates for migration might for example be files that are relatively old or relatively large, while candidates that should be left on the high speed storage device are files that are relatively young and small.
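A threshold based selection of this kind might look roughly like the following sketch. The function name `select_candidates`, the low threshold at which migration stops, and the ranking by age multiplied by size are all assumptions made for illustration. The text only states that a high threshold triggers migration and that old or large files are preferred.

```python
def select_candidates(files, capacity, now, high=0.9, low=0.8):
    """Pick files to migrate once fast-tier usage reaches the high threshold.

    files: list of (name, size, timestamp) tuples on the fast tier.
    The low threshold and the age * size score are illustrative assumptions.
    """
    used = sum(size for _, size, _ in files)
    if used < high * capacity:
        return []  # high threshold not reached, nothing to do

    # Full scan: every file is scored and sorted before any is chosen.
    ranked = sorted(files, key=lambda f: (now - f[2]) * f[1], reverse=True)

    chosen = []
    for name, size, _ in ranked:
        if used <= low * capacity:
            break
        chosen.append(name)
        used -= size
    return chosen
```

The sketch touches every file before the first candidate is known, which is the step that becomes expensive when the file system holds a very large number of files.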
The hierarchical storage management system of the IBM Tivoli Storage Manager (TSM) system uses, for example, a candidate list which contains a subset of the set of all files contained in a file system. The subset is optimized continuously by iterating through the file system. As the candidate list holds at most a fixed maximum number of entries, files not contained in the list cannot be identified as candidates. Hence, the candidate list contains only a limited number of eligible files. Whenever new eligible candidates are found, other files have to be moved out of the candidate list. This results in significant CPU usage and a large number of input/output accesses to the file system if 10^8 to 10^9 files need to be managed by the hierarchical storage management system.
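The effect of a size-limited candidate list can be illustrated with the following sketch. It does not reproduce the TSM implementation. The bounded min-heap, the function name `update_candidates`, and the numeric eligibility score are assumptions chosen to show how new candidates displace existing ones.

```python
import heapq


def update_candidates(candidates, entry, max_entries):
    """Offer one (score, name) entry to a bounded candidate list.

    candidates is a min-heap that keeps the max_entries highest scores.
    Returns the entry that ends up outside the list (an evicted entry or
    the new entry itself), or None if nothing was displaced.
    """
    if len(candidates) < max_entries:
        heapq.heappush(candidates, entry)
        return None
    if entry > candidates[0]:
        # The new entry is more eligible than the weakest one in the list.
        return heapq.heapreplace(candidates, entry)
    return entry  # not eligible enough; it never enters the list
```

Every file visited while iterating through the file system causes one such update. Any file that is displaced, or never admitted, is invisible to the migration step until a later iteration encounters it again.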
There is therefore a need for an improved method and data processing system for managing a mass storage system.