1. Field of Invention
The invention relates to computer filesystems and management of persistent storage in a tiered storage system.
2. Description of Related Art
Storage is typically divided into data blocks that are addressable by the underlying hardware. The blocks are typically all of equal size, but on some devices or systems they can be of different sizes. Stored content occupies one or more blocks of data that, when ordered into a logical sequence, form a file. Filesystems typically organize files into a hierarchy of directories for ease of use. Directories are themselves files, created by the filesystem, that consist of file descriptor records (sometimes called inodes). Starting at the root directory of a filesystem, it is possible to navigate through the hierarchy of sub-directories to reach a desired file.
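Navigation from the root directory down to a file can be illustrated with a minimal sketch. It assumes that each directory record simply maps child names to inode numbers and that the root directory has inode number 2 (the convention used by ext2/ext3/ext4). The `Inode` class and `resolve` function are hypothetical and stand in for no particular filesystem's on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    """Hypothetical file descriptor record."""
    number: int
    is_directory: bool
    # Directories only: child name -> child inode number.
    entries: dict[str, int] = field(default_factory=dict)


ROOT_INODE = 2  # assumption: root inode number, as in ext2/ext3/ext4


def resolve(path: str, inodes: dict[int, Inode]) -> int:
    """Walk from the root directory to the inode named by an absolute path."""
    current = inodes[ROOT_INODE]
    for name in path.strip("/").split("/"):
        if not name:
            continue  # empty components (from "/" or "//") are skipped
        if not current.is_directory:
            raise NotADirectoryError(name)
        if name not in current.entries:
            raise FileNotFoundError(path)
        current = inodes[current.entries[name]]
    return current.number


# Example hierarchy: / -> usr/ -> notes.txt
inodes = {
    2: Inode(2, True, {"usr": 11}),
    11: Inode(11, True, {"notes.txt": 42}),
    42: Inode(42, False),
}
```

Here `resolve("/usr/notes.txt", inodes)` visits the root directory, then `usr`, and returns inode 42.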
The typical storage device is controlled via a device driver and can only understand physical block addresses that correspond to its capacity. Neither the device nor its device driver is aware of the semantics of the blocks read or written. For example, neither is aware of which file a physical block belongs to or of that block's logical position within the file.
Filesystems are attached (an operation sometimes called mounting) to storage devices that they access through device drivers. A filesystem software module manipulates the filesystem and performs operations on it at the request of the operating system. A device driver is a software module usually provided by the vendor of the storage device, but sometimes provided in conjunction with an operating system or from some other source. In a simple case the device driver interfaces the filesystem module with a single storage device such as a hard disk drive (HDD), CD/DVD or solid state drive (SSD). In more complex cases the device driver interfaces the filesystem module with a RAID controller or a logical volume manager that controls several storage devices. In all cases the filesystem module communicates with a single device driver that encapsulates one or more storage devices in such a way that they appear to the filesystem module as monolithic storage capacity. In particular, the filesystem module can access any block within the available storage capacity but cannot address a particular storage device encapsulated by the device driver. If the filesystem module needs to know the geometry of the storage device abstraction (e.g. the count of platters, tracks per platter, sectors per track, the sector size, etc.), it can discover such information through the device driver API.
Most filesystems manage the physical blocks available on a device through a bitmap in which each bit corresponds to a block on the device and indicates whether that physical block is free or allocated. The physical blocks are viewed as a contiguous sequence of uniform capacity and near-uniform performance (on some hardware, disk geometry introduces performance differences across cylinders due to rotation).
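A free-block bitmap of this kind can be sketched in a few lines. This is a minimal illustration, assuming one bit per physical block (1 = allocated, 0 = free) and a simple lowest-free-block-first allocation policy; the `BlockBitmap` class is hypothetical, and real filesystems typically add allocation groups, locality hints and on-disk persistence.

```python
class BlockBitmap:
    """One bit per physical block: 1 = allocated, 0 = free."""

    def __init__(self, block_count: int) -> None:
        self.block_count = block_count
        # Round up so every block has a bit.
        self.bits = bytearray((block_count + 7) // 8)

    def is_allocated(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self) -> int:
        """Mark the lowest-numbered free block as allocated and return it."""
        for block in range(self.block_count):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise OSError("no free blocks")

    def free(self, block: int) -> None:
        """Clear the block's bit; the mask keeps the value within 0..255."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

Because the bitmap treats every block as interchangeable, it records only free versus allocated; nothing in it distinguishes a fast block from a slow one.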
In conventional storage systems, storage devices are managed through dedicated filesystem instances. Often, a filesystem instance is identified by its mount point or root directory. For example, in a Microsoft® Windows® operating system, filesystem instances are often identified by a letter followed by a colon and a backslash (e.g. ‘c:\’). In Unix operating systems they are often identified by a slash and a root directory name (e.g. ‘/usr’). It is possible for more than one filesystem to occupy a single hardware storage unit (such as two partitions on a single HDD), or for one filesystem to occupy more than one hardware storage unit (such as multiple disk drives in a RAID array, or multiple networked computer systems in a network filesystem). Separate filesystems are independent of each other, however. They do not cooperate, and each requires that every one of its files be entirely contained within that one instance. Thus while it is possible to move a file from one filesystem to another (capacity allowing), it is not possible for multiple filesystems to share in the management of the blocks of a single file.
Files are data containers for which filesystem modules manage the allocation, de-allocation and access to the data blocks contained in the file. A “file”, as used herein, has a file ID or file identifier which is unique within a filesystem. If the operating system is a Unix®, Linux®, VMS® or Microsoft® Windows®-based operating system, then the term “file” as used herein is co-extensive with the term as used in relation to those operating systems. A “file” can have any number of data blocks, including zero if the file is empty. The filesystem exposes the contents of a file to higher levels of software (e.g. an operating system or an application program) as a sequence of contiguous logical blocks, and the filesystem maps the logical blocks to the corresponding physical blocks as exposed to it by the device driver. The physical blocks allocated to a file are unlikely to be contiguous. In addition, since the device driver can encapsulate a wide variety of physical storage devices, the device driver may perform another mapping of physical blocks to blocks on the actual storage media that are, in a sense, even more physical. The latter mapping is hidden from the filesystem by the device driver, however, so for the purposes of the present description, the data blocks as exposed to the filesystem by lower level software are referred to herein as “physical” blocks.
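The logical-to-physical mapping held for each file can be sketched as a per-inode list. In this simplified sketch, position `i` holds the physical block number backing logical block `i`. Real filesystems use direct/indirect block pointers or extents rather than a flat list, and the names `inode_block_maps`, `physical_block` and `is_physically_contiguous` are hypothetical.

```python
# Hypothetical per-file block maps: position i holds the physical block
# number (as exposed by the device driver) backing logical block i.
inode_block_maps: dict[int, list[int]] = {
    42: [907, 12, 13, 5520],  # four logical blocks, physically scattered
    43: [],                   # an empty file has zero data blocks
}


def physical_block(inode: int, logical_block: int) -> int:
    """Translate a file's logical block number to a physical block number."""
    block_map = inode_block_maps[inode]
    if not 0 <= logical_block < len(block_map):
        raise IndexError(f"logical block {logical_block} is past end of file")
    return block_map[logical_block]


def is_physically_contiguous(inode: int) -> bool:
    """True if consecutive logical blocks occupy consecutive physical blocks."""
    m = inode_block_maps[inode]
    return all(b == a + 1 for a, b in zip(m, m[1:]))
```

File 42 presents four contiguous logical blocks (0 to 3) to higher-level software, even though they map to scattered physical blocks 907, 12, 13 and 5520.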
Modern applications have a wide variety of storage needs with varying characteristics. For example, a database system might hold a huge number of data records, any one of which is accessed only rarely. The same database system might also have an index file that is much smaller than the stored data but is accessed constantly. A wide variety of storage devices is also available, characterized by their performance in terms of input/output operations per second (IOPS) and throughput (sequential access) in megabytes per second. It is normally the case that the greater the capacity of a storage device, the lower its performance, and the higher the performance of a storage device, the smaller its capacity.
In order to maximize performance at minimum cost, storage systems are available which offer two or more “tiers” of storage, with higher tiers offering better performance but smaller capacity, and lower tiers offering larger capacity but lower performance. Application programs that are aware of the different tiers are often designed to store bulky, infrequently accessed data files on a lower tier and a much smaller set of frequently accessed files on a higher tier. Database systems, for example, are often designed to store data records on lower tiers and index files on higher tiers. Often a database administrator is employed to periodically analyze usage records of various files, and manually promote or demote files among the tiers to enhance performance in response to actual usage patterns. As used herein, two different storage units are considered to occupy different “tiers” only if their positions in the performance vs. capacity tradeoff are significantly different, different enough for an allocation of data among them to reduce overall storage costs enough, or improve overall performance enough, to be worthwhile. Typically the devices in different tiers use different storage technology.
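The manual, file-granularity reassignment described above can be approximated by a simple policy. This is a hypothetical sketch, not a method taken from any particular product: it assumes two tiers, per-file access counts gathered over some analysis window, and a greedy rule that promotes files in order of accesses per megabyte while the fast tier has room. The `FileUsage` class, `assign_tiers` function and the example figures are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FileUsage:
    name: str
    size_mb: int
    accesses: int  # accesses observed during the analysis window


def assign_tiers(files: list[FileUsage], fast_capacity_mb: int) -> dict[str, str]:
    """Greedily place whole files on the fast tier, hottest per MB first."""
    placement: dict[str, str] = {}
    remaining = fast_capacity_mb
    ranked = sorted(files, key=lambda f: f.accesses / max(f.size_mb, 1), reverse=True)
    for f in ranked:
        if f.size_mb <= remaining:
            placement[f.name] = "fast"
            remaining -= f.size_mb
        else:
            # The whole file must fit: it cannot be split across tiers.
            placement[f.name] = "slow"
    return placement


# Illustrative figures only.
files = [
    FileUsage("index.db", 200, 90_000),
    FileUsage("records.db", 50_000, 120_000),
    FileUsage("log.txt", 300, 600),
]
placement = assign_tiers(files, fast_capacity_mb=1000)
```

With these figures, `index.db` and `log.txt` are promoted, while `records.db` stays on the slow tier even though it receives the most accesses in total. It is simply too large to fit whole.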
Conventionally, the assignment of data among the tiers of a tiered storage system is performed at the granularity of a file. It often happens, however, that some parts of a large file are accessed much more frequently than other parts. Conventional tiered storage systems cannot split a file across more than one tier under the control of the filesystem or higher-level software.
Nor is it possible for conventional systems to implement sub-file granularity of data assignment among tiers under the control of higher-level software. In order to store and retrieve data from a file, application programs specify a command, a filesystem (e.g. ‘C:\’), a particular file according to the semantics defined by the filesystem (e.g. a directory path and file name, or an inode number), an offset within the file and a byte count to read or write. The operating system passes the request to the filesystem module that controls the specified filesystem. That module converts the offset into a logical block number, which it looks up in the inode of the file to determine the corresponding physical block number. The physical block number is passed to the device driver to execute the I/O operation.
This control flow precludes promoting and demoting data at the granularity of data blocks based on their usage patterns, because no single component has sufficient information. The program initiating the I/O cannot participate, as it lacks any knowledge of the internal structures and protocols of the filesystem and storage device. The operating system similarly lacks knowledge of the internal structures and protocols of the filesystem and storage device. The filesystem module knows the logical block to physical block mapping and could compute the popularity of blocks as they are accessed; however, it only manages blocks on its attached storage device and is unaware of, and does not manage, other available storage devices. Lastly, the storage device driver knows the physical block number and may be aware of other available storage devices, but it does not know the logical block number and cannot manipulate the logical block to physical block mapping.
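The read path described above, and the division of knowledge among its layers, can be sketched as follows. This sketch assumes a uniform 4096-byte block size and a flat per-inode block map. The `DeviceDriver` and `FilesystemModule` classes are hypothetical stand-ins, not any operating system's API. Note that only `FilesystemModule` sees logical block numbers, while `DeviceDriver` receives bare physical block numbers with no file context.

```python
BLOCK_SIZE = 4096  # assumption: uniform block size in bytes


class DeviceDriver:
    """Stand-in driver: addresses physical blocks only, with no file context."""

    def __init__(self, block_count: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]

    def read_block(self, physical: int) -> bytes:
        return self.blocks[physical]

    def write_block(self, physical: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("driver writes whole blocks only")
        self.blocks[physical] = data


class FilesystemModule:
    """Owns the logical-to-physical mapping for its single attached driver."""

    def __init__(self, driver: DeviceDriver, inodes: dict[int, list[int]]) -> None:
        self.driver = driver
        self.inodes = inodes  # inode number -> block map (logical -> physical)

    def read(self, inode: int, offset: int, count: int) -> bytes:
        # No end-of-file check: reading past the last block raises IndexError.
        block_map = self.inodes[inode]
        out = bytearray()
        end = offset + count
        while offset < end:
            logical = offset // BLOCK_SIZE           # byte offset -> logical block
            physical = block_map[logical]            # inode lookup -> physical block
            data = self.driver.read_block(physical)  # driver performs the I/O
            start = offset % BLOCK_SIZE
            take = min(BLOCK_SIZE - start, end - offset)
            out += data[start:start + take]
            offset += take
        return bytes(out)


# Example: logical blocks 0 and 1 of inode 42 live in physical blocks 7 and 3.
driver = DeviceDriver(block_count=16)
driver.write_block(7, b"A" * BLOCK_SIZE)
driver.write_block(3, b"B" * BLOCK_SIZE)
fs = FilesystemModule(driver, {42: [7, 3]})
```

A 10-byte read at offset 4090 spans both logical blocks, and `fs.read(42, 4090, 10)` returns six bytes from physical block 7 followed by four from physical block 3. The driver is called twice without ever learning which file, or which logical block, it is serving.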
It would be desirable to promote and demote data among tiers in a tiered storage system at a granularity smaller than a full file, since promoting and demoting at file granularity can often be impractical. First, space must be available for the entire file on the storage device to which the file will be promoted, and this may not be true for faster (and smaller) storage devices. Second, moving an entire file can be time consuming, especially where the file is large. Third, moving an entire file may be overkill, because the bulk of the performance gain could be achieved by improving access to only the relatively small part of the file that will be accessed more frequently than the rest. However, until now, sub-file granularity of data assignment among tiers has not been possible.