A conventional file system uses a block storage paradigm. The “blocks” in which a conventional file system stores data may ultimately be physical locations on physical storage devices (e.g., hard disk drives). However, at least one level of indirection or abstraction typically exists between the physical storage devices and the logical blocks processed by a conventional file system. For example, a conventional disk drive may be presented as a “logical unit” identified by a logical unit number (LUN), and a conventional file system may organize collections of LUNs into block storage. These types of conventional systems may use homogeneous groups of LUNs (e.g., homogeneous groups of storage devices) to provide the blocks for the file system storage. These types of conventional systems typically do not use a heterogeneous mix of LUNs (e.g., a heterogeneous mix of devices) to provide the blocks for the file system storage. A LUN is a reference to specific storage in a storage area network (SAN). A LUN can represent a drive (e.g., a hard disk drive (HDD)), a portion of a drive, or a collection of drives.
While a file system may use a homogeneous group of LUNs to provide the blocks for its block storage paradigm, input and output to and from the file system may not be homogeneous. For example, some file system input/output (i/o) may involve large sequential transfers while other file system i/o may involve small random transfers. Some files or types of files may consistently produce a certain type of transfer. For example, movie files may consistently produce large sequential transfers while transaction processing database files may consistently produce small random transfers. Some directories in a file system may consistently produce a certain type of transfer. For example, a directory associated with a virtual tape may consistently experience large sequential transfers while a directory associated with a text-based social media application may consistently experience small random transfers. File systems or other applications may be able to track and understand relationships between files, file types, directories, and the types of transfers they experience.
The relationships between transfers and locations (e.g., files, file types, directories) can be used to define affinities that facilitate steering data to selected LUNs available to a file system. For example, data that typically experiences a first type of transfer (e.g., large, sequential) may be steered toward a first group of LUNs while data that typically experiences a second type of transfer (e.g., small, random) may be steered toward a second group of LUNs. The first group of LUNs may have properties that make them more efficient, practical, or affordable for the first type of i/o, while the second group of LUNs may have different properties that make them more appropriate for the second type of i/o. Using affinities to steer files to appropriate LUNs may improve the performance of the file system.
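The affinity-based steering described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the method of any particular file system: the 1 MiB cutoff, the rule that at least half of successive requests must be contiguous to count as sequential, and the LUN group names are all hypothetical. For simplicity, the sketch sorts all i/o into only the two classes named above, so anything that is not both large and sequential falls into the small, random group.

```python
from enum import Enum

class TransferType(Enum):
    LARGE_SEQUENTIAL = "large_sequential"
    SMALL_RANDOM = "small_random"

# Hypothetical cutoff separating "large" from "small" transfers.
LARGE_TRANSFER_BYTES = 1 << 20  # 1 MiB

def classify(requests):
    """Classify a list of (offset, length) requests observed for a file or directory."""
    if not requests:
        return TransferType.SMALL_RANDOM
    mean_len = sum(length for _, length in requests) / len(requests)
    # Count requests that begin exactly where the previous request ended.
    contiguous = sum(
        1
        for (prev_off, prev_len), (off, _) in zip(requests, requests[1:])
        if off == prev_off + prev_len
    )
    # Hypothetical rule: "sequential" if at least half of the transitions are contiguous.
    mostly_sequential = contiguous >= (len(requests) - 1) / 2
    if mean_len >= LARGE_TRANSFER_BYTES and mostly_sequential:
        return TransferType.LARGE_SEQUENTIAL
    return TransferType.SMALL_RANDOM

# Hypothetical affinity table: transfer type -> group of LUNs suited to that i/o.
LUN_GROUPS = {
    TransferType.LARGE_SEQUENTIAL: ["seq-lun-0", "seq-lun-1"],
    TransferType.SMALL_RANDOM: ["rand-lun-0", "rand-lun-1"],
}

def steer(requests):
    """Return the LUN group that data with this i/o history has an affinity for."""
    return LUN_GROUPS[classify(requests)]
```

For example, a history resembling a movie file (back-to-back 4 MiB reads) maps to the sequential group, while scattered 4 KiB requests of the kind a transaction processing database might issue map to the random group.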
Even though conventional file systems may have steered data (e.g., files, directories) to LUNs with which the data had an affinity, opportunities for further improvements and efficiencies may have been missed. For example, even though LUNs have been defined and employed, the LUNs typically are all traditional disk-based LUNs. Even when an underlying device is not a simple single disk drive, the device(s) may still be made to look like a consistent logical LUN. For example, a RAID set (e.g., RAID-1, RAID-6) may be presented to the file system as a logical block-based LUN.
Data stored by a file system may have a life cycle. When data is new and fresh, it may be written or read more frequently. When data is old and stale, it may be written and read less frequently. Some data may be written only once and then rarely, if ever, read. Other data may be written only once but read regularly. Thus, as data ages in a system, the i/o patterns associated with that data may change. Regardless of the i/o patterns associated with data, that data still consumes resources. For example, the data may use storage capacity in its original location, and may use additional storage capacity and consume processor cycles when backed up. Some data that is rarely, if ever, accessed may end up consuming space both on the primary storage devices used by a file system and on the secondary storage used by backup devices.
Some conventional attempts to address the inefficiencies of unused or rarely used data consuming space and processor cycles involve archiving certain data. Data may be archived based on factors including, for example, the size of the data, the age of the data, when the data was last accessed, or other criteria. Conventional archive processes may involve identifying data that ought to be archived and then moving that data to a separate, dedicated archival storage system. The separate, dedicated archival storage system typically is not managed by the original file system from which the data is archived, but rather is managed by its own separate file system. While having a separate file system provides some efficiencies, if the archived data is later needed, having to retrieve it across two separate file systems may introduce significant inefficiencies. Additionally, the separate file system associated with the dedicated archival storage system may also need to be backed up, consuming even more space and even more processor cycles.
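The kind of selection rule a conventional archive process might apply can be sketched in Python as follows. The `FileInfo` and `ArchivePolicy` types, the threshold values, and the choice to require all three criteria (size, age, and time since last access) to hold at once are hypothetical; real archive policies differ in which criteria they use and how they combine them.

```python
from dataclasses import dataclass

DAY = 86400  # seconds per day

@dataclass
class FileInfo:
    path: str
    size_bytes: int
    created: float        # creation time, seconds since the epoch
    last_accessed: float  # last access time, seconds since the epoch

@dataclass
class ArchivePolicy:
    # Hypothetical thresholds; a real policy would be configurable.
    min_size_bytes: int = 1 << 20   # ignore files smaller than 1 MiB
    min_age_s: float = 365 * DAY    # at least one year old
    min_idle_s: float = 90 * DAY    # not accessed for at least 90 days

def should_archive(f: FileInfo, now: float, policy: ArchivePolicy) -> bool:
    """In this sketch, a file qualifies only if it meets all three criteria."""
    return (
        f.size_bytes >= policy.min_size_bytes
        and now - f.created >= policy.min_age_s
        and now - f.last_accessed >= policy.min_idle_s
    )

def select_for_archive(files, now, policy=None):
    """Return the paths an archive process would move to separate archival storage."""
    if policy is None:
        policy = ArchivePolicy()
    return [f.path for f in files if should_archive(f, now, policy)]
```

In the conventional approach described above, the paths returned by `select_for_archive` would then be moved to the separate, dedicated archival storage system, which is the step that places the archived data under a second file system.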