Conventionally, inodes and files had a one-to-one relationship. A single inode was associated with a single file and a single file was associated with a single inode. This one-to-one relationship produced efficiency issues with small files because the metadata and inode allocated for a small file may have been excessive relative to the file size. In extreme cases, the inode used to find a small file may have been larger than the small file itself. This was inefficient because more storage space was being used to locate a small file than the file itself consumed. Additionally, when there was a large number of small files, the number of inodes associated with those small files may have grown to a size where searching the inode table became inefficient, particularly if the inode table could not be stored in a single unit (e.g., page) of memory.
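The small-file overhead described above can be illustrated with a little arithmetic. The following is a minimal sketch, not a description of any particular file system: the 256-byte inode size, the 4096-byte page size, and the function names are assumptions chosen for illustration only.

```python
INODE_SIZE = 256   # bytes; assumed fixed inode size, not taken from the text
PAGE_SIZE = 4096   # bytes; assumed size of one unit (page) of memory


def overhead_ratio(file_size, inode_size=INODE_SIZE):
    """Return bytes of inode consumed per byte of file data."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return inode_size / file_size


def inode_table_pages(num_files, inode_size=INODE_SIZE, page_size=PAGE_SIZE):
    """Return how many pages a one-inode-per-file table would occupy."""
    total_bytes = num_files * inode_size
    return -(-total_bytes // page_size)  # ceiling division on integers


if __name__ == "__main__":
    # A 100-byte file needs more bytes of inode than of data.
    print(overhead_ratio(100))
    # Under these assumptions the table stops fitting in one page at 17 files.
    print(inode_table_pages(16), inode_table_pages(17))
```

With these assumed sizes, any file smaller than 256 bytes is outweighed by its own inode, and the inode table outgrows a single page after only sixteen files.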
On the other end of the spectrum, the metadata and inode for a large file may have been insufficient for efficiently locating various parts of the large file. The single inode may have only had enough room to locate the beginning of a file. If a user wanted to find the middle of the file or some other part of the file, then complicated and inefficient pointer-following processing may have been required. If the file was spread over a number of locations or a number of devices, finding all the portions of the file may have required acquiring address information from multiple locations or devices.
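The pointer-following cost described above can be sketched with a simplified model in which the inode records only the first piece of a file and each piece points to the next. The `Extent` class, the `locate` function, and the 1 MiB piece size are hypothetical names and values introduced for illustration; real file systems use a variety of block-mapping structures.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Extent:
    """One contiguous piece of a file (hypothetical, simplified model)."""
    start_block: int
    length: int                      # bytes covered by this piece
    next: Optional["Extent"] = None  # pointer to the following piece


def locate(first: Extent, offset: int) -> Tuple[Extent, int]:
    """Walk the chain from the first extent; return (extent, pointers followed)."""
    hops = 0
    extent = first
    while extent is not None:
        if offset < extent.length:
            return extent, hops
        offset -= extent.length
        extent = extent.next
        hops += 1
    raise ValueError("offset is beyond the end of the file")


def build_chain(count: int, length: int) -> Extent:
    """Build a chain of `count` equally sized extents and return the first one."""
    head = None
    for index in reversed(range(count)):
        head = Extent(start_block=index, length=length, next=head)
    return head


if __name__ == "__main__":
    MIB = 1024 * 1024
    first = build_chain(1000, MIB)
    extent, hops = locate(first, 500 * MIB)
    print(extent.start_block, hops)
```

In this model, reaching the middle of a 1000-piece file requires following 500 pointers, and the cost grows linearly with the offset being sought.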
These problems with conventional inodes were exacerbated by the fact that, conventionally, inode numbers were assigned sequentially and encoded no information. Since the inode numbers encoded no information, the inode numbers may have been reassigned when they became available. For example, when a file for which an inode stored information was deleted, the inode number may have been reused. Conventionally, these inodes associated with reassignable inode numbers were stored sequentially in an inode table. The inode table may have been stored on a single device. This model for one-to-one reassignable inode numbers is sub-optimal for emerging data storage systems that store enormous amounts of data across multiple storage systems using a wide range of file sizes. For example, data may be archived in the cloud. An archive application may handle a slow trickle of data over a period of time and may store what may be the only copy of that data, while a backup application may handle a large dump of data all at once and may store a second or third copy of data. Thus, an archive may place different demands on a file system and its inodes than a backup application or live file system.
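Sequential, reassignable inode numbering of the kind described above might be modeled as follows. This is a minimal sketch under stated assumptions: the `SequentialInodeAllocator` class is hypothetical, and reusing the lowest freed number first is an illustrative policy rather than one stated in the text.

```python
import heapq


class SequentialInodeAllocator:
    """Hands out inode numbers 1, 2, 3, ... and reuses freed numbers."""

    def __init__(self):
        self._next_number = 1  # next never-used inode number
        self._free = []        # min-heap of released inode numbers
        self.table = {}        # inode number -> file name (stand-in for an inode)

    def create(self, name):
        """Assign an inode number to a new file, preferring a freed number."""
        if self._free:
            number = heapq.heappop(self._free)
        else:
            number = self._next_number
            self._next_number += 1
        self.table[number] = name
        return number

    def delete(self, number):
        """Remove a file and make its inode number available again."""
        del self.table[number]
        heapq.heappush(self._free, number)


if __name__ == "__main__":
    allocator = SequentialInodeAllocator()
    first = allocator.create("report.txt")
    second = allocator.create("video.mov")
    allocator.delete(first)
    reused = allocator.create("notes.txt")
    print(first, second, reused)
```

The number given to the third file is the one released by the first, so the number itself says nothing about the file's size, location, or device.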
A cloud archive may be built using, for example, a file system. The file system may be a shared disk file system (e.g., StorNext® by Quantum). The file system may include a plurality of different storage devices. Archive applications that use file systems face a number of challenges. One problem concerns handling a wide range of file sizes. For example, it may be difficult or inefficient to handle files ranging from as small as 1 KiB in size all the way up to 1 TiB in size in the same archive using conventional inodes because, when it comes to file sizes and inodes, one size does not fit all.
Metadata and an inode for a small file may be the same size as metadata and an inode for a large file, which may produce inefficiencies at both ends of the file size spectrum. For example, the metadata and inode for a small file may be excessive and the metadata and inode for a large file may be insufficient.
A file system may store data and metadata for billions of files. Conventional system designers may have never imagined an archive system or a file system handling even a million files. Thus, contemporary file systems that are three or four orders of magnitude larger than conventional file systems challenge conventional approaches to certain file system activities, particularly inode table size and relationships between inodes and files of different sizes. Conventional algorithms, approaches, and apparatus for interacting with file systems have suffered from performance degradation as file systems grow ever larger. The degradation is due, at least in part, to the one-to-one relationship between files and inodes and the form of conventional inodes.
A file system and a storage manager may provide a file locking storage area network (SAN) that gives access to files using a single namespace even though the files in the file system served by the namespace are spread across multiple devices. The functionalities (e.g., data mover, policy engine) associated with the storage manager and the file system need to be able to find and use metadata associated with the file system, need to be able to find and use files associated with the file system, and need to be able to perform other actions. These actions need to be performed efficiently and completed within a reasonable amount of time. Conventional inodes compromise the ability to perform these actions efficiently in suitable periods of time.