Various forms of network storage systems are known today, including network attached storage (NAS), storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data and backing up critical data (e.g., by data mirroring).
A network storage system includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (“clients”). In the context of NAS, a storage server may be a file server, which is sometimes called a “filer”. A filer operates on behalf of one or more clients to store and manage shared files in a set of mass storage devices, such as magnetic or optical disks or tapes. The mass storage devices may be organized into one or more volumes of a Redundant Array of Inexpensive Disks (RAID). Enterprise-level filers are made by Network Appliance, Inc. of Sunnyvale, Calif.
In a SAN context, the storage server provides clients with block-level access to stored data, rather than file-level access. Some storage servers are capable of providing clients with both file-level access and block-level access, such as certain filers made by Network Appliance, Inc. of Sunnyvale, Calif.
Conventional file systems include data sets, such as volumes, files (also referred to as containers), or logical data storage units. A file system is a hierarchy of the stored data sets. A file system layer or manager is an application-level programmatic entity or layer which imposes the hierarchical structure on the data sets, such as the files, directories and/or other data containers stored and/or managed by a storage server, and which services read and write requests from clients of the storage server. Conventionally, a logical data container may be regarded as another type of logical storage object, since one data object is stored per container. One type of data storage unit is a logical unit number (LUN). A LUN may be a virtual partition of a RAID group. For example, a LUN may be formed as a “stripe” that is one or more blocks wide, across the storage devices in a RAID group, where a block may be, for example, a 4 Kbyte chunk of storage space. A block is the fundamental unit of storage space that the file system layer can maintain. A LUN may appear to a client, for practical purposes, as a physical storage device such as a disk. A volume, for example, may include a file system.
A volume is a logical data set which is an abstraction of physical storage, combining one or more physical storage devices or parts thereof into a single logical storage object, and which is managed as a single administrative unit, such as a single file system. A volume may be defined from a larger group of available storage, such as an aggregate. A volume may be logically broken down into logical data sets (storage objects) called “plexes,” which may contain one or more RAID groups. A file system includes directories and files (also referred to as containers or logical unit numbers). Data is stored in one or more blocks within the container. This data is typically stored as data objects, one data object per container, and the data may fill one or more blocks of the container.
When storing files in a conventional file system, the total disk space consumed is frequently more than what the file itself requires. Conventional file systems are subject to internal block fragmentation, which arises because storage space is allocated in whole blocks of a fixed size. When the size of a stored data object is less than the underlying block size of the file system, space in the file system is wasted or unused due to the internal block fragmentation. For example, storing a 3 KB file on a file system using 4 KB blocks uses up a minimum of 4 KB, leaving 1 KB unused. For storing large numbers of small objects, this is a very inefficient use of storage space. In a conventional file system that stores a large number of objects, such as, for example, 100 billion emails of 2 KB or less each, more than half the space of the file system is unused due to the internal block fragmentation (e.g., with 4 KB blocks), since most of the data objects are smaller than half the size of each block.
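The arithmetic of internal block fragmentation described above can be illustrated with a minimal sketch. The function names `allocated_bytes` and `wasted_fraction` are hypothetical and chosen for illustration only, and the sketch assumes a simple model in which every data object is stored in its own file and occupies a whole number of fixed-size blocks.

```python
BLOCK_SIZE = 4096  # 4 KB block, as in the example above


def allocated_bytes(object_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes consumed on disk when an object must occupy whole blocks."""
    if object_size <= 0:
        return 0
    # Ceiling division using integers only: round up to the next block.
    blocks = -(-object_size // block_size)
    return blocks * block_size


def wasted_fraction(object_sizes, block_size: int = BLOCK_SIZE) -> float:
    """Fraction of allocated space left unused across all objects."""
    used = sum(object_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in object_sizes)
    return (allocated - used) / allocated if allocated else 0.0


# A 3 KB file in 4 KB blocks consumes 4 KB, leaving 1 KB unused.
print(allocated_bytes(3 * 1024) - 3 * 1024)   # 1024

# Objects of exactly 2 KB waste half of each block; smaller ones waste more.
print(wasted_fraction([2048] * 1000))         # 0.5
print(wasted_fraction([1024] * 1000))         # 0.75
```

Under this model, a population of objects that are each no larger than half a block cannot waste less than half of the allocated space, which is consistent with the email example above.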
The problem may become considerably worse when considering how many inodes are reserved per unit of disk space. Inodes are used in a directory that may be accessed for referencing the data objects. Creating a large number of data objects in a file system can use up all inode resources and put additional stress on the directory lookup performance in referencing data objects. An inode is a metadata structure which is used to store metadata about a file, such as ownership of the file, access permission for the file, size of the file, file type, and pointers used to locate the data blocks for the file. Inodes contain pointers to the top level of indirect blocks for the file, such as, for example, in a buffer tree. The inode is stored in a separate inode file. The inode file is a file which contains the inodes of all files (or containers) in a particular volume. Each inode includes a list of the block(s) that contain the data object of the file, and where the data is located on the storage devices. Given a filename, a user or an application can find the corresponding inode, which references where the file is physically located on the storage devices.
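The relationship among a filename, an inode, and data blocks described above can be sketched as a toy in-memory model. Everything in this sketch is hypothetical and for illustration only: the class names `Inode` and `ToyVolume`, the field names, and the flat list of direct block numbers (a real file system would typically use indirect blocks, such as a buffer tree, and would keep these structures on disk).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed block size for this sketch


@dataclass
class Inode:
    """Per-file metadata: ownership, permissions, size, and block pointers."""
    owner: str
    permissions: int
    size: int
    block_numbers: list = field(default_factory=list)


class ToyVolume:
    def __init__(self):
        self.blocks = {}      # block number -> bytes stored in that block
        self.inode_file = []  # inode number is the index into this list
        self.directory = {}   # filename -> inode number
        self._next_block = 0

    def create(self, name, data, owner="root", permissions=0o644):
        """Store one data object as one file, consuming one inode."""
        block_numbers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            self.blocks[self._next_block] = data[offset:offset + BLOCK_SIZE]
            block_numbers.append(self._next_block)
            self._next_block += 1
        self.inode_file.append(Inode(owner, permissions, len(data), block_numbers))
        self.directory[name] = len(self.inode_file) - 1

    def read(self, name):
        """Resolve filename -> inode -> blocks, then reassemble the data."""
        inode = self.inode_file[self.directory[name]]
        data = b"".join(self.blocks[n] for n in inode.block_numbers)
        return data[:inode.size]


volume = ToyVolume()
volume.create("message-0001", b"x" * 5000)
print(len(volume.inode_file[0].block_numbers))  # 2
```

In this model every stored object costs one directory entry and one inode in addition to its data blocks, which is the per-object overhead the paragraph above refers to.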
Also, it should be noted that each directory is a special kind of file that includes a list of filename(s) and the disk address of the inodes of these files. A directory may include a list of all the files that are within that directory, and each entry includes the disk address of the inode of the corresponding file. The inode includes the physical location of the file on the storage server. For example, 100-200 bytes may be used to create an inode for each file that is created. When storing a large number of data objects, additional space may be used to store the inodes for each of the files that include one data object per file. Conventional file systems that support high capacities, such as large files and/or large numbers of files, typically require a substantial fraction of the storage to be consumed by per-file metadata structures, because the data objects are stored on a one-data-object-per-file model. For example, if no more inodes are allowed by the file system than one per 32K of space, then storing a 3K file as an individual file effectively requires 32K of on-disk space. However, changing the block size or inode allocations of the underlying file system is frequently not possible.
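The effect of a fixed inode density can be worked through with a short sketch. The function names and the 1 TiB capacity are assumptions made for illustration; the sketch also assumes that the file system is filled until it runs out of inodes, so that each file effectively accounts for at least one inode's share of capacity.

```python
BLOCK_SIZE = 4096          # assumed 4K block
BYTES_PER_INODE = 32768    # one inode allowed per 32K of space, as above


def effective_footprint(object_size: int,
                        block_size: int = BLOCK_SIZE,
                        bytes_per_inode: int = BYTES_PER_INODE) -> int:
    """Capacity effectively consumed by one object stored as its own file."""
    in_blocks = -(-object_size // block_size) * block_size  # round up to blocks
    return max(in_blocks, bytes_per_inode)


def max_files(capacity: int, bytes_per_inode: int = BYTES_PER_INODE) -> int:
    """Upper bound on the number of files imposed by the inode density."""
    return capacity // bytes_per_inode


# A 3K file effectively requires 32K when inodes are the limiting resource.
print(effective_footprint(3 * 1024))              # 32768

# On a hypothetical 1 TiB volume, at most 2**25 files can be created,
# so 3K objects could occupy only about 9.4% of the capacity.
capacity = 2 ** 40
print(max_files(capacity))                        # 33554432
print(max_files(capacity) * 3 * 1024 / capacity)  # 0.09375
```

The inode limit, rather than the block size, becomes the binding constraint in this example: the per-file cost rises from 4K to 32K.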
Conventional file systems typically access a particular object by referencing the particular container (or file) that includes the data object, since each object corresponds to the container in which it is stored. This may be done by using a table or map, which includes mappings between the filename, as known to the outside world, and the container identification, which indicates the physical location of the data on the storage devices.
In another conventional design, databases may use a packing approach to address the storage of small records in a large table; however, these databases are for internal use and are not available to the user for general-purpose storage. A database typically stores multiple records in a single table. The table is then stored in a single file or volume. For example, in storing a collection of social security numbers, each number occupying nine bytes, it would be very inefficient to store each number in a separate file. Instead, the database typically packs a large number of social security numbers together in a table and stores the result in a single file or volume.
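The packing approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any particular database is implemented: the function names are hypothetical, each record is assumed to be exactly nine ASCII digits, and the "table" is simply a contiguous byte string that would be written to a single file.

```python
RECORD_SIZE = 9    # nine bytes per social security number, as above
BLOCK_SIZE = 4096  # assumed block size of the underlying file system


def pack_records(numbers):
    """Pack fixed-size records back to back into one table (a byte string)."""
    for number in numbers:
        if len(number) != RECORD_SIZE or not (number.isascii() and number.isdigit()):
            raise ValueError(f"expected {RECORD_SIZE} ASCII digits: {number!r}")
    return "".join(numbers).encode("ascii")


def get_record(table: bytes, index: int) -> str:
    """Fetch record `index` by computing its byte offset in the table."""
    start = index * RECORD_SIZE
    if index < 0 or start + RECORD_SIZE > len(table):
        raise IndexError(index)
    return table[start:start + RECORD_SIZE].decode("ascii")


def blocks_needed(byte_count: int, block_size: int = BLOCK_SIZE) -> int:
    """Whole blocks required to hold `byte_count` bytes."""
    return -(-byte_count // block_size)


# Placeholder values, not real social security numbers.
numbers = [f"{i:09d}" for i in range(1000)]
table = pack_records(numbers)

print(get_record(table, 42))      # 000000042
print(blocks_needed(len(table)))  # 3 blocks for the packed table
print(len(numbers) * blocks_needed(RECORD_SIZE))  # 1000 blocks, one file each
```

Because every record has the same size, a record is located by arithmetic on its index rather than by a per-record directory entry and inode.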