1. Field of the Invention
The invention relates to a method and system for compacting sparse directories in a file system. In a specific implementation, the invention relates to such a method and system for compacting sparse directories in the file system of a network attached storage (NAS) device.
2. Description of the Background
In the computer industry, storage technology has evolved rapidly over the past several years, and storage capacity has increased dramatically as organizations' need to manage, store and access large amounts of data has grown.
Traditionally, such data has been managed and accessed through the creation of a file system. One of the first traditional file systems was a hierarchical structure made up of a tree of directories including a root directory and subdirectories underneath it. More specifically, a directory is a recursive structure that contains entries. Each entry is a file. A file may be a special file, called a directory file, or it may be a data file. The contents of a directory file are generated by the file system, and users generate the contents of a data file. In the remainder of this document, the term file represents both data files and directory files interchangeably.
When a file system contains a large number of files, a mechanism is required to divide the set of all files into subsets of related files. That grouping of files helps a user to navigate through what is potentially a very large collection of files. As already discussed, one of the earliest and most popular groupings employed by file systems is the hierarchical directory structure, with the topmost node in the tree called the root directory.
Such a directory organizes its information through a collection of records known as directory entries, each of which represents a single file or another directory. A single directory entry contains an I-node number, the entry allocation size, the filename size, the filename, and padding. The I-node number is a unique file identifier. The allocation size is the space consumed by the file name plus padding. This information allows a user to compute the size, in bytes, of the directory entry. The file name length corresponds to the allocation size minus the padding size, or in other words, the actual bytes consumed by the name of the file.
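By way of illustration only, the directory entry described above can be sketched as follows. The field widths (a 4-byte I-node number followed by two 2-byte size fields), the little-endian byte order, the 4-byte alignment, and the names `HEADER`, `ALIGN`, `pack_entry` and `unpack_entry` are all assumptions made for this sketch; the description does not specify an on-disk layout.

```python
import struct

# Assumed layout (not specified in the description): little-endian
# 4-byte I-node number, 2-byte allocation size, 2-byte filename length.
HEADER = struct.Struct("<IHH")
ALIGN = 4  # assumed alignment used to compute the padding


def pack_entry(inode: int, name: str) -> bytes:
    """Build one directory entry: header, filename, then padding."""
    raw = name.encode("utf-8")
    padding = -len(raw) % ALIGN       # bytes needed to reach the alignment
    alloc_size = len(raw) + padding   # space consumed by the name plus padding
    return HEADER.pack(inode, alloc_size, len(raw)) + raw + b"\0" * padding


def unpack_entry(buf: bytes, offset: int = 0):
    """Return (inode, filename, total entry size in bytes)."""
    inode, alloc_size, name_len = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    name = buf[start:start + name_len].decode("utf-8")
    entry_size = HEADER.size + alloc_size  # header plus name-and-padding region
    return inode, name, entry_size
```

Under these assumptions, the entry size returned by `unpack_entry` is also the distance to the next entry, so a reader can step through a directory file one entry at a time.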
Early implementations of directory files organized the directory entries as a sequential list of records. In order to find a specific directory entry, a user had to scan the list sequentially. In such systems, once directories grew to more than a few hundred files, the list concept could no longer work because of the excessive time needed to find a particular filename.
A more recent implementation for large directories maintains files in a sequence of hash tables. A hash table is a popular technique for fast search, insert and delete operations on a large collection of records. It is a table of linked lists and has a fixed number of “buckets,” each of which is the start of a singly linked list. Each record in the collection provides a key that will be mapped into one, and only one, of the buckets. The value of that key is referred to as the record's hash value. Thus, when searching for a particular record, the file system only has to inspect a single list corresponding to the record's hash value, thereby significantly cutting down on the magnitude of the search space.
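A single hash table of the kind just described can be sketched as follows. This is a minimal illustration, not the implementation of any particular file system: the bucket count, the use of CRC-32 as the hash function, and the names `NUM_BUCKETS`, `Node` and `DirHashTable` are assumptions chosen for the example.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

NUM_BUCKETS = 8  # assumed; the description only says the number is fixed


@dataclass
class Node:
    """One record in a bucket's singly linked list."""
    name: str
    inode: int
    next: Optional["Node"] = None


class DirHashTable:
    def __init__(self) -> None:
        # Each bucket holds the head of a singly linked list, or None.
        self.buckets = [None] * NUM_BUCKETS

    @staticmethod
    def bucket_of(name: str) -> int:
        # CRC-32 is an assumed hash; every name maps to exactly one bucket.
        return zlib.crc32(name.encode("utf-8")) % NUM_BUCKETS

    def insert(self, name: str, inode: int) -> None:
        i = self.bucket_of(name)
        self.buckets[i] = Node(name, inode, self.buckets[i])  # push at head

    def lookup(self, name: str) -> Optional[int]:
        node = self.buckets[self.bucket_of(name)]
        while node is not None:  # only this one list is inspected
            if node.name == name:
                return node.inode
            node = node.next
        return None

    def delete(self, name: str) -> bool:
        i = self.bucket_of(name)
        prev, node = None, self.buckets[i]
        while node is not None:
            if node.name == name:
                if prev is None:
                    self.buckets[i] = node.next  # unlink the list head
                else:
                    prev.next = node.next        # unlink an interior node
                return True
            prev, node = node, node.next
        return False
```

With records spread evenly over the buckets, a lookup walks roughly one list out of `NUM_BUCKETS` rather than the whole collection, which is the reduction in search space noted above.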
Such an implementation allows for fast insert, delete and look-up of files. However, inserting a large number of files will cause the size of the directory to grow so that, if a large number of files are subsequently deleted, large regions of the directory will become empty, and it becomes time consuming to find a file because many empty regions have to be inspected during the search.
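The cost of a sparse directory can be illustrated with a deliberately simplified model. The sketch below uses a flat sequence of fixed-size blocks of slots rather than the hash tables described above, and the names `SLOTS_PER_BLOCK` and `SparseDirectory` are hypothetical. It is intended only to show that deleting entries empties slots without shrinking the directory, so that a later lookup still inspects the empty regions.

```python
SLOTS_PER_BLOCK = 4  # assumed block capacity for this sketch


class SparseDirectory:
    def __init__(self) -> None:
        self.blocks = []  # list of blocks; each block is a list of slots

    def insert(self, name: str) -> None:
        # Reuse the first empty slot; grow by one block only when full.
        for block in self.blocks:
            for i, slot in enumerate(block):
                if slot is None:
                    block[i] = name
                    return
        self.blocks.append([name] + [None] * (SLOTS_PER_BLOCK - 1))

    def delete(self, name: str) -> bool:
        # Deleting only empties the slot; the block itself is never released.
        for block in self.blocks:
            for i, slot in enumerate(block):
                if slot == name:
                    block[i] = None
                    return True
        return False

    def lookup(self, name: str):
        """Return (found, number of slots inspected)."""
        inspected = 0
        for block in self.blocks:
            for slot in block:
                inspected += 1
                if slot == name:
                    return True, inspected
        return False, inspected


d = SparseDirectory()
for n in range(100):
    d.insert(f"f{n}")
for n in range(99):
    d.delete(f"f{n}")  # leaves a single live entry in the last block
```

In this model, after 100 inserts and 99 deletes the directory still occupies 25 blocks, and a lookup of the one remaining file inspects all 100 slots even though 99 of them are empty.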
In accordance with the invention described herein, there is provided a method and system which solve the problem of the prior art, in particular where a hashing scheme is used to implement a directory and it is desired to compact the directory after a large number of files have been deleted.