In modern computer systems, a file system stores and organizes computer files so that a user can efficiently locate and access requested files. A file system can use a local storage device, such as a hard disk drive, or can provide access to data stored on a remote file server. A file system can also be characterized as a set of abstract data types implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data. File system software is responsible for organizing files and directories.
Many companies and individuals with large amounts of stored data use a file system as the basis of a data storage system. Such a data storage system can be located at the same site as the data being backed up or at a remote site, and it can be managed either by the entity that controls the primary data storage devices or by a data storage service company. Data can be added to the storage system at any frequency and in any amount.
Data in a data storage system can be arranged hierarchically, which is particularly necessary when the amount of data exceeds the available main memory. In that case, auxiliary memory can be used to accommodate the excess data. Auxiliary memory is not directly accessible by a computer's central processing unit (CPU); instead, its data is read into main memory in portions so that the CPU can manipulate it. Auxiliary memory can also include storage that must be mounted, either automatically or manually, before its data can be read into main memory.
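As a minimal illustration of reading data into main memory in portions, the Python sketch below streams a binary source in fixed-size chunks and hashes it incrementally, so only one portion is resident in memory at a time. The function names `read_in_portions` and `digest_stream` and the 4,096-byte portion size are hypothetical choices for illustration, and an in-memory `io.BytesIO` buffer merely stands in for data held on auxiliary storage.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def read_in_portions(stream: BinaryIO, portion_size: int = 4096) -> Iterator[bytes]:
    # Yield successive portions so at most portion_size bytes are held at once.
    if portion_size <= 0:
        raise ValueError("portion_size must be positive")
    while True:
        portion = stream.read(portion_size)
        if not portion:
            break
        yield portion


def digest_stream(stream: BinaryIO, portion_size: int = 4096) -> str:
    # Compute a SHA-256 digest incrementally, one portion at a time,
    # without ever loading the whole source into main memory.
    h = hashlib.sha256()
    for portion in read_in_portions(stream, portion_size):
        h.update(portion)
    return h.hexdigest()


if __name__ == "__main__":
    data = io.BytesIO(b"x" * 10_000)  # stand-in for data on auxiliary storage
    sizes = [len(p) for p in read_in_portions(data, 4096)]
    print(sizes)  # [4096, 4096, 1808]
```

The generator design keeps memory use bounded by the portion size regardless of how large the underlying source is.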
File systems can be built on top of block-based storage, allocating storage for user data and file-system metadata in units of file-system blocks. Each file-system block corresponds to an integral number of block-storage blocks. For example, a file-system block can be four kilobytes (4,096 bytes) while a block-storage block is 512 bytes, so that each file-system block spans eight block-storage blocks.
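The arithmetic relating the two block sizes can be sketched as follows, using the 4,096-byte and 512-byte sizes from the example above. The sketch assumes, hypothetically, that file-system block 0 begins at block-storage block 0 (no partition or volume offset); the helper names are illustrative and not part of any particular file system.

```python
FS_BLOCK_SIZE = 4096       # file-system block size in bytes (example from the text)
STORAGE_BLOCK_SIZE = 512   # block-storage block size in bytes (example from the text)


def storage_blocks_per_fs_block(fs_block_size: int = FS_BLOCK_SIZE,
                                storage_block_size: int = STORAGE_BLOCK_SIZE) -> int:
    # A file-system block must cover a whole number of block-storage blocks.
    if fs_block_size % storage_block_size != 0:
        raise ValueError("file-system block size must be a multiple of the storage block size")
    return fs_block_size // storage_block_size


def fs_block_to_storage_blocks(fs_block: int, fs_block_size: int = FS_BLOCK_SIZE,
                               storage_block_size: int = STORAGE_BLOCK_SIZE) -> range:
    # Storage blocks backing one file-system block (assumes both numberings start at 0).
    n = storage_blocks_per_fs_block(fs_block_size, storage_block_size)
    return range(fs_block * n, (fs_block + 1) * n)


def storage_block_to_fs_block(storage_block: int, fs_block_size: int = FS_BLOCK_SIZE,
                              storage_block_size: int = STORAGE_BLOCK_SIZE) -> int:
    # File-system block that contains a given block-storage block.
    return storage_block // storage_blocks_per_fs_block(fs_block_size, storage_block_size)


if __name__ == "__main__":
    print(storage_blocks_per_fs_block())        # 8
    print(list(fs_block_to_storage_blocks(3)))  # [24, 25, 26, 27, 28, 29, 30, 31]
    print(storage_block_to_fs_block(31))        # 3
```

Rejecting non-multiple sizes up front mirrors the requirement that a file-system block correspond to an integral number of block-storage blocks.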
Block-based storage is widely used for primary storage. An efficient way to back up primary storage is to detect and back up only the blocks that have changed since the previous backup. The changed blocks can then be applied to an earlier full backup to create a new, complete backup that reflects the latest changes. Because typically only a fraction of the block storage system has been modified since the last backup, less data is transferred to create the new backup. Synthetic datasets whose changed-block patterns closely mirror those of real-world datasets can be used to test a block-based storage system and thereby improve its design and implementation.
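A minimal sketch of this incremental approach, together with a simple synthetic change generator, is shown below. It assumes a fixed-size volume image held in memory and detects changed blocks by comparing per-block SHA-256 digests against the previous image; this is only one possible detection technique (production systems often track writes as they occur instead). The synthetic generator, which rewrites a seeded random fraction of blocks, is likewise a hypothetical stand-in for a dataset modeled on real-world change patterns. All function names and the 512-byte block size are illustrative assumptions.

```python
import hashlib
import random

BLOCK_SIZE = 512  # assumed block-storage block size in bytes


def split_blocks(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    # Cut a volume image into consecutive fixed-size blocks.
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    # Return {block_index: new_contents} for every block whose digest differs.
    if len(old) != len(new):
        raise ValueError("this sketch assumes the volume size does not change")
    old_digests = [hashlib.sha256(b).digest() for b in split_blocks(old, block_size)]
    return {
        i: block
        for i, block in enumerate(split_blocks(new, block_size))
        if hashlib.sha256(block).digest() != old_digests[i]
    }


def apply_changes(full_backup: bytes, delta: dict[int, bytes],
                  block_size: int = BLOCK_SIZE) -> bytes:
    # Overlay the changed blocks on an earlier full backup to build a new full backup.
    blocks = split_blocks(full_backup, block_size)
    for i, block in delta.items():
        blocks[i] = block
    return b"".join(blocks)


def synthesize_changes(image: bytes, fraction: float, seed: int = 0,
                       block_size: int = BLOCK_SIZE) -> bytes:
    # Hypothetical synthetic-dataset generator: rewrite a random fraction of blocks.
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    blocks = split_blocks(image, block_size)
    for i in rng.sample(range(len(blocks)), round(len(blocks) * fraction)):
        new_block = bytearray(rng.getrandbits(8) for _ in range(len(blocks[i])))
        if new_block == blocks[i]:
            new_block[0] ^= 0xFF  # guarantee the block really changes
        blocks[i] = bytes(new_block)
    return b"".join(blocks)


if __name__ == "__main__":
    base = bytes(BLOCK_SIZE * 100)  # earlier full backup: 100 zeroed blocks
    current = synthesize_changes(base, 0.10, seed=42)
    delta = changed_blocks(base, current)
    print(len(delta))                             # 10
    print(sum(len(b) for b in delta.values()))    # 5120
    print(apply_changes(base, delta) == current)  # True
```

In this run only 5,120 of 51,200 bytes (10 of 100 blocks) form the delta, yet overlaying them on the earlier full image reproduces the current image exactly.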