Data compression is well known in the computer industry. Data compression refers to various means of reducing the amount of storage needed to store information. File systems have used data compression to increase the effective storage capacity of storage devices (e.g., drives) by storing compressed data on a compressed drive. File systems decompress the compressed data before providing the data to a calling program. When a calling program wishes to store data on a compressed drive, the calling program invokes the compressed file system (usually through an operating system), which compresses the data and stores the compressed data on the compressed drive.
File systems typically implement a compressed drive as a logical drive within an uncompressed drive. FIG. 1 shows a sample prior system for using an uncompressed drive as a compressed drive. The uncompressed drive 100 has four uncompressed files 110, 112, 114, 116 and a compressed logical drive 102. The compressed logical drive 102 contains three compressed files, 104, 106, 108. The compressed logical drive 102 appears to the uncompressed drive 100 as merely another file. However, the compressed file system treats the compressed logical drive 102 as if the compressed logical drive 102 were a physically separate drive. Therefore, a compressed logical drive 102 is a portion of an uncompressed drive 100 that is used in a different manner.
FIG. 2 depicts a typical layout for a compressed logical drive. The compressed logical drive 102 has a Basic Input/Output System Parameter Block (BPB) 202, a Bit File Allocation Table (BitFAT) 204, a Compressed File Allocation Table (CFAT) 206, a File Allocation Table (FAT) 208, a root directory 210, and a sector heap 212. The BPB 202 contains the length of the compressed logical drive 102 as well as general information about the drive, such as the number of sectors on the drive, the number of sectors per cluster, and the maximum number of entries in the root directory. The BitFAT 204 contains a bitmap that indicates whether each individual sector in the sector heap 212 is available or in use. The CFAT 206 is a table that maps uncompressed data (in the form of clusters) onto compressed data in the sector heap 212. A cluster is the unit of allocation for a file system and is defined as a multiple (usually a power of two) of sectors. A sector is a physical portion of the drive that is accessed as a unit (e.g., 512 bytes). The FAT 208 is a table that contains an entry for each cluster of uncompressed data and links together the clusters that are allocated to a file or a directory. Typically, file systems are organized in a hierarchical fashion with directories and files. Directories can contain both files and other directories, and the one directory at the top of the hierarchy is known as the root directory. The root directory 210 contains the file names and subdirectory names of all the files and subdirectories in the root directory of the compressed logical drive 102. The sector heap 212 contains the sectors in the compressed logical drive 102 where the compressed data is stored.
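The role of the BitFAT 204 as a per-sector availability bitmap can be illustrated with a minimal sketch. The class name, the method names, and the first-fit search for a contiguous run of free sectors are assumptions made for illustration only; the description above states only that the bitmap records whether each sector in the sector heap is available or in use.

```python
class BitFAT:
    """One bit per sector heap sector: 1 = in use, 0 = available (illustrative)."""

    def __init__(self, sector_count):
        self.sector_count = sector_count
        self.bits = bytearray((sector_count + 7) // 8)

    def in_use(self, sector):
        return bool(self.bits[sector // 8] & (1 << (sector % 8)))

    def mark(self, sector, used):
        if used:
            self.bits[sector // 8] |= 1 << (sector % 8)
        else:
            self.bits[sector // 8] &= ~(1 << (sector % 8)) & 0xFF

    def allocate_run(self, count):
        """Assumed policy: take the first run of `count` contiguous free sectors."""
        if count <= 0:
            raise ValueError("count must be positive")
        run_start, run_len = 0, 0
        for sector in range(self.sector_count):
            if self.in_use(sector):
                # A used sector breaks the run; restart just past it.
                run_start, run_len = sector + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for s in range(run_start, run_start + count):
                    self.mark(s, True)
                return run_start
        raise MemoryError("no run of free sectors is long enough")
```

A caller would request as many sectors as a compressed cluster occupies and record the returned first sector in the corresponding CFAT entry.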
FIG. 3 is a block diagram of the components of a compressed file system. The compressed file system contains a compressed logical drive 102 and a memory 302. The compressed logical drive 102 contains a root directory 210, a FAT 208, a CFAT 206, and a sector heap 212. The root directory 210 contains an entry 304 for each file or directory in the root directory of the compressed file system. The FAT 208 contains an entry for each cluster of the compressed logical drive 102. Each entry in the FAT 208 refers to the next cluster in a chain of clusters. Each chain of clusters represents a file or a directory on the compressed logical drive 102. For each entry in the FAT 208, there is a corresponding entry in the CFAT 206. The CFAT 206 maps a cluster from the FAT 208 onto the actual sectors in the sector heap 212 that contain the compressed data for that cluster. In addition, the CFAT 206 maintains a count of the number of sectors in the sector heap 212 that are used for storing each cluster. The sector heap 212 contains the actual sectors of the compressed logical drive 102 that contain data. The memory 302 contains a calling program 308, an operating system 310, and a compression/decompression component 306. The calling program 308 can be any computer program wishing to access a file on the compressed logical drive 102. The operating system 310 is a computer program responsible for managing the files on the compressed logical drive 102. The compression/decompression component 306 is responsible for compressing data and decompressing data. The compression/decompression component 306 can use any of a number of well-known compression techniques.
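The relationship between a FAT chain and the per-cluster sector counts in the CFAT can be sketched as follows. This is an illustration under stated assumptions, not the layout of any particular file system: the FAT is modeled as a dictionary from a cluster number to the next cluster number, `None` is a hypothetical end-of-chain marker, and each CFAT entry is modeled as a (first sector, sector count) pair.

```python
END_OF_CHAIN = None  # hypothetical end-of-chain marker


def cluster_chain(fat, first_cluster):
    """Return, in order, the clusters that make up one file or directory."""
    chain, seen = [], set()
    cluster = first_cluster
    while cluster is not END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("FAT chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each FAT entry refers to the next cluster
    return chain


def sectors_used(fat, cfat, first_cluster):
    """Sum the sector counts that the CFAT keeps for each cluster of a file."""
    return sum(cfat[c][1] for c in cluster_chain(fat, first_cluster))


# Made-up example data: a three-cluster file starting at cluster 54.
fat = {54: 55, 55: 56, 56: END_OF_CHAIN}
cfat = {54: (270, 5), 55: (275, 3), 56: (290, 7)}  # (first sector, count)
print(cluster_chain(fat, 54))       # [54, 55, 56]
print(sectors_used(fat, cfat, 54))  # 15
```

Because every FAT entry has a corresponding CFAT entry, the same cluster number indexes both tables.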
The following illustrates access to the compressed data when a calling program 308 invokes the operating system 310 to read data from the compressed logical drive 102. The calling program 308 passes the file name of the desired file to the operating system 310. The operating system 310 finds the entry for the desired file in the root directory 210. The entry 304 in the root directory 210 contains the file name and the cluster number of the first cluster of data stored in the file ("first data cluster number"). In the root directory entry 304, the first data cluster number for the file is cluster 54. After receiving the first data cluster number, the operating system 310 determines the file cluster ordinal. That is, the operating system 310 determines which cluster, in relation to the file (e.g., first, second, third, etc.), contains the requested data. The position of this cluster within the file is the file cluster ordinal. Then, the operating system 310 accesses the FAT 208 with the first data cluster number and the file cluster ordinal and follows the FAT entries to locate the cluster that contains the requested data. Therefore, the number of entries accessed in the FAT 208 is equal to the file cluster ordinal. For example, if the data requested is contained in the second cluster of the file, the file cluster ordinal would be equal to two. In order to access the data in the second cluster, the operating system 310 examines the FAT 208 entry for the first cluster of the file to determine the FAT entry for the second cluster of the file. In this example, entry 54 of the FAT 208 refers the operating system 310 to entry 55 of the FAT 208. The operating system 310 then accesses the corresponding CFAT 206 entry to determine the actual sector or sectors in the sector heap 212 that contain the compressed data. In this example, the CFAT 206 entry 55 refers the operating system 310 to sector 275 of the sector heap 212.
However, before the calling program 308 can use the data contained in sector 275 of the sector heap 212, the operating system 310 decompresses the data using the compression/decompression component 306. Calling programs in a compressed file system typically use data in an uncompressed form.
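The read sequence described above (follow the FAT chain to the requested cluster, look up the matching CFAT entry, read the sectors, and decompress) can be sketched as follows. All names here are hypothetical. The sketch assumes that each CFAT entry holds a first sector and a sector count, that a cluster's sectors are consecutive in the sector heap, that `None` marks the end of a chain, and it uses `zlib` only as a stand-in for the unspecified compression/decompression component 306.

```python
import zlib

SECTOR_SIZE = 512    # the example sector size given in the text
END_OF_CHAIN = None  # hypothetical end-of-chain marker


def find_cluster(fat, first_cluster, ordinal):
    """Follow FAT links from the first cluster to the ordinal-th (1-based) one."""
    cluster = first_cluster
    for _ in range(ordinal - 1):
        cluster = fat[cluster]
        if cluster is END_OF_CHAIN:
            raise ValueError("file has fewer clusters than requested")
    return cluster


def read_cluster(fat, cfat, sector_heap, first_cluster, ordinal):
    """Return the uncompressed data of one cluster of a file."""
    cluster = find_cluster(fat, first_cluster, ordinal)
    first_sector, sector_count = cfat[cluster]
    raw = b"".join(sector_heap[first_sector + i] for i in range(sector_count))
    # A decompress object stops at the end of the compressed stream, so the
    # padding that fills out the last sector is ignored.
    return zlib.decompressobj().decompress(raw)


# Mirror the example in the text: cluster 54 -> cluster 55 -> sector 275.
payload = b"uncompressed cluster data " * 100
packed = zlib.compress(payload)
packed += b"\x00" * (-len(packed) % SECTOR_SIZE)  # pad to whole sectors
count = len(packed) // SECTOR_SIZE
sector_heap = {275 + i: packed[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
               for i in range(count)}
fat = {54: 55, 55: END_OF_CHAIN}
cfat = {55: (275, count)}
assert read_cluster(fat, cfat, sector_heap, 54, 2) == payload
```

Dictionaries stand in for the on-drive tables so that the lookup order stays visible; an actual file system would read these structures from the compressed logical drive.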
Prior uncompressed file systems use a FAT and store clusters of data onto the drive as clusters. That is, uncompressed file systems do not store data on a sector-by-sector basis; they store data on a cluster-by-cluster basis. Instead of developing a completely new compressed file system, the developers of some compressed file systems modified existing uncompressed file systems. The developers modified existing uncompressed file systems so that programs that used the uncompressed file systems would not have to change to take advantage of the newly developed compressed file systems. As such, the developers of one compressed file system kept the structure of the uncompressed file system (i.e., the FAT) and added structures to map data stored on a cluster-by-cluster basis to compressed data stored on a sector-by-sector basis (i.e., the CFAT and BitFAT).
Although compressed file systems increase the effective storage capacity of uncompressed drives, when storing data to a compressed logical drive, compressed file systems incur significant overhead due to the invocation of the compression/decompression component. In addition, the compressed file system, like uncompressed file systems, incurs overhead waiting for the physical storage of the data (i.e., the write operation) onto the compressed logical drive. The performance of compressed file systems has been increased by introducing a memory disk cache into the compressed file system. A memory disk cache is computer memory that is used to store disk data that is frequently accessed to reduce the number of times that the drive must be used in order to either read or write data. A memory disk cache stores data in terms of cache blocks. Cache blocks are defined in terms of the number of clusters of data that one cache block can store. Typically, a cache block can store four clusters of data. Using a memory disk cache (hereafter "memory cache") is preferred over physical drive access because accessing memory is significantly faster than accessing a drive.
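The effect of a memory cache organized in cache blocks of four clusters can be sketched as follows. The least-recently-used eviction policy, the alignment of cache blocks on multiples of four clusters, and the `read_block_from_drive` callback are assumptions made for illustration; the description above does not specify a caching algorithm.

```python
from collections import OrderedDict

CLUSTERS_PER_BLOCK = 4  # the typical figure given in the text


class BlockCache:
    """Illustrative LRU cache that holds whole cache blocks in memory."""

    def __init__(self, capacity_blocks, read_block_from_drive):
        self.capacity = capacity_blocks
        self.read_block_from_drive = read_block_from_drive  # hypothetical callback
        self.blocks = OrderedDict()  # block number -> list of four clusters
        self.drive_reads = 0

    def read_cluster(self, cluster):
        block_no, offset = divmod(cluster, CLUSTERS_PER_BLOCK)
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
        else:
            self.drive_reads += 1              # cache miss: go to the drive
            self.blocks[block_no] = self.read_block_from_drive(block_no)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return self.blocks[block_no][offset]
```

With this arrangement, a read of cluster 55 that follows a read of cluster 54 is served from memory, because both clusters fall in the same cache block under the assumed alignment.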
FIG. 4 is a block diagram of a compressed file system that uses a memory cache. The memory 302 has a calling program 308, a memory cache 402, an operating system 404, and a compression/decompression component 306. The memory cache 402 is used to store frequently used data in order to reduce the number of times that the compressed logical drive 102 is used. For example, a memory cache may store the data for one or more files, or the data for portions of one or more files. The operating system 404 uses a caching algorithm to determine which data is stored in the memory cache 402, when the data is written out to the compressed logical drive 102, and when the data is read from the compressed logical drive 102 into the memory cache 402. Various caching algorithms are well known in the computer industry. Although using the memory cache 402 reduces the number of times that the compressed logical drive 102 is physically accessed, the data stored in the memory cache 402 is still in a compressed form. Therefore, when the calling program 308 invokes the operating system 404 to store data, the compression/decompression component 306 is used before the data is stored and the write completes. In addition, when the calling program 308 invokes the operating system 404 to read data from the memory cache 402, the operating system 404 decompresses the data by invoking the compression/decompression component 306 before the data is used by the calling program 308.
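The remaining overhead can be made concrete with a minimal sketch of a memory cache that holds compressed data. The class and its counters are hypothetical, and `zlib` is only a stand-in for the compression/decompression component 306. The point illustrated is that every store and every read still invokes the component, even when the drive itself is never touched.

```python
import zlib


class CompressedCache:
    """Illustrative memory cache whose contents stay in compressed form."""

    def __init__(self):
        self.blocks = {}           # cluster number -> compressed bytes
        self.compress_calls = 0    # counts invocations on the write path
        self.decompress_calls = 0  # counts invocations on the read path

    def write(self, cluster, data):
        # Compression runs before the data is stored and the write completes.
        self.compress_calls += 1
        self.blocks[cluster] = zlib.compress(data)

    def read(self, cluster):
        # Every cache hit still pays for decompression.
        self.decompress_calls += 1
        return zlib.decompress(self.blocks[cluster])


cache = CompressedCache()
cache.write(55, b"cluster data " * 64)
cache.read(55)
cache.read(55)
print(cache.compress_calls, cache.decompress_calls)  # 1 2
```

Two reads of the same cached cluster cost two decompressions in this sketch, which is the overhead the paragraph above describes.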