The present invention relates to file management systems for computer storage subsystems, and more particularly, to a method, system, and storage medium for optimizing disk space and information access.
File systems define the organization of files stored on the peripheral devices associated with a computer system. Accordingly, in order for the computer system to read or write data that can be understood by the computer system and its peripheral devices, the data must be organized consistently with the file system. File systems facilitate communication between the operating system and device-dependent drivers and are responsible for converting read and write commands generated by an operating system (as well as functions such as opening and closing files) into a form recognizable by the device driver.
One well-known file system is the file allocation table (FAT) file system. The FAT points to clusters of sectors on a disk that hold a file's data. A FAT entry can be 12, 16, 20, 24, or 32 bits long, depending on how the disk's sectors are organized into clusters and how large the drive is. Unfortunately, as drives grow bigger, so does the size of the FAT. One way to prevent the FAT from becoming too large is to group more sectors into each cluster, so that fewer entries are needed in the FAT. For example, an 8 GB drive with a cluster size of 16 sectors would need 1,048,576 entries in its FAT (i.e., 8 GB/(16*512), where a sector has 512 bytes). Packing 1,048,576 entries into a 20-bit FAT would mean the FAT would have to be 2,621,440 bytes long. If a 24-bit FAT were used for the 8 GB drive, it would require 3,145,728 bytes. Each cluster of a file is reached through the FAT entry of the preceding cluster of that file. In other words, a file that is 1 MB long would occupy 128 entries (of 16 sectors each) in the table, which would require 192 bytes for a 12-bit FAT, or 256 bytes for a 16-bit FAT.
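The arithmetic in the preceding paragraph can be checked with a short sketch. The function names below are hypothetical and chosen only for illustration; the sketch assumes 512-byte sectors, binary units (1 GB = 1024^3 bytes), and FAT entries packed bit-tight with no padding.

```python
SECTOR_BYTES = 512   # sector size stated in the text
MB = 1024 ** 2       # assumption: binary units
GB = 1024 ** 3


def clusters_on_drive(drive_bytes, sectors_per_cluster):
    """Whole clusters on the drive, i.e. the number of FAT entries needed."""
    return drive_bytes // (sectors_per_cluster * SECTOR_BYTES)


def clusters_for_file(file_bytes, sectors_per_cluster):
    """Clusters a file occupies, rounding a partial cluster up to a whole one."""
    cluster_bytes = sectors_per_cluster * SECTOR_BYTES
    return -(-file_bytes // cluster_bytes)  # ceiling division


def packed_bytes(entries, entry_bits):
    """Bytes needed to hold `entries` entries of `entry_bits` bits each."""
    return (entries * entry_bits + 7) // 8


drive_entries = clusters_on_drive(8 * GB, 16)
file_entries = clusters_for_file(1 * MB, 16)

print(drive_entries)                    # 1048576
print(packed_bytes(drive_entries, 20))  # 2621440
print(packed_bytes(drive_entries, 24))  # 3145728
print(file_entries)                     # 128
print(packed_bytes(file_entries, 12))   # 192
print(packed_bytes(file_entries, 16))   # 256
```

Doubling the cluster size halves the entry count and therefore the table size, which is the trade-off the paragraph describes.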
One disadvantage of increasing the size of the clusters is the potential for wasted space. For example, consider a cluster size of 16 sectors, or 8,192 bytes. If a file needs only 4,000 bytes, then 4,192 bytes would be wasted in that cluster (8,192 bytes − 4,000 bytes used = 4,192 bytes remaining).
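The wasted space in this example can be computed for any file and cluster size. The following is a minimal sketch; the name `slack_bytes` is hypothetical, and 512-byte sectors are assumed as in the earlier example.

```python
def slack_bytes(file_bytes, sectors_per_cluster, sector_bytes=512):
    """Unused bytes in the last cluster allocated to a file."""
    cluster_bytes = sectors_per_cluster * sector_bytes
    clusters = -(-file_bytes // cluster_bytes)  # ceiling division
    return clusters * cluster_bytes - file_bytes


print(slack_bytes(4000, 16))  # 4192, the figure given in the text
print(slack_bytes(4000, 1))   # 96, with one-sector clusters
```

The second call illustrates the other side of the trade-off: with one-sector clusters the same file wastes only 96 bytes, but it occupies eight table entries instead of one.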
Another type of file system utilizes run length encoding. This system provides bit maps for marking free/used sectors, in which a set number of sectors is assigned to each bit throughout the whole map. Each bit typically represents one sector, which means a map built for a large capacity disk would need to be correspondingly large. If, on the other hand, each bit represents more than one sector, there is again the potential for wasted space. The run length encoding has fixed 4-byte ‘start’ and ‘tail’ indicators. The ‘start’ indicator refers to the number of the starting sector and the ‘tail’ indicator refers to the number of the end sector. If a file has to be appended and contiguous sectors can be found, the needed sectors are added to the tail. If the sectors are not contiguous, a new 8-byte start/tail combination has to be written. While run length encoding can save space by allocating a small fixed number of sectors (usually 1) and appending sectors as needed, each run still requires 8 bytes. In addition, the structures needed to manage these runs require one sector per file. If the runs are too numerous to fit into the first sector, an entire new sector must be allocated to store the additional information. These structures, although capable of growing and shrinking, still have a fixed set of characteristics assigned to them. Files stored in this file system, whether large or small, are all dealt with in the same manner, i.e., the file has no control over its sector allocation.
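The start/tail behavior described above can be illustrated with a small sketch. It is not the on-disk format of any particular file system: the function names are hypothetical, the little-endian byte order is an assumption, and the tail is treated as the inclusive number of the last sector in the run.

```python
import struct

# Assumption: each run is a 4-byte start and a 4-byte tail, little-endian.
RUN_FORMAT = "<II"
RUN_BYTES = struct.calcsize(RUN_FORMAT)  # 8 bytes per run


def append_sectors(runs, first_sector, count):
    """Add `count` sectors beginning at `first_sector` to a file's run list.

    `runs` is a list of [start, tail] pairs with inclusive sector numbers.
    """
    last_sector = first_sector + count - 1
    if runs and runs[-1][1] + 1 == first_sector:
        # Contiguous with the previous run: only the tail moves.
        runs[-1][1] = last_sector
    else:
        # Not contiguous: a new 8-byte start/tail pair is required.
        runs.append([first_sector, last_sector])
    return runs


def encode_runs(runs):
    """Serialize the run list as consecutive start/tail pairs."""
    return b"".join(struct.pack(RUN_FORMAT, start, tail) for start, tail in runs)


runs = []
append_sectors(runs, 100, 4)  # sectors 100..103
append_sectors(runs, 104, 2)  # contiguous, so the tail grows to 105
append_sectors(runs, 300, 1)  # not contiguous, so a second run is written
print(runs)                    # [[100, 105], [300, 300]]
print(len(encode_runs(runs)))  # 16
```

Under these assumptions, a fragmented file pays 8 bytes for every discontinuity, even when the run covers a single sector.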
What is needed is a way to optimize disk space and information access to accommodate variable sized files and storage subsystems.