The use of secondary storage systems to provide online storage for a computer processing system, separate from the primary or main memory of the computer processing system, is well known. Examples of current secondary storage systems include magnetic disk drives, optical disk drives, magnetic tape drives, solid state disk drives and bubble memories. Typically, secondary storage systems have much larger memory capacities than the main memory of a computer processing system; however, the access to data stored on most secondary storage systems is sequential, not random, and the access rates for secondary storage systems can be significantly slower than the access rate for main memory. As a result, individual bytes of data or characters of information are usually stored in a secondary storage system as part of a larger collective group of data known as a file.
Generally, files are stored in accordance with one or more predefined file structures that dictate exactly how the information in the file will be stored and accessed in the secondary storage system. In most computer processing systems, the operating system program will have a file control program that includes a group of standard routines to perform certain common functions with respect to reading, writing, updating and maintaining the files as they are stored on the secondary storage system in accordance with the predefined file structure. As used within the present invention, the term file system will refer collectively to the file structure and the file control program.
In the file systems of current operating system programs, the predefined file structure typically includes a control portion and a data portion that is associated with that control portion. The data portion of the file structure is split into one or more segments of logical blocks of data, with each logical block of data being allocated a single fixed memory size, such as 1K, 4K or 8K bytes. The division of the data portion into one or more logical blocks is done in order to facilitate access to different parts of a large file, and at the same time standardize the way in which data is handled by the standard routines of the file system. By analogy, the logical blocks of a data file can be thought of as individual pages of a book, each of which can be turned to individually, as opposed to having one long, continuous scroll of paper, where the information on the end of the scroll could be read only by unrolling the whole scroll. In addition, by dividing the file into two or more logical blocks of data, the file system allows for a file to be located in two or more smaller and separate physical locations in the secondary storage system, rather than requiring one large contiguous physical location to store the file.
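As an illustration of the page analogy above, the following minimal sketch maps a byte offset within a file to the logical block that holds it and to the position within that block. The function name `locate` and the 1K block size are assumptions chosen for the example; they are not taken from any particular file system.

```python
BLOCK_SIZE = 1024  # assumed logical block size in bytes (1K)

def locate(byte_offset, block_size=BLOCK_SIZE):
    """Return (logical block index, offset within that block)."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    # Integer division picks the "page"; the remainder is the position on it.
    return divmod(byte_offset, block_size)

print(locate(5000))  # (4, 904)
```

Only the block at the returned index needs to be fetched to reach the requested byte, which is the sense in which a page can be turned to individually.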
In the standard file system of the Unix® System V, Release 3 operating system program, for example, the logical block size is 1K bytes. The control portion of the predefined file structure for the Unix® System V operating system program, known as the inode, contains a table of contents to locate the data for a file. In this file system, each block in the secondary storage system is addressable by a number, and the table of contents portion of the inode consists of a set of block numbers pointing to the blocks of data as they are stored in the secondary storage system. The standard routines of the file system access the desired data by locating that data within one or more of the blocks of data listed in the inode, and then performing their various functions by operating on a single block of data at a time. For a more detailed description of the file system of the Unix® System V operating system program, reference is made to Bach, M., The Design of the Unix™ Operating System, Chpts. 4-5 (1986) Prentice-Hall, pgs. 60-145.
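The lookup described above can be sketched as follows. This is a simplified model under stated assumptions, not the System V implementation: the class `Inode`, the function `read_file`, and the dictionary `disk` that stands in for the secondary storage system are all hypothetical, and the table of contents is modeled as a flat list of block numbers, omitting the indirect blocks that a real inode uses for larger files.

```python
BLOCK_SIZE = 1024  # logical block size of 1K bytes, as in the text

class Inode:
    """Hypothetical control portion: file size plus an ordered list of block numbers."""
    def __init__(self, size, block_numbers):
        self.size = size
        self.block_numbers = block_numbers

def read_file(disk, inode, offset, length):
    """Read up to `length` bytes starting at `offset`, one block at a time."""
    end = min(offset + length, inode.size)
    out = bytearray()
    while offset < end:
        index, within = divmod(offset, BLOCK_SIZE)
        block = disk[inode.block_numbers[index]]      # one block access
        take = min(BLOCK_SIZE - within, end - offset)  # stay inside this block
        out += block[within:within + take]
        offset += take
    return bytes(out)

# `disk` maps a block number to the 1K of data stored there (an assumption).
disk = {7: b"A" * 1024, 3: b"B" * 1024, 9: b"C" * 1024}
inode = Inode(size=2500, block_numbers=[7, 3, 9])
data = read_file(disk, inode, 1000, 100)  # spans blocks 7 and 3
```

The blocks of the file need not be adjacent or in order on the storage device; the list of block numbers alone determines their sequence within the file.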
One of the problems in using a small logical block size, such as the logical block size of 1K bytes for the standard Unix® file system, is that a large number of accesses to the file system are required whenever the desired data spans multiple blocks. In this situation, multiple repetitions of the various standard functions of the file system will be required in order to satisfy the file request. This results in increased overhead and decreased efficiency, particularly for large files. The increased overhead for large files is justified in this file system by the assumption that the vast majority of files stored in a secondary storage system are less than 10K bytes in size, and therefore the file system should be optimally designed to provide access to these small files.
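The effect of block size on the number of accesses can be shown with a short calculation. The sketch below counts how many logical blocks a single request touches; the function name `blocks_touched` and the 64K request size are assumptions made for the example, and the count ignores any caching or read-ahead a real file system might perform.

```python
def blocks_touched(offset, length, block_size):
    """Number of logical blocks overlapped by the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return last - first + 1

K = 1024
print(blocks_touched(0, 64 * K, 1 * K))  # 64
print(blocks_touched(0, 64 * K, 8 * K))  # 8
```

Under these assumptions, the same 64K request repeats the per-block routines 64 times with 1K blocks but only 8 times with 8K blocks.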
An alternative to the use of a small logical block size is to increase the size of the logical block. For example, in the Berkeley 4.2 BSD file system, the system administrator can configure the operating system program to use a single, larger logical block size that stores all of the files in the file system in logical blocks that are each 4K bytes, or that stores all of the files in the file system in logical blocks that are each 8K bytes. The advantage of using a larger block size is that the file system can provide faster access, particularly for larger files. The disadvantage is that by having a larger block size, the file system increases block fragmentation, leaving large portions of the secondary storage system unused, especially when a large number of small files are to be stored in the secondary storage system. For instance, if the logical block size is 8K bytes, then a file of size 12K bytes uses one complete logical block and half of a second logical block, leaving the remaining 4K bytes of the second block unused. It will be seen that if the sizes of the files in the file system are uniformly distributed, then the average wasted space will be half a block per file, and the amount of wasted secondary storage space for the entire file system can be significant.
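The arithmetic behind the 12K example and the half-block average can be checked with a brief sketch. The function name `allocation` is hypothetical, and the uniform distribution is modeled here, as an assumption, by taking every file size from 1 byte up to one full 8K block exactly once.

```python
def allocation(file_size, block_size):
    """Return (blocks allocated, bytes left unused in the last block)."""
    if file_size <= 0:
        return 0, 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks, blocks * block_size - file_size

K = 1024
print(allocation(12 * K, 8 * K))  # (2, 4096)

# Average unused space over file sizes 1..8192 bytes with 8K blocks.
sizes = range(1, 8 * K + 1)
average_waste = sum(allocation(s, 8 * K)[1] for s in sizes) / len(sizes)
print(average_waste)  # 4095.5
```

The average of 4095.5 bytes is, to within half a byte, half of the 8K block size.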
In the Berkeley 4.2 BSD file system, an attempt is made to reduce the impact of block fragmentation by allocating a special logical block to contain the ending block fragments of two or more files. Thus, one logical block can contain the data for the ending block fragments of two or more different files. The problem with this approach is that the allocation and deallocation of the special logical blocks for block fragments can become complicated when one or more of the files having block fragments in the special logical block are altered or deleted from the system. This problem is aggravated when a large number of small files are to be stored in the secondary storage system.
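The bookkeeping difficulty described above can be illustrated with a toy model. This sketch is not the Berkeley implementation: the class `FragmentBlock`, its methods, and the choice of eight fragments per block are assumptions made only to show how deletions leave holes that later requests may not fit.

```python
class FragmentBlock:
    """Hypothetical shared logical block split into equal-sized fragments."""
    def __init__(self, fragments=8):
        self.owner = [None] * fragments  # which file holds each fragment

    def allocate(self, file_id, count):
        """Give `file_id` `count` contiguous free fragments; return the start index or None."""
        if count < 1:
            raise ValueError("count must be at least 1")
        run = 0
        for i, holder in enumerate(self.owner):
            run = run + 1 if holder is None else 0
            if run == count:
                start = i - count + 1
                self.owner[start:i + 1] = [file_id] * count
                return start
        return None  # no free run is long enough

    def release(self, file_id):
        """Free the fragments of a deleted file; return True if the block is now empty."""
        self.owner = [None if h == file_id else h for h in self.owner]
        return all(h is None for h in self.owner)

blk = FragmentBlock()
blk.allocate("a", 3)         # fragments 0-2
blk.allocate("b", 2)         # fragments 3-4
blk.allocate("c", 3)         # fragments 5-7
blk.release("b")             # leaves a two-fragment hole; block still in use
print(blk.allocate("d", 3))  # None
```

In this model the shared block cannot be returned to the free pool until every file holding a fragment in it has been deleted, and a file whose ending fragment grows may have to be moved to a different block.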
The current file systems present the system administrator with a choice between designing the secondary storage system to operate more efficiently for a larger number of small files and designing it to operate more efficiently for a smaller number of large files. Although the current file systems have storage allocation systems and methods that are adequate for storing and accessing files on secondary storage systems, it would be desirable to provide a storage allocation system and method that could effectively and efficiently address the storage allocation requirements of both small and large files in the same file system.