This invention pertains to the field of computer data storage, and in particular to file systems for efficiently maintaining data files stored in a random access memory. It further pertains to methods of automatic optimization in a computer file system.
Sector Size
Heretofore, the trend in computer systems data storage has been to compact data on physical magnetic media in order to enable larger storage volumes. One example of this trend is the growth of sector size. The first magnetic media disk drives usually had 128 bytes per sector. Today it is common to find 512 or 1024 bytes per sector. Recent technological advances allow 4096 bytes per sector. Research is under way that will further increase the size of a sector.
File system mechanisms
Heretofore, computer operating systems have used a mechanism by which one or more disk sectors are addressed as a single cluster or block of disk sectors. Typically, four or more sectors are addressed as a single cluster or block. For example, MS-DOS will usually combine four contiguous sectors of a hard disk with a 512 byte sector size into a single 2048 byte cluster. Under UNIX a cluster is called a disk block and a common size for a single disk block is 4096 bytes. In this application, I define a cluster as any one or more associated disk sectors.
When a file is stored, it occupies one or more clusters. On average, the last cluster used will be only half full. The remainder of that cluster is unused and is not available for storage. The unused portion of the cluster is wasted storage space.
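The half-cluster figure can be checked with a small calculation. The following is a minimal sketch, not part of the disclosure: the function names are hypothetical, and it assumes file sizes are spread evenly over every length from one byte up to one full cluster.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Whole clusters occupied by a file of file_size bytes."""
    return (file_size + cluster_size - 1) // cluster_size


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes left unused in the last cluster of the file."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


def average_slack(cluster_size: int) -> float:
    """Mean slack over every file size from 1 byte to one full cluster.

    Assumption: each size in that range is equally likely.
    """
    total = sum(slack_bytes(size, cluster_size)
                for size in range(1, cluster_size + 1))
    return total / cluster_size
```

Under these assumptions, a 5000 byte file stored in 2048 byte clusters occupies three clusters and leaves 1144 bytes unused, and the mean waste for a 2048 byte cluster comes to 1023.5 bytes, about half a cluster.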
A large cluster size is usually selected to enhance system disk access performance. The larger the block of data that can be read or written during a given operation, the fewer operations are required to perform a given task. The majority of time delays in a disk system are associated with the protocols involved in the operation performed. Increasing the size of a transfer operation reduces the number of operations that must be performed to transfer a file of a given size, and therefore the time required.
An operating system defines a cluster size and then maps the sectors of a disk into the clusters. The definition of cluster size allows the operating system to address disks of differing sector size and provides a universal mapping mechanism. For example, a 2048 byte cluster may have four 512 byte sectors or two 1024 byte sectors depending on the sector size of the disk. In the ideal situation, a cluster will map to a single sector so no sectors are left unused. For example, a 4096 byte disk cluster (block) under UNIX could map one-to-one with a disk that has a 4096 byte sector.
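The mapping from a cluster to its sectors can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that cluster 0 begins at sector 0 and that the cluster size is an exact multiple of the sector size. Real file systems typically reserve sectors ahead of the first cluster.

```python
def sectors_for_cluster(cluster_index: int, cluster_size: int,
                        sector_size: int) -> list[int]:
    """Return the sector numbers that make up one cluster.

    Assumptions: cluster 0 starts at sector 0, and cluster_size is an
    exact multiple of sector_size.
    """
    if cluster_size % sector_size != 0:
        raise ValueError("cluster size must be a multiple of sector size")
    sectors_per_cluster = cluster_size // sector_size
    first = cluster_index * sectors_per_cluster
    return list(range(first, first + sectors_per_cluster))
```

With these assumptions, cluster 3 of a 2048 byte cluster layout covers sectors 12 through 15 on a disk with 512 byte sectors, sectors 6 and 7 on a disk with 1024 byte sectors, and maps one-to-one to sector 3 when both the cluster and the sector are 4096 bytes.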
In an idealized file structure in which a cluster maps to a single sector, it can be demonstrated that the larger the sector size, the more physical media, whether mechanical or electronic, goes unused. When a data file is stored, the last sector of the file is usually only partially filled. In fact, it is extremely rare for the last sector to be completely full. The file length must be an exact multiple of the sector size if there is to be no wasted storage space.
Heretofore, disk oriented file systems have required the central processing unit to perform multiple table look-ups, address computation and data transfers. This need is mandated by the fixed length of the storage media compounded by the mechanical characteristics of the disk drive. When a file operation is initiated, the central processing unit must obtain the absolute address of the file by accessing a look up table which may reside in main memory or on the disk itself. After obtaining a block address, the central processing unit must issue a command for the block of disk space to be accessed. This action must be repeated for each block of disk space regardless of sequencing. The operations are performed in a loop that repeats itself until the operation is completed.
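The per-block loop described above can be sketched in a few lines. This is a simplified model, not any particular operating system's implementation: the look-up table is a hypothetical dictionary from file name to a list of block addresses, the disk is a flat byte string, and each slice stands in for one command issued to the drive.

```python
def read_file(name: str, block_table: dict[str, list[int]],
              disk: bytes, block_size: int) -> tuple[bytes, int]:
    """Read a file block by block; return its data and the command count.

    block_table and disk are hypothetical stand-ins for the look-up table
    and the storage media.
    """
    data = bytearray()
    commands = 0
    for block_address in block_table[name]:          # table look-up
        offset = block_address * block_size          # address computation
        data += disk[offset:offset + block_size]     # one transfer per block
        commands += 1
    return bytes(data), commands
```

For a file whose table entry lists blocks 2, 0 and 3, the loop issues three separate transfers, one per block, regardless of whether the blocks are in sequence.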
Heretofore, semiconductor data storage media has been accessed in a manner that emulates a spinning media disk drive. File systems were designed to minimize the impact of the mechanical limitations of spinning media and to address the media in an economical manner. The paradigm used to address spinning media has been applied to semiconductor storage. To this end, recent memory product announcements have been made that provide block erase capability in spinning media sector size blocks. A reevaluation of the paradigm in light of the capabilities of semiconductor media shows that the paradigm is no longer valid.
Heretofore, flash memory storage file systems, such as Microsoft's Flash File System, have addressed the issue of wasted storage space by implementing a singly linked list wherein a status byte signifies whether an entry is a file or a directory. When a file is requested, the list is searched from the head until the file is found or the end of the list is reached. For directory entries, child and sibling pointers are maintained. Entries are added to the list sequentially. When a file is deleted by the user, the entry for the file cannot be deleted as the linked list would be broken. The file system cannot maintain a "current position" for the user. File data is interleaved with the directory information and is thus not centrally located nor easily accessed. Finally, any and all modifications of the file system structure require that a copy of the file system, including data, be made, that the modifications be made to the copy, and that the new file structure and data be re-stored in flash RAM. Sometimes this copy is made and operated upon on floppy disk, a very slow interface.
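The search and deletion behavior of such a list can be sketched as follows. This is a simplified model and not the actual layout of Microsoft's Flash File System: the status values, the DELETED flag, and all names are hypothetical, and the child and sibling pointers of directory entries are omitted.

```python
from dataclasses import dataclass
from typing import Optional

FILE, DIRECTORY = 0x01, 0x02   # hypothetical status byte values
DELETED = 0x80                 # hypothetical flag; the entry stays linked


@dataclass
class Entry:
    status: int
    name: str
    next: Optional["Entry"] = None


def append(head: Optional[Entry], entry: Entry) -> Entry:
    """Add an entry at the tail, since entries are added sequentially."""
    if head is None:
        return entry
    node = head
    while node.next is not None:
        node = node.next
    node.next = entry
    return head


def find(head: Optional[Entry], name: str) -> tuple[Optional[Entry], int]:
    """Search from the head; return the live entry and entries visited."""
    steps, node = 0, head
    while node is not None:
        steps += 1
        if node.name == name and not node.status & DELETED:
            return node, steps
        node = node.next
    return None, steps


def delete(head: Optional[Entry], name: str) -> bool:
    """Mark an entry deleted without unlinking it from the list."""
    node, _ = find(head, name)
    if node is None:
        return False
    node.status |= DELETED
    return True


def length(head: Optional[Entry]) -> int:
    """Count every entry still linked, deleted or not."""
    count, node = 0, head
    while node is not None:
        count, node = count + 1, node.next
    return count
```

In this model, finding the third of three entries visits all three, and deleting the second entry leaves the list length at three: the space is not reclaimed, and later searches still walk past the dead entry.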
U.S. Pat. No. 4,791,564 to Takai discloses a random access memory file apparatus for a personal computer with an external memory file. This patent discloses providing an external memory file comprising random access memory, in addition to the user system RAM. Files are transferred from a floppy disk into the external memory file for use by the CPU in a random access fashion. After modification in the memory file RAM, the files are written back into the floppy disk in the usual serial access format. The external memory file RAM also is used to provide a printer spooler. In the memory file RAM, the first sector comprises an address map. Subsequent sectors may contain essentially data only, in contradistinction to the conventional floppy disk formatting in which directory information such as file names and attributes is interspersed at the beginning of the corresponding data.
U.S. Pat. No. 4,792,896 to Maclean et al discloses a storage controller emulator which is used to provide transparent resource sharing in a computer system. This patent teaches a microprocessor controlled mass storage controller which acts as an interface for a mass storage device shared by a plurality of otherwise stand-alone microcomputer systems. The mass storage controller provides a common interface to each of the microcomputer systems which emulates the standard interface expected of, for example, an internal floppy disk drive. This allows multiple users to share data without modifying their microcomputer system software or hardware. It further allows for standard single user software to be used without modification in a shared resource network environment. Maclean et al disclose hardware to accomplish the described functionality, but do not address the file system itself.
U.S. Pat. No. 4,896,262 to Wayama et al discloses a system for providing semiconductor mass memory in a computer system. It discloses emulation hardware for converting data and address information between a magnetic disk memory format and a semiconductor memory format, so the data may be transferred between a computer and a semiconductor storage device at relatively high data transfer rates without modifying the computer system which is configured in a magnetic disk access mode. The disclosure also provides for synchronization of data transfer, error correction, and multiport access for use in a multi-computer system previously configured in a magnetic disk access mode. In the same vein, U.S. Pat. No. 4,456,971 to Fukuda et al is directed to a semiconductor RAM that is accessible in magnetic disk storage format. These patents essentially are directed to hardware for converting address data and translating command signals for emulating a magnetic disk memory.
Another recent patent directed to solid state emulation of rotating media memory devices is U.S. Pat. No. 4,958,315 to Balch. In this system, emulation is accomplished by multiplexing an offset memory address during each bit time. Nonvolatile memory arrays translate the memory address to an offset address that is proportional to the odd modular track length of the multiple or single head track. An emulation address controller is used to generate timing and initial addresses. In the Fukuda et al patent discussed above, an address synthesizer is coupled to track and sector address registers and also to a counter for synthesizing a RAM address signal.
The need remains for a more efficient file system to speed file transfer operations and for optimal use of memory space. The need also remains for automatically reorganizing a file system to recapture newly available memory space, and for automatically reassembling fragments in the file system.