1. Introduction
Today, as in the past, computers almost never have enough self-contained read/write (R/W) memory to do all that is required of them, and various forms of external mass storage such as tapes and discs are used as peripherals to provide extra memory. In a conventional system the management of the tasks of storing and retrieving data from peripheral mass storage is performed by a portion of the operating system called the file system. Very briefly put, the file system keeps track of where things (identified by "file names") are located in terms of physical (medium related) addresses, and converts relatively high level file operation requests into corresponding sequences of hardware level commands that appear as I/O traffic directed to one or more mass storage peripherals.
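As a rough illustration of that translation, the Python sketch below keeps a directory that maps each file name to the physical sector numbers holding its data. It turns a request of the form "read length bytes of file name starting at offset" into the sector-level READ commands a disc controller would receive. The class name ToyFileSystem, the 512-byte SECTOR_SIZE, and the ("READ", sector) command tuples are hypothetical choices made only for this sketch. Real file systems keep far richer structures on the medium itself.

```python
from dataclasses import dataclass, field

SECTOR_SIZE = 512  # hypothetical fixed sector size, in bytes


@dataclass
class ToyFileSystem:
    """Hypothetical sketch of a file system's bookkeeping, not a real design.

    directory maps each file name to the physical sector numbers that hold
    the file's data, in file order.
    """
    directory: dict[str, list[int]] = field(default_factory=dict)

    def read_commands(self, name: str, offset: int, length: int) -> list[tuple[str, int]]:
        """Translate "read `length` bytes of `name` at `offset`" into sector READs."""
        sectors = self.directory[name]  # raises KeyError for an unknown file name
        if length <= 0:
            return []
        # Index (within the file) of the first and last sector the request touches.
        first = offset // SECTOR_SIZE
        last = (offset + length - 1) // SECTOR_SIZE
        if last >= len(sectors):
            raise ValueError("request extends past the sectors allocated to the file")
        # A sector is the smallest unit the hardware transfers, so whole sectors
        # are read; trimming to the requested bytes would happen afterwards.
        return [("READ", sectors[i]) for i in range(first, last + 1)]
```

For example, ToyFileSystem({"report.txt": [907, 908, 42]}).read_commands("report.txt", 500, 100) covers bytes 500 through 599. Those bytes straddle the file's first two sectors, so the call yields [("READ", 907), ("READ", 908)].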
2. File Systems
Even for a computer system that dispensed with peripheral mass storage because it actually had more addressable R/W memory than the largest amount of information to be stored in or manipulated by the computer, a file system would still be a useful thing. Such a file system would manage the task of storing and retrieving files of information in R/W memory instead of on a peripheral. It is important to realize that a file system provides a paradigm for organizing related data into collections regardless of where they are stored; it is more than simply a friendly interface to an otherwise hard-to-use dumb peripheral.
Nevertheless, most file systems started out as tools for managing information stored on mass storage peripherals, such as tape and disc drives, and were developed to run under particular operating systems. The prevalent types of high-capacity, high-speed mass storage were mainly variations of rotating magnetizable media: drums, fixed-head hard discs, moving-head hard discs (with removable packs), floppy discs, and eventually the so-called "winchester" discs. Despite their differences, these technologies have much in common concerning how the file system reads and writes to them, and how it manages the storage space presented by the medium. That is: addressing is surface-track-sector, and the minimum quantum of information that can be read or written is a sector containing a fixed number of bytes; and read and write operations are always freely permitted at all addresses, with any erase phase bundled into the write operation by virtue of how the hardware is constructed. Accordingly, except for minor adjustments to accommodate issues such as sector size and the number of tracks per surface, a file system could treat the basic space management issues of all disc drives in much the same way. The operating system and its file system were thought of as (magnetic) disc based, and if some other mass storage technology later appeared that required a significantly different space management paradigm, well, then a separate subsystem was going to be needed to make it work.
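Because addressing is surface-track-sector for all of these devices, a file system can hide the geometry behind a single linear numbering of sectors and adjust only a couple of parameters per drive. The sketch below assumes one common numbering order: all sectors of a track on one surface, then the same track on the next surface, then the next track. The names Geometry, Address, to_physical, and to_linear are illustrative only. Sectors are counted from 0 here, although many real drives count them from 1.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    surfaces: int           # number of recording surfaces (one head each)
    sectors_per_track: int  # sectors on every track of every surface


class Address(NamedTuple):
    track: int    # track number, the same position on every surface
    surface: int  # which surface (head)
    sector: int   # sector within the track, counted from 0 in this sketch


def to_physical(block: int, g: Geometry) -> Address:
    """Map a linear sector number onto a surface-track-sector address."""
    # Fill every surface of one track position before moving to the next track.
    track, rem = divmod(block, g.surfaces * g.sectors_per_track)
    surface, sector = divmod(rem, g.sectors_per_track)
    return Address(track, surface, sector)


def to_linear(a: Address, g: Geometry) -> int:
    """Inverse of to_physical, so the rest of the file system sees plain numbers."""
    return (a.track * g.surfaces + a.surface) * g.sectors_per_track + a.sector
```

With Geometry(surfaces=4, sectors_per_track=17), linear sector 100 maps to Address(track=1, surface=1, sector=15), and to_linear maps it back to 100. Supporting a different disc means changing only those two numbers. This is the sense in which one file system could treat all disc drives alike.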
Many current operating systems and their file systems had their origins at a time when data compression technology was either nonexistent or was in its infancy. As an issue, data compression was, and in many respects continues to be, ignored by developers and maintainers of these now popular operating systems and their associated file systems. The system calls and file structure for a popular operating system and its file system become de facto, if not actual, industry standards. Much ancillary hardware and software gets developed and then sold to customers on the premise that these standards can be trusted to impart a relatively stable economic value to those hardware and software products. It is expected, then, that evolutionary improvements in an operating system and its file system will be backwards compatible, so that old products will continue to run on new systems.
Today, data compression is a robust and well-developed technology. Conventionally, however, it has been treated as an adjunct to the file system, since incorporating it would severely affect how the file system goes about its business; to date, data compression has generally not been transparent, and it makes files non-standard. Because popular operating systems and their file systems have become widely accepted, the desire to keep using their protocols and formats seems to suggest that data compression arrived too late to be incorporated into them.
These popular file systems were well suited to the environment for which they were originally developed, but they are not necessarily the best choices for today's much wider selection of peripheral mass storage technologies. In at least four areas a significant improvement in "performance" can be realized by: (a) incorporating data compression into the file system; (b) incorporating a file system "into" a mass storage peripheral (actually, it could be located at any of various places in series along the interface path between the computer and the peripheral); or (c) both. Depending upon the circumstances, "performance" can be variously understood as any of: speed; capacity; reduced control complexity (simpler coupling of system elements); or an increased level of functionality.
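Item (a) can be pictured as a layer that compresses data on its way to the medium and decompresses it on the way back, while the caller keeps issuing the same write and read requests. The sketch below uses Python's standard zlib module (DEFLATE) purely as a stand-in, since no particular compression algorithm is specified here. The class CompressingStore and its dictionary standing in for the medium are hypothetical. Compressing whole files at once also sidesteps the harder problem of random access into compressed data.

```python
import zlib


class CompressingStore:
    """Hypothetical sketch: compression hidden beneath an unchanged read/write interface."""

    def __init__(self) -> None:
        # Stands in for space on the storage medium; maps file name -> stored bytes.
        self._medium: dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        # Compress on the way down; the caller's request is unchanged.
        self._medium[name] = zlib.compress(data)

    def read(self, name: str) -> bytes:
        # Decompress on the way up; the caller gets back exactly what it wrote.
        return zlib.decompress(self._medium[name])

    def stored_size(self, name: str) -> int:
        # Bytes actually occupying the medium, after compression.
        return len(self._medium[name])
```

A caller that writes the 3000 bytes of b"abc" * 1000 and reads them back gets exactly those bytes. Meanwhile stored_size reports far fewer bytes actually occupying the medium, which is the sense of "transparent" used here.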
The four areas susceptible to increased performance are: (1) data compression of the actual information stored on the medium; (2) autochangers for either magnetic or optical media; (3) media where erasure must be performed as a separate operation prior to writing; and (4) WORM (write-once/read-many) drives. Naturally, the introduction of special or different file operation commands is to be avoided in favor of retaining the commands of the existing file system. Nor is it desirable to replace the existing file system; at most it should need minor augmentation to produce the desired "transparency". Last, no capability should be lost; the new system should be capable of every operation the old system could perform, "without exception". (Within the limits of what makes sense, of course; one cannot perform a write operation on a CD-ROM file system, for example.)
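Items (3) and (4) differ from conventional discs precisely in the write rule described earlier for magnetic media. The hypothetical classes below model only that difference at the sector level. A conventional disc accepts a write anywhere at any time. An erase-before-write medium rejects a write to a sector that has not been explicitly erased since it was last written. A WORM medium rejects any second write to a sector. They sketch the constraints only and do not reproduce the command set of any real drive.

```python
from typing import Optional


class ConventionalDisc:
    """Any sector may be rewritten at any time; erasure is part of the write."""

    def __init__(self, n_sectors: int) -> None:
        self.sectors: list[Optional[bytes]] = [None] * n_sectors  # None = never written

    def write(self, i: int, data: bytes) -> None:
        self.sectors[i] = data


class EraseBeforeWrite(ConventionalDisc):
    """Hypothetical medium whose sectors must be explicitly erased before rewriting."""

    def __init__(self, n_sectors: int) -> None:
        super().__init__(n_sectors)
        self.erased = [True] * n_sectors  # a fresh medium starts out blank

    def erase(self, i: int) -> None:
        self.sectors[i] = None
        self.erased[i] = True

    def write(self, i: int, data: bytes) -> None:
        if not self.erased[i]:
            raise PermissionError(f"sector {i} must be erased before it is written")
        super().write(i, data)
        self.erased[i] = False


class Worm(ConventionalDisc):
    """Write-once/read-many: a written sector can never be erased or rewritten."""

    def write(self, i: int, data: bytes) -> None:
        if self.sectors[i] is not None:
            raise PermissionError(f"sector {i} has already been written")
        super().write(i, data)
```

A file system that assumes the first behavior will fail on the other two as soon as it tries to update a sector in place. This is why such media have needed separate handling.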