1. The Field of the Invention
The present invention relates to systems and methods for comparing file data streams in a file system. More specifically, the present invention allows a file system to rapidly compare file data streams, without making actual comparisons between the data streams, by comparing native data signatures generated for each data stream.
2. The Prior State of the Art
In a computer operating system, the file system is generally known as the overall structure in which files are named, stored, and organized. In the early 1980's, the FAT (file allocation table) file system was developed for use with MS-DOS (MICROSOFT disk operating system) on the first generation of personal computers. These computers generally had two drives for low-density floppy disks, and the FAT system performed more than adequately in managing the small disk volumes and the hierarchical structure of directories and files. Over time, the FAT system even continued to keep pace with the needs of personal computer users as computer hardware and software increased in power and speed. However, as the low-density floppy disks began yielding to larger hard disk drives, file data searches and retrievals correspondingly slowed.
By the end of the decade, further deficiencies were observed in the FAT system. For example, as the hard disks of personal computers grew to 40 MB or more, users were required to partition their disks into two or more volumes because the FAT system was limited to 32 MB per volume. Later versions of MS-DOS, however, did allow for larger disk volumes.
Eventually, as hard disks grew even larger, the high-performance file system (HPFS) was introduced as part of a new operating system, OS/2, to more efficiently manage the large volumes on hard disk. The HPFS gained a speed advantage over the prior FAT system by: (1) reserving sector space for booting, maintaining, and recovering the file system; (2) reserving bitmap space at selected intervals throughout the disk to help prevent fragmented file storage; and (3) using a "B-Tree" hierarchical structure of root and nodes that allowed fast traversal of stored file data. The HPFS also allowed an increase in the size of file names from the eight-plus-three character format used by the FAT system to 255 characters. The longer file names enabled longer, more descriptive names to be employed.
While both the HPFS and the FAT system were introduced a relatively long time ago in a rapidly changing computer industry, both remain popular file systems and, because of their speed and versatility, continue to be extensively used. File management limitations nonetheless exist with both the HPFS and the FAT systems.
For example, in today's computer industry, numerous sophisticated applications demand near zero-fault data transactions to occur at substantially instantaneous speeds. Such applications include transactions in the national and world financial markets, airline industry, banking industry, and in various engineering and scientific applications, to name but a few. While numerous variables are involved in the pursuit of instantaneous and zero-fault transactions, today's computer world welcomes any advantages in speed and accuracy, even at the most rudimentary levels. In fact, there are times when such transactions simply reduce to determinations about particular characteristics of data streams stored in files. For example, a determination of whether stored data has changed or remains the same is oftentimes enough to make a difference in how rapidly or effectively a transaction occurs. With the FAT and HPFS systems, such determinations are performed by comparing two or more data streams on a one-to-one, i.e., bit-to-bit/byte-to-byte, basis.
While both the FAT and HPFS systems are able to make such data stream determinations efficaciously, neither is particularly well suited for these types of applications because valuable time is consumed in comparing data streams one-to-one. This is especially true with applications involving extremely large data streams. Consequently, it is desirable to make faster, time-saving comparisons between data streams while maintaining, or improving, accuracy.
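By way of illustration only, the one-to-one comparison described above may be contrasted with a signature-based approach in the following minimal sketch. The particular hash function (SHA-256), the chunk size, and the function names are illustrative assumptions and do not represent the invention's actual signature scheme:

```python
import hashlib


def stream_signature(path: str, chunk_size: int = 65536) -> bytes:
    """Compute a fixed-size signature over a file's entire data stream.

    Reading in chunks keeps memory use bounded even for very large streams.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.digest()


def streams_match(path_a: str, path_b: str) -> bool:
    # Once signatures are stored alongside the files, determining whether
    # two streams are identical reduces to comparing two 32-byte values,
    # rather than a byte-for-byte pass over both streams.
    return stream_signature(path_a) == stream_signature(path_b)
```

In such a scheme, the time-consuming pass over the stream occurs only when the signature is generated; subsequent comparisons cost a fixed, small amount of time regardless of stream length.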
A further limitation is that valuable storage space is oftentimes wasted when old files are updated with new data and the old files are then "backed up" again to reflect the new data. While the changes to the data may have been only minimal, or trivial, in amount, the backed-up file repeats the storage of much of the same data as existed before the entry of the new data. Thus, it would be desirable to have a system and method whereby only actual changes in file data are backed up, instead of repeatedly storing redundant file data.
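The change-only backup described above may likewise be illustrated with per-block signatures. The following sketch is an assumption for illustration only (the block size, hash function, and names are hypothetical, and the sketch does not handle blocks deleted from the end of a stream):

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a required value


def block_signatures(data: bytes) -> list[str]:
    """Compute one signature per fixed-size block of a data stream."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Return the indices of blocks whose signatures differ.

    Only these blocks would need to be written to the backup, rather
    than re-storing the entire file.
    """
    old_sigs = block_signatures(old)
    changed = []
    for idx, sig in enumerate(block_signatures(new)):
        if idx >= len(old_sigs) or old_sigs[idx] != sig:
            changed.append(idx)
    return changed
```

Under such a scheme, a trivial edit to one block of a large file would cause only that one block to be stored again, rather than the whole file.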
Yet another problem is that prior art file and operating systems are relatively deterministic and cannot easily be extended to cover scenarios not envisioned by their designers. As such, attempting to incorporate another method or file system into the operating system to facilitate the sophisticated file management needs described above would be difficult, if not impossible. It would therefore be desirable to provide systems and methods that can be fully assimilated into developing file and operating systems while simultaneously contemplating and providing for future file management utilities not yet fully developed or envisioned.