1. Technical Field
The present disclosure generally relates to comparing digital file systems; more specifically, the present disclosure relates to comparing digital file systems at the bit level.
2. Description of the Related Art
Data is typically stored in binary form in groups of bits. For example, a group of eight (8) bits is called a “byte,” and a group of 16 bits is called a “word.” Other sized groups of bits are also used in some systems. A bit is currently the smallest data unit available and is typically embodied in an electrostatic nontransitory storage medium comprising a transistor (or similar electrical switching device) and a capacitive element, in a magnetic nontransitory storage medium comprising magnetically readable/writeable media, or in an optical nontransitory storage medium comprising optically readable/writeable media. Newer nontransitory storage formats include memristors, atomic or molecular storage devices, and quantum storage devices. Regardless of storage media format, every piece of binary data is traceable back to a series of storage elements, each of which retains a nontransitory state indicative of either a binary “zero” or a binary “one.” An error as small as one bit can have a profound impact on the content of a particular file. For example, the letter “A” is represented in extended ASCII as the binary value “01000001.” A change in just one bit, for example to “01000011,” changes the letter from “A” to “C.” Thus, even relatively minor bit errors can significantly alter the data present in a file.
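The single-bit error described above can be reproduced with a short Python sketch. The helper name `flip_bit` is hypothetical and used only for illustration; it inverts one bit of the ASCII code for “A” with an exclusive OR, turning “01000001” into “01000011.”

```python
def flip_bit(value: int, position: int) -> int:
    """Return value with the bit at `position` (0 = least significant) inverted."""
    return value ^ (1 << position)


original = ord("A")                 # 0b01000001 == 65
corrupted = flip_bit(original, 1)   # invert bit 1 -> 0b01000011 == 67

# Print each value as an 8-bit binary string next to the character it encodes.
print(f"{original:08b} -> {chr(original)}")    # 01000001 -> A
print(f"{corrupted:08b} -> {chr(corrupted)}")  # 01000011 -> C
```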
Modern data transfer rates and reliability continue to increase with improvements in network infrastructure. Data transfer rates of 6 to 50 megabits per second (“Mbps”) are fairly commonplace. At a data transfer rate of 12 Mbps (i.e., 1.5 megabytes per second, where 8 bits = 1 byte), a fairly small image or file having a size of 4.5 megabytes requires about three seconds to transfer. During those three seconds, 36 million bits of information pass between the systems. Multiply this one file by ten, a hundred, or even a thousand, and one can readily appreciate the enormous quantity of data exchanged between systems.
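The transfer arithmetic above can be checked with a minimal Python sketch. It assumes decimal units (1 megabyte = 10⁶ bytes, 1 megabit = 10⁶ bits) and ignores protocol overhead, latency, and retransmission; the function name `transfer_seconds` is hypothetical.

```python
BITS_PER_BYTE = 8


def transfer_seconds(file_megabytes: float, rate_mbps: float) -> float:
    """Seconds to move a file of `file_megabytes` at `rate_mbps` megabits per second.

    Assumes an ideal link: no protocol overhead, latency, or retransmission.
    """
    file_megabits = file_megabytes * BITS_PER_BYTE
    return file_megabits / rate_mbps


seconds = transfer_seconds(4.5, 12)                  # 36 megabits / 12 Mbps
bits_moved = int(4.5 * 1_000_000 * BITS_PER_BYTE)    # 4.5 MB expressed in bits
print(seconds, bits_moved)  # 3.0 36000000
```

Scaling the same file count by a thousand scales the ideal transfer time linearly, to about 3,000 seconds at 12 Mbps.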
The distribution of digital media continues to evolve in the face of changing technology. The earliest systems were often hardwired and required significant time and labor to manually rewire in order to modify or change programming routines. Over time, hardwiring gave way to vacuum tubes, which in turn gave way to transistors, leading to the concept of “software”: programs and algorithms that could be electronically stored and retrieved. Rather than relying on hardwired programs, “software” comprised a nontransitory storage medium containing information embodied in a machine-readable format as stored binary code. The nontransitory storage medium evolved from reel-to-reel magnetic tape, to rotating magnetic media (e.g., “floppy disks”), to rotating optical media (e.g., compact discs and DVDs), each of which stored binary data in a machine-readable format. Given the wide availability of network connections, software distribution has entered a new era in which stored binary data is communicated from a nontransitory storage location on a remote server to a nontransitory storage location on a local client device. This model is exemplified by the Apple® AppStore and the Google® Play store, which are available on many portable computing devices such as smartphones.
The volume of digital data generated on a daily basis is growing rapidly, and some estimate that by 2020 up to 35 zettabytes (35×10²¹ bytes) of data may be generated annually. Much of this data is collected, sorted, parsed, analyzed, and stored as files on nontransitory storage media. In order to keep files to manageable sizes, data may be allocated or otherwise divided into file systems that contain tens, hundreds, or even thousands of files, each of which may contain megabytes (10⁶ bytes) or even gigabytes (10⁹ bytes) of data. Communicating, transmitting, or exchanging such large volumes of digital binary data frequently involves duplicating file systems that contain a large number of individual files, either on a single device or on two different devices such as a client and a server. In such instances, ensuring the integrity of the communicated binary data is essential to ensuring the accuracy of the information conveyed by the data. Comparing two instances of a single file on a bit-by-bit basis may be time consuming, depending on the volume of binary data in the file. Comparing two instances of hundreds or thousands of files, some or all of which may contain considerable volumes of binary data, on a bit-by-bit basis can tax the capabilities of even the largest computing systems.
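To illustrate why bit-by-bit comparison scales with data volume, the following Python sketch performs a naive comparison: it XORs corresponding bytes of two file instances and counts the set bits, so every byte of both files must be read. This is a generic baseline, not the comparison technique of the present disclosure. The function names `count_differing_bits` and `compare_file_bits`, the 1 MiB chunk size, and the rule that bytes present in only one file count as eight differing bits are assumptions made for this example.

```python
from pathlib import Path


def count_differing_bits(a: bytes, b: bytes) -> int:
    """Count bit positions at which two byte strings differ."""
    # XOR of two bytes has a 1 wherever the bits differ; count those 1s.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    # Sketch assumption: bytes present in only one buffer count as 8 differing bits each.
    differing += 8 * abs(len(a) - len(b))
    return differing


def compare_file_bits(path_a: Path, path_b: Path, chunk_size: int = 1 << 20) -> int:
    """Stream two files in fixed-size chunks and total their differing bits.

    Every byte of both files is read, so the cost grows with file size.
    """
    total = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if not chunk_a and not chunk_b:
                return total
            total += count_differing_bits(chunk_a, chunk_b)
```

For example, `count_differing_bits(b"A", b"C")` returns 1, because the two bytes differ only in bit 1. Applying `compare_file_bits` to each pair of files in two file systems repeats this full read for every file, which is the cost the passage above describes.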