Today, we are experiencing an exponential growth in the amount of data being stored in digital format. In part, this growth can be attributed to the proliferation of the Internet, which has made the duplication and distribution of large data files remarkably easy. For example, the Internet has become a common medium for distributing multimedia data files, such as movie trailers. These multimedia files are large data structures, typically 1 to 10 megabytes in size. Although conventional data storage mechanisms, such as hard disk drives and linear tape storage systems, can be employed to store such large data files, neither of these existing technologies is particularly well suited to this application.
For example, hard disk drive systems are well suited for storing small data files that need to be accessed rapidly. However, hard disk systems become less efficient as file size increases, because of a problem known as fragmentation. Hard disk systems break data files into smaller subsections, typically 512 bytes each, and store the subsections wherever space is available on the disk. Initially, the subsections are often stored close together on the disk surface, which allows the read/write head of the drive to access them quickly and retrieve the entire file. Over time, however, as the file is read off the disk, edited and rewritten to the disk, its subsections become scattered across the disk surface, and for a large data file they commonly end up widely dispersed. This wide separation increases the time needed to assemble the file, because the read/write head must travel across large sections of the disk, many times over, to reach each subsection. Accordingly, for large data files stored on a hard drive, fragmentation can increase file access time substantially, for example from milliseconds to seconds.
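The effect of scattering on head movement can be illustrated with a deliberately simplified model. The Python sketch below assumes that head travel is proportional to the gap between consecutive block addresses, and it ignores rotational latency and actual disk geometry. The function names `subsections_needed`, `fragment_count` and `head_travel`, as well as the example block addresses, are hypothetical and serve only for illustration.

```python
from typing import Sequence

# Subsection size taken from the typical figure given above.
SUBSECTION_BYTES = 512


def subsections_needed(file_bytes: int) -> int:
    """Number of 512-byte subsections needed to hold a file (ceiling division)."""
    return -(-file_bytes // SUBSECTION_BYTES)


def fragment_count(blocks: Sequence[int]) -> int:
    """Number of contiguous runs the file's subsections are split into."""
    if not blocks:
        return 0
    return 1 + sum(1 for prev, nxt in zip(blocks, blocks[1:]) if nxt != prev + 1)


def head_travel(blocks: Sequence[int]) -> int:
    """Toy measure of head movement when reading the subsections in file order.

    Hypothetical model: after reading block `prev` the head sits at `prev + 1`,
    and each jump costs the absolute address distance to the next block.
    Rotational latency and real disk geometry are ignored.
    """
    return sum(abs(nxt - (prev + 1)) for prev, nxt in zip(blocks, blocks[1:]))


if __name__ == "__main__":
    contiguous = list(range(100, 108))  # freshly written file: adjacent blocks
    scattered = [100, 9000, 250, 47000, 1200, 30000, 5, 61000]  # after many edits
    print(subsections_needed(10 * 2**20))  # subsections in a 10 MiB file
    for name, layout in (("contiguous", contiguous), ("scattered", scattered)):
        print(name, fragment_count(layout), head_travel(layout))
```

Under this model the contiguous layout forms a single run with no extra head travel, whereas the scattered layout of the same eight subsections is split into eight runs and requires travel across roughly 230,000 block addresses. A 10 MiB file would comprise 20,480 such subsections, so the penalty can grow considerably with file size.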
Similarly, tape data storage systems lack an architecture well adapted for retrieving individual files from large data structures. In particular, linear tape storage systems write data onto a storage tape in a long continuous stream, creating a track that may extend lengthwise across the entire tape and may be composed of multiple data files. Each new file is appended to the existing track, beginning where the previous file ended, so the track is continually extended. Retrieving data from such linearly recorded tapes may take minutes, up to a significant fraction of an hour.
In another type of conventional tape storage system, data can be stored in a “serpentine” fashion as device blocks in a pattern that bi-directionally traverses the tape's length along different parallel tape tracks. Using the “serpentine” data storage pattern, the data can be concentrated on all tracks as close to the beginning of the tape as possible. Another system “partitions” the tape by pre-formatting the magnetic tape medium into evenly spaced multi-track segments. Larger partitions for longer files can be formed by joining adjacent segments, and the partitions may be automatically padded with a selected number of empty segments to allow for file expansion, for example, after editing.
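The bidirectional traversal of the serpentine pattern can be expressed as a mapping from a logical block number to a physical location on the tape. The following Python sketch assumes a hypothetical tape with a fixed number of parallel tracks and a fixed number of block slots per track. It writes even-numbered tracks from the beginning of the tape toward the end and odd-numbered tracks in the reverse direction, filling one track completely before moving to the next; it does not model the variant that concentrates data near the beginning of the tape, nor the pre-formatted partition scheme. The names `Location` and `serpentine_location` are illustrative only.

```python
from typing import NamedTuple


class Location(NamedTuple):
    track: int     # parallel track number, 0-based
    position: int  # block slot measured from the beginning of the tape
    forward: bool  # True if the tape moves beginning-to-end on this track


def serpentine_location(block: int, blocks_per_track: int, num_tracks: int) -> Location:
    """Map a logical block number to a (hypothetical) serpentine tape location."""
    if block < 0 or block >= blocks_per_track * num_tracks:
        raise ValueError("block index out of range for this tape")
    track, offset = divmod(block, blocks_per_track)
    forward = track % 2 == 0  # even tracks run forward, odd tracks run backward
    position = offset if forward else blocks_per_track - 1 - offset
    return Location(track, position, forward)


if __name__ == "__main__":
    # A toy tape with 3 tracks of 4 block slots each.
    for b in range(12):
        print(b, serpentine_location(b, blocks_per_track=4, num_tracks=3))
```

Because consecutive tracks alternate direction, block 4 in this four-slot-per-track layout lands at the far end of track 1, at the same tape position as block 3, so the drive can continue writing on the next track without rewinding.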
Although these architectures are well suited for quickly storing large files, they are poorly suited for allowing those files to be accessed, edited and rewritten to the tape. Specifically, if a file is accessed and edited, its length may change, and it may then no longer fit in the space between the files that previously bounded it. Such a system can therefore only append files to the track, so each time a file is saved it is written to a new location on the tape. This can quickly consume storage space, particularly when large files are being edited.
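The storage cost of append-only rewriting can be estimated with simple arithmetic. The Python sketch below assumes, hypothetically, that every save appends a complete new copy of the file to the end of the track and that superseded copies are never reclaimed; the function name `append_only_usage` and its parameters are illustrative and are not taken from any particular tape system.

```python
from typing import Tuple


def append_only_usage(initial_size: int, growth_per_edit: int, edits: int) -> Tuple[int, int]:
    """Return (live_size, tape_consumed) after `edits` edit-and-save cycles.

    Hypothetical model: each save appends a full copy of the current version,
    whose size changes by `growth_per_edit` units per edit; older copies
    remain on the tape as dead space.
    """
    if edits < 0:
        raise ValueError("edits must be non-negative")
    versions = [initial_size + i * growth_per_edit for i in range(edits + 1)]
    return versions[-1], sum(versions)


if __name__ == "__main__":
    # A 10 MB file that grows by 1 MB on each of five edits.
    live, consumed = append_only_usage(10, 1, 5)
    print(f"live file: {live} MB, tape consumed: {consumed} MB")
```

In this example only 15 MB of live data remains, yet 75 MB of tape has been consumed, and the dead space keeps growing with every further edit.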
Moreover, file access time for a tape system is typically measured in seconds or minutes, because a linear tape storage system must wind through long lengths of tape before it is positioned at the location of a particular file. Accordingly, tape backup systems, although excellent for long-term storage of large data files, are poorly suited for providing convenient read/write access to data.
Thus, there is a need in the art for a data storage system that provides reliable read/write access to large data structures within a reasonable time.