1. Field of the Invention
The present invention relates generally to data processing networks, and more particularly to data storage systems and file servers. The present invention relates specifically to streaming data from a multiplicity of files.
2. Description of Related Art
Data streaming from a multiplicity of files typically occurs during a backup operation or a migration operation. For backup, data from selected files of a file system are written to backup media such as magnetic tape. For migration, data from selected files of a file system are copied from one file server to another file server.
For backup to tape, it is very desirable to maintain a relatively continuous stream of data from storage to the tape drive so that the tape can be transported at a relatively constant and high rate of speed as the stream of data is written to the tape. This ensures that the data stream is written to tape at the maximum possible rate, and the tape is not stressed by unnecessary acceleration and deceleration.
One way of providing a relatively continuous stream of data from a number of files to tape is to use intermediate disk storage as a buffer between the files and the tape. For example, as described in Tzelnic et al., U.S. Pat. No. 5,829,046, issued Oct. 27, 1998 and incorporated herein by reference, network clients send backup data to a file server having a cached disk array and a tape silo. The cached disk array serves as a speed matching buffer and as a means for combining the data or files to be written to a particular tape cartridge in the tape silo.
More recently, it has become customary for network clients to keep their critical data on a network file server instead of on local disk storage. The network clients expect the critical data to be backed up automatically, and restored after any failure of the network file server. In this situation, the data to be backed up resides in files in disk storage of the network file server, and it is desirable to stream the data directly from the disk storage to tape. This has been done using multi-threaded backup software in the file server, including a remote control thread, a file opening thread, a file reading thread, and a tape writing thread, as further described below with reference to FIGS. 2 to 7.
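By way of illustration only, the division of labor among a file opening thread, a file reading thread, and a tape writing thread can be sketched as a pipeline joined by bounded queues. The sketch below is an assumption made for exposition, not the backup software referred to above: the names `open_files`, `read_files`, `write_tape`, and `stream_to_tape`, the use of in-memory byte strings in place of disk files, and the list standing in for the tape are all hypothetical.

```python
import io
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed along each queue


def open_files(sources, opened_q):
    """File opening thread: hand opened file objects to the reading thread."""
    for name, data in sources.items():
        # io.BytesIO stands in for opening a real file on disk (assumption).
        opened_q.put((name, io.BytesIO(data)))
    opened_q.put(SENTINEL)


def read_files(opened_q, buffer_q, block_size):
    """File reading thread: split each opened file into fixed-size blocks."""
    while (item := opened_q.get()) is not SENTINEL:
        name, fobj = item
        while block := fobj.read(block_size):
            # put() blocks when the buffer is full, so the reader cannot
            # run arbitrarily far ahead of the tape writer.
            buffer_q.put((name, block))
    buffer_q.put(SENTINEL)


def write_tape(buffer_q, tape):
    """Tape writing thread: drain blocks in arrival order onto the 'tape'."""
    while (item := buffer_q.get()) is not SENTINEL:
        tape.append(item)  # a list stands in for the tape drive (assumption)


def stream_to_tape(sources, block_size=4, buffer_blocks=8):
    """Run the three threads and return the blocks in the order written."""
    opened_q = queue.Queue(maxsize=2)
    buffer_q = queue.Queue(maxsize=buffer_blocks)
    tape = []
    threads = [
        threading.Thread(target=open_files, args=(sources, opened_q)),
        threading.Thread(target=read_files, args=(opened_q, buffer_q, block_size)),
        threading.Thread(target=write_tape, args=(buffer_q, tape)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tape
```

In this sketch the bounded queue between the reading thread and the writing thread plays the role of a speed matching buffer: the writer can keep draining blocks while the reader is opening or reading the next file, provided the buffer has not run empty.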
In general, a thread is a single, sequential flow of control within a process. Within each thread there is a single point of execution. Each thread has its own set of register values. Therefore, each thread is a single instance of a program routine. A single processor can execute no more than one thread at any given time, but the processor may suspend execution of one thread to begin execution of another thread. The operating system can schedule execution of a thread, suspend execution of a thread, resume execution of a thread, and terminate execution of a thread. The operating system can schedule the execution of threads on a priority basis, and threads of the same priority can be executed on a time-share basis. In this fashion, a multithreaded program can perform computations while waiting for completion of a read or write to disk or while waiting for receipt of a packet from a data network.
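As a minimal sketch of this last point, the following example starts one thread that waits on a simulated input operation while the main thread performs a computation. The function names `slow_read` and `overlap` are hypothetical, and `time.sleep` is an assumed stand-in for waiting on a disk read or a network packet.

```python
import threading
import time


def slow_read(result, delay):
    """Simulate a blocking read; the sleep stands in for disk or network wait."""
    time.sleep(delay)
    result["data"] = b"block"


def overlap(delay=0.2):
    """Compute in the main thread while a second thread waits on 'I/O'."""
    result = {}
    reader = threading.Thread(target=slow_read, args=(result, delay))
    start = time.monotonic()
    reader.start()
    # While the reader thread is suspended in its wait, the operating
    # system is free to run this computation.
    total = sum(i * i for i in range(100_000))
    reader.join()  # wait for the simulated read to complete
    elapsed = time.monotonic() - start
    return result["data"], total, elapsed
```

A call such as `data, total, elapsed = overlap()` returns the data delivered by the reader thread together with the result computed while that thread was waiting.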
There are various standards and software tools to help a programmer write a multithreaded program. For example, IEEE Std 1003.1-1996 includes a Threads Extension for the Portable Operating System Interface (POSIX®) to open systems. This Threads Extension, called POSIX.1c, has been included in the Single UNIX Specification, Version 2, as described in “Threads and the Single Unix® Specification,” Version 2, May 1997, by The Open Group, 8 New England Executive Park, Suite 325, Burlington, Mass. 01803-5007, opengroup.org, and in the “Unix® Systems Threads Reference,” also by The Open Group. Another implementation of the IEEE® POSIX® standard is described in “Tru64 UNIX Guide to DECthreads,” July 1999, by the Compaq division of Hewlett-Packard, hp.com.