Computer systems typically include a central processing unit, various types of memory and a variety of peripheral devices including input/output devices and nonvolatile data storage devices such as removable media, including floppy disk drives, and fixed storage including disk drives and tape drives. Communication between the various peripheral devices and the central processing unit is typically controlled by a computer operating system. For example, well known computer operating systems currently used in personal computers include MS-DOS and Windows available from Microsoft Corporation. Under operating systems such as the MS-DOS operating system, a single file system controls the organization of files stored on peripheral devices.
In order for the computer system to read or write data in a format recognized by both the computer system and the respective peripheral devices, data typically must be organized in accordance with the file system management aspect of the operating system. One such file system is known as the File Allocation Table (FAT) file system. The FAT file system provided with the MS-DOS operating system is currently one of the most widely used file systems, although other file systems may be associated with other types of data storage devices or operating systems. For example, an alternative file system is the High Performance File System (HPFS), which is described in U.S. Pat. No. 5,371,885.
File systems facilitate communication between the operating system kernel and device dependent drivers. The file systems are typically responsible for operations such as converting read and write commands generated by an operating system kernel into a form which may be supported by the particular device dependent drivers. The file system may further be responsible for the allocation of disk space for a file on a device such as a hard disk drive.
The allocation of disk space is typically done only on an as needed basis. That is, the disk space is not pre-allocated but is instead allocated one cluster (unit of allocation) at a time. Typically, a cluster in a FAT type file system is one or more consecutive disk sectors. The clusters for a file are linked (chained) together and, typically, kept track of by entries in a file allocation table. In addition to the file allocation table, a directory is typically maintained containing a directory record for each file. A portion of the hard disk of designated size is generally reserved for the file allocation table and the directory when the disk is initialized.
To locate all of the data that is associated with a particular file stored on the hard disk drive, the starting cluster number of the file is obtained from the file's directory entry, then the file allocation table is referenced to locate the next sequential cluster associated with the file. In other words, the file allocation table is typically provided as a linked list of cluster pointers so that each file allocation entry for a file points to the next sequential cluster used for that file. Under the MS-DOS FAT file system, each entry in the file allocation table is typically a 16 bit value. Furthermore, the last entry for a file in the file allocation table is typically assigned a number designating that entry as the end of file cluster (i.e., indicating that no more clusters follow). The end of file designation number typically ranges from FFF8 to FFFF (base 16) inclusive.
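The chain lookup described above can be sketched as follows. This is only an illustrative model, not MS-DOS code: the table is assumed to be an in-memory list indexed by cluster number, and the names `cluster_chain`, `fat` and `EOF_MIN` are hypothetical.

```python
# Hypothetical in-memory model of a 16 bit FAT: the list index is a cluster
# number and the value is the next cluster in the chain or an end of file marker.
EOF_MIN = 0xFFF8  # end of file markers span 0xFFF8..0xFFFF inclusive


def cluster_chain(fat, start_cluster):
    """Return the ordered list of clusters that make up one file."""
    chain = []
    seen = set()
    cluster = start_cluster
    while True:
        if cluster in seen:
            # A well formed chain never revisits a cluster.
            raise ValueError("cycle in FAT chain")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat[cluster]
        if nxt >= EOF_MIN:
            return chain  # this entry marks the last cluster of the file
        cluster = nxt


# Toy table: a file whose directory entry names cluster 2 as its start
# occupies clusters 2 -> 5 -> 3.
fat = [0] * 8
fat[2] = 5
fat[5] = 3
fat[3] = 0xFFFF
example_chain = cluster_chain(fat, 2)
```

Here the starting cluster plays the role of the value read from the directory entry, and each table lookup yields the next cluster until an end of file value is reached.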
The size of a cluster under the file system is typically defined at the time when the hard disk is formatted and usually ranges from about 1 to about 128 sectors. Using a 16 bit file allocation table format, and assuming the common 512 byte sector size, each sector of a file allocation table holds 256 two byte entries and can therefore point to up to 256 clusters, which corresponds to 256 times the number of sectors per cluster established at the time of initialization.
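The arithmetic behind the 256 cluster figure can be checked with a short calculation. The 512 byte sector size is an assumption (the text does not state it), and the function names are hypothetical.

```python
SECTOR_BYTES = 512    # assumed sector size; not stated in the text
FAT_ENTRY_BYTES = 2   # each 16 bit table entry occupies two bytes


def entries_per_fat_sector(sector_bytes=SECTOR_BYTES):
    """Number of cluster pointers that fit in one sector of the table."""
    return sector_bytes // FAT_ENTRY_BYTES


def bytes_mapped_per_fat_sector(sectors_per_cluster, sector_bytes=SECTOR_BYTES):
    """Disk space described by one sector of the table, in bytes."""
    return entries_per_fat_sector(sector_bytes) * sectors_per_cluster * sector_bytes
```

Under these assumptions, one table sector describes 128 KiB of disk space with one sector per cluster and 16 MiB with 128 sectors per cluster.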
One particularly problematic file management application involves the collection of high speed or bursty data into a file to provide a permanent record of the data. Examples of applications which may generate such high speed data storage demands include debugging operations such as network traces or software execution traces and other application level programs such as video data collection for a digital security system. In these types of applications, there are typically two, sometimes conflicting, considerations. The first issue is how much of the data is to be saved. Depending upon the amount of data collected or the time over which the collection occurs, it may not always be feasible to save all of the data up to a point of interest. For example, a two day trace on a busy network could fill thousands of megabytes of memory. Even if there were sufficient hard disk space available for storing all such data, in all likelihood, the analysis of this volume of data would be impractical. The second issue is the degree to which the data collection process interferes with the performance of the computer system. File operations, such as save operations, generally tend to be demanding of computer system resources, while at the same time the application is generating high speed data streams from data collection operations. Such data collection applications tend to be intolerant of interruption due to computer system conflicts or resource limitations. Various current approaches to limiting the size of a data collection file potentially impose undesirable delays on operations when high speed data is being collected.
One solution which has previously been proposed to the problem of high speed data collection is to create a fixed size output file and treat it as a wrap buffer by managing the current offset pointer of the file system. Under this approach, for example, an application program may manage its own file offset pointer to track the filling and wrapping of the output file as data accumulates. The application program would typically have to perform an expensive file seek operation to reset the file offset pointer back to the beginning of the file each time the file wraps due to data overflow. Additionally, depending on synchronization issues in the particular implementation, the application may also need to perform a file seek operation before every file write, effectively doubling the problematic performance overhead of the file operations. Finally, the application program would generally need to preserve an accurate copy of the current file offset in the file itself so that parsing routines will be able to find and parse the beginning of the data at a later time.
To ensure that there is always an accurate copy of the file offset within the file itself, the offset value should also be updated on each write, again doubling the file operation overhead: the application must seek to the offset storage location, write the current offset for later use, then seek back to the next write location. It should also be noted that every application program reading or writing the file would be expected to duplicate this logic and to have a common understanding of the file format in order to know where to read or save, among other things, the working file offset. This approach may be inconvenient to implement as it requires additional processing for both the collection and post-processing phases and further requires additional formatting information to be stored in the file. In addition, the file seek time required to manipulate the current offset pointer may be burdensome.
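A minimal sketch of the wrap buffer approach described above may help show where the extra seeks arise. The file layout (an 8 byte offset header followed by a fixed size data area) and the names `WrapFile`, `read_in_order` and `demo` are hypothetical and chosen only for illustration.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<Q")  # hypothetical layout: 8 byte write offset at file start


class WrapFile:
    """Fixed size file treated as a wrap buffer, with the offset stored in the file."""

    def __init__(self, path, capacity):
        self.capacity = capacity
        self.offset = 0   # next write position within the data area
        self.seeks = 0    # counts explicit seeks, for illustration only
        self.f = open(path, "w+b")
        self.f.write(HEADER.pack(0) + b"\0" * capacity)

    def _seek(self, pos):
        self.f.seek(pos)
        self.seeks += 1

    def write(self, data):
        view = memoryview(data)
        while len(view):
            room = self.capacity - self.offset
            chunk = view[:room]
            self._seek(HEADER.size + self.offset)  # seek to the write position
            self.f.write(chunk)
            self.offset = (self.offset + len(chunk)) % self.capacity
            view = view[len(chunk):]
        self._seek(0)                              # seek to the stored offset
        self.f.write(HEADER.pack(self.offset))     # keep the in-file copy current
        self.f.flush()

    def close(self):
        self.f.close()


def read_in_order(path):
    """Return the data area oldest byte first, assuming the file has wrapped."""
    with open(path, "rb") as f:
        (offset,) = HEADER.unpack(f.read(HEADER.size))
        data = f.read()
    return data[offset:] + data[:offset]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "trace.bin")
    w = WrapFile(path, capacity=8)
    w.write(b"abcdef")
    w.write(b"ghij")  # overflows the 8 byte data area and wraps
    w.close()
    return read_in_order(path), w.seeks
```

In this sketch every write costs at least two seeks (one to the data position and one to the header), and the reader has to know the same header layout in order to find the start of the data, which mirrors the overhead and format coupling noted above.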
Another alternative approach is to re-create the file after it fills all of its allocated disk space. Such an approach involves truncating the front of the file when a fill threshold is reached. However, this approach also generally requires creating a new file, copying the later saved portion of the original file to the new file and then renaming the newly created file to the previous file name. This approach is expected to be very burdensome and not appropriate for high performance systems.
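The re-creation approach can be sketched in a few lines to show why it is costly: each truncation reads and rewrites the retained portion of the file. The threshold parameters and the names `truncate_front`, `append_with_limit` and `demo_truncate` are hypothetical.

```python
import os
import tempfile


def truncate_front(path, keep_bytes):
    """Keep only the last keep_bytes of the file by copying them to a new file."""
    size = os.path.getsize(path)
    start = max(0, size - keep_bytes)
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        src.seek(start)
        while True:
            block = src.read(64 * 1024)
            if not block:
                break
            dst.write(block)          # copy the later saved portion
    os.replace(tmp_path, path)        # rename the new file to the original name


def append_with_limit(path, data, limit, keep_bytes):
    """Append data, then truncate the front once the fill threshold is reached."""
    with open(path, "ab") as f:
        f.write(data)
    if os.path.getsize(path) >= limit:
        truncate_front(path, keep_bytes)


def demo_truncate():
    path = os.path.join(tempfile.mkdtemp(), "trace.log")
    append_with_limit(path, b"abcdef", limit=10, keep_bytes=4)
    append_with_limit(path, b"ghijkl", limit=10, keep_bytes=4)  # triggers truncation
    with open(path, "rb") as f:
        return f.read()
```

The copy in `truncate_front` grows with the amount of data retained, so the collecting application stalls for longer as the retained window gets larger.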