The invention relates generally to sequential access storage media, and more particularly to accessing variable-length data segments on a sequential access storage medium.
Sequential access storage media, such as magnetic tapes and WORM (write-once, read-many) optical disks, are typically used for the storage of large amounts of data. Sequential access storage media offer low-cost storage relative to other storage alternatives, such as magnetic disks, disk arrays, or random access memory (RAM). A disadvantage of sequential access storage media, however, is the relatively slow process of positioning to a specified location on the media. For a tape, such positioning typically involves mechanically winding and/or rewinding the media to reach the location of the requested data. As such, positioning to a specified data offset on the tape is a costly step in the overall process of retrieving recorded data from a sequential access storage medium.
When writing a large data stream to a sequential access storage medium, it is desirable to divide the data stream into smaller data segments for several reasons. First, a segmented allocation of data improves error recovery. Each data segment has a header and error recovery parameters, such as a checksum. If a very large data stream is recorded as a single data segment, the entire data stream must be read before the data segment can be validated, and a minor error can render the entire data stream irrecoverable. If the data stream is instead allocated into multiple smaller data segments, an error in a single data segment does not make the entire data stream irrecoverable.
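The error-recovery benefit of segmentation can be illustrated with a short Python sketch. It is an illustration under assumptions, not the recorded format: each hypothetical segment here is a 4-byte CRC-32 checksum (computed with `zlib.crc32`) followed by its data, and the helper names `make_segments` and `check_segments` are invented for the example. Corrupting one byte invalidates only the segment that holds it.

```python
import struct
import zlib

def make_segments(stream: bytes, seg_len: int) -> list[bytes]:
    """Split `stream` into segments, each prefixed by a CRC-32 of its data.
    The 4-byte big-endian checksum prefix is a hypothetical layout."""
    segments = []
    for i in range(0, len(stream), seg_len):
        chunk = stream[i:i + seg_len]
        segments.append(struct.pack(">I", zlib.crc32(chunk)) + chunk)
    return segments

def check_segments(segments: list[bytes]) -> tuple[list[int], list[int]]:
    """Return (indices of valid segments, indices of corrupted segments)."""
    good, bad = [], []
    for index, segment in enumerate(segments):
        (stored_crc,) = struct.unpack_from(">I", segment, 0)
        if zlib.crc32(segment[4:]) == stored_crc:
            good.append(index)
        else:
            bad.append(index)
    return good, bad
```

For example, splitting 1000 bytes into 256-byte segments and corrupting one data byte of the third segment leaves the other three segments verifiable and recoverable.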
Second, it is desirable to have smaller segments because data segments can act as boundaries to facilitate the automatic flushing of data from system buffers. Generally, an application “writes” a block of session data to a system buffer before the data is actually recorded to the sequential access storage medium. Some systems use data segment boundaries to define the data that is automatically flushed by the system out of the buffer and to the medium.
Third, it is desirable to have smaller data segments because the length of the entire data stream may be unknown to the application when the “write” operation is initiated. If the data stream were recorded as a single data segment, its length therefore could not be stored in the data segment header. By contrast, if the data stream is divided into multiple data segments, the application can specify each data segment's length even though the length of the entire data stream is not yet known.
One technique for decreasing the operational cost of positioning on a sequential access storage medium uses fixed-length data segments. That is, data to be recorded on the sequential access storage medium is allocated in fixed-length data segments. Each data segment typically has a fixed-length header prepended to it that specifies, for example, a data segment index, the starting location of the data within the data segment, the data length within the segment, and the amount of padding. If the data recorded in a segment is shorter than the fixed segment length, the remainder of the segment is “padded” (i.e., subsequent session data is recorded in the next data segment rather than in the remaining space of the current data segment).
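A fixed-length segment carrying the header fields listed above might be built as in the following Python sketch. The field widths, big-endian byte order, and 64 KB segment size are assumptions chosen for illustration (the text names the fields but not their encoding), and `build_segment` and `parse_header` are hypothetical helpers.

```python
import struct

# Hypothetical header layout: segment index (u32), data start within the
# segment (u16), data length (u32), padding length (u32). The ">" prefix
# means big-endian with no alignment gaps, so the header is 14 bytes.
HEADER_FMT = ">IHII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)
SEGMENT_SIZE = 64 * 1024  # assumed fixed segment length

def build_segment(index: int, data: bytes) -> bytes:
    """Return one fixed-length segment: header, data, then zero padding."""
    capacity = SEGMENT_SIZE - HEADER_SIZE
    if len(data) > capacity:
        raise ValueError("data does not fit in one segment")
    padding = capacity - len(data)
    header = struct.pack(HEADER_FMT, index, HEADER_SIZE, len(data), padding)
    return header + data + bytes(padding)

def parse_header(segment: bytes) -> tuple[int, int, int, int]:
    """Decode (index, data start, data length, padding) from a segment."""
    return struct.unpack_from(HEADER_FMT, segment, 0)
```

A segment holding only five bytes of data still occupies the full 64 KB, which is the wasted space the following paragraphs discuss.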
It is to be understood that a media offset represents the sequential offset along the media, including headers, data, checksums, and padding, while a data offset represents a sequential offset of the data only, excluding headers, checksums, and padding. To locate a specified data offset Od, a program (e.g., an operating system, an application, a system driver, or an embedded program) calculates the media offset to the data segment that includes the specified data offset Od using simple deterministic mathematics. A disadvantage of fixed-length data segments is that the data segments tend to be fixed at a large value (e.g., 64 KB (kilobytes)) and, therefore, can require significant padding, which diminishes the storage efficiency by introducing what is essentially wasted space on the medium.
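The deterministic calculation can be illustrated with a short Python sketch. Beyond what the text states, it assumes that every segment except possibly the last is completely filled with data (so no intermediate padding disturbs the arithmetic), that segments are 64 KB, and that the header is 14 bytes; `locate_fixed` is a hypothetical name. With segment size S, header size H, and per-segment data capacity C = S - H, the data offset Od lies in segment floor(Od / C), at media offset floor(Od / C) * S + H + (Od mod C).

```python
def locate_fixed(data_offset: int,
                 segment_size: int = 64 * 1024,
                 header_size: int = 14) -> tuple[int, int, int]:
    """Map a data offset to (segment index, segment media offset,
    media offset of the byte) for fully packed fixed-length segments."""
    capacity = segment_size - header_size        # data bytes per segment
    index, within = divmod(data_offset, capacity)
    segment_start = index * segment_size         # media offset of the header
    return index, segment_start, segment_start + header_size + within
```

For example, with these defaults Od = 100000 falls in segment 1 (the second segment), whose header starts at media offset 65536; the byte itself sits at media offset 100028. No header has to be read to compute this.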
To reduce the wasted space introduced by fixed-length data segments, an existing approach uses variable-length data segments, which avoid excessive padding. Variable-length data segments need not comply with a predetermined fixed length, even though some or most of them may have the same length. However, the simple deterministic positioning used with fixed-length data segments does not work with variable-length data segments. Instead, the reader must be positioned to a specified data offset by sequentially traversing the headers along the media: the data offset and data length in each header are evaluated in turn until the data segment containing the specified data offset Od is reached. This traversal is time-consuming, particularly for specified data offsets located near the end of the medium.
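The header-by-header traversal can be sketched in Python as follows, under assumptions not in the text: segments are written back to back with no alignment, and each hypothetical header is a 4-byte signature followed by the segment's starting data offset and data length (`write_segments` and `find_sequential` are illustrative names). The returned count of headers read grows linearly with the position of the target segment, which is the cost described above.

```python
import struct

SIGNATURE = b"SEG1"                        # hypothetical signature value
HEADER_FMT = ">4sQI"                       # signature, data offset, data length
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes

def write_segments(chunks: list[bytes]) -> bytes:
    """Record each chunk as one variable-length segment, back to back."""
    out, data_offset = bytearray(), 0
    for chunk in chunks:
        out += struct.pack(HEADER_FMT, SIGNATURE, data_offset, len(chunk))
        out += chunk
        data_offset += len(chunk)
    return bytes(out)

def find_sequential(medium: bytes, target: int) -> tuple[int, int]:
    """Walk headers from the start of the medium until one covers `target`.
    Returns (header media offset, number of headers read)."""
    pos, headers_read = 0, 0
    while pos + HEADER_SIZE <= len(medium):
        sig, start, length = struct.unpack_from(HEADER_FMT, medium, pos)
        headers_read += 1
        if sig != SIGNATURE:
            raise ValueError(f"no header at media offset {pos}")
        if start <= target < start + length:
            return pos, headers_read
        pos += HEADER_SIZE + length  # skip this segment's data
    raise KeyError(target)
```

With segments holding `b"abc"`, `b"defgh"`, and `b"ij"`, data offset 4 is found in the second segment (header at media offset 19) only after both of the first two headers have been read.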
In accordance with the present invention, the above and other problems are solved by incorporating variable-length data segments onto a sequential access storage medium without requiring sequential traversal of the headers on the sequential access storage medium.
Methods and program products for accessing session data having variable-length data segments on a sequential access storage medium are provided. Each variable-length data segment includes a header having a predetermined signature field.
When storing data to the sequential access storage medium, the data segments are aligned to predetermined alignment intervals. The data segments are recorded on the sequential access storage medium such that no session data that matches the predetermined signature field is aligned with a predetermined alignment interval. Consequently, a location at an alignment interval that begins with the predetermined signature can be taken to mark a header rather than session data.
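One way to satisfy this recording constraint is sketched below in Python. The text does not say how matching session data is kept off alignment boundaries; this sketch assumes a strategy of ending the current segment just before such data, so that the next header, rather than the session data, occupies that aligned offset and the session data is shifted by the header length. The signature value, header layout, 64-byte alignment interval, `max_len`, and the helper names `write_aligned` and `check_invariant` are all hypothetical. `check_invariant` verifies the resulting property: every aligned offset that begins with the signature is a genuine header.

```python
import struct

SIGNATURE = b"SEG1"                        # hypothetical; contains no zero bytes
HEADER_FMT = ">4sQI"                       # signature, data offset, data length
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes
ALIGN = 64                                 # hypothetical alignment interval

def write_aligned(data: bytes, max_len: int = 200) -> bytes:
    """Record `data` as variable-length segments whose headers start on
    ALIGN boundaries, cutting a segment early whenever session data would
    otherwise begin a copy of SIGNATURE at an aligned media offset."""
    assert HEADER_SIZE < ALIGN  # guarantees every segment makes progress
    out = bytearray()
    pos = 0  # data offset of the next unrecorded session byte
    while pos < len(data):
        end = min(pos + max_len, len(data))
        k = 1
        while True:
            # Session byte that would land on the k-th aligned offset
            # after this segment's (aligned) header.
            d = pos + k * ALIGN - HEADER_SIZE
            if d >= end:
                break
            if data[d:d + len(SIGNATURE)] == SIGNATURE:
                end = d  # the next header takes this aligned offset instead
                break
            k += 1
        out += struct.pack(HEADER_FMT, SIGNATURE, pos, end - pos)
        out += data[pos:end]
        out += bytes(-len(out) % ALIGN)  # zero-pad so the next header aligns
        pos = end
    return bytes(out)

def check_invariant(medium: bytes) -> bool:
    """True if every aligned offset that starts with SIGNATURE is a header."""
    headers, pos = set(), 0
    while pos < len(medium):
        headers.add(pos)
        _, _, length = struct.unpack_from(HEADER_FMT, medium, pos)
        pos += HEADER_SIZE + length
        pos += -pos % ALIGN  # skip padding to the next header
    return all(medium[a:a + len(SIGNATURE)] != SIGNATURE or a in headers
               for a in range(0, len(medium), ALIGN))
```

For instance, if session data would place the signature at media offset 64 (data offset 48 in the first segment), the first segment is cut after 48 bytes and the second header is written at offset 64. The sketch relies on the signature containing no zero bytes, so that zero padding can never complete a match.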
When retrieving session data from a specified data offset, one or more media offsets are iteratively estimated to locate the specified data offset on the sequential access storage medium, each estimate moving forward or backward on the medium from the previous estimate. The data segment located at each estimated media offset is evaluated to determine whether it contains the specified data offset. When the specified data offset is found, a reader is positioned at the corresponding data segment, and the requested data recorded in that data segment is retrieved.
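Because no session data matching the signature sits on an alignment boundary, a reader can probe any aligned offset, scan forward to the next signature, and trust that it has found a header. The Python sketch below uses plain binary search over alignment slots as the iterative estimation strategy; the text does not specify how estimates are chosen, so binary search, the header layout, the 64-byte interval, and all names (`write_aligned`, `header_at_or_after`, `locate`) are assumptions. Each probe moves the next estimate backward (the target precedes the segment found) or forward (the target follows it). The demo writer does not enforce the signature constraint itself, so it assumes the session data never contains the signature.

```python
import struct

SIGNATURE = b"SEG1"                        # hypothetical signature value
HEADER_FMT = ">4sQI"                       # signature, data offset, data length
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes
ALIGN = 64                                 # hypothetical alignment interval

def write_aligned(chunks: list[bytes]) -> bytes:
    """Demo writer: one segment per chunk, each header on an ALIGN boundary.
    Assumes the session data never contains SIGNATURE."""
    out, offset = bytearray(), 0
    for chunk in chunks:
        out += struct.pack(HEADER_FMT, SIGNATURE, offset, len(chunk)) + chunk
        out += bytes(-len(out) % ALIGN)  # pad to the next alignment boundary
        offset += len(chunk)
    return bytes(out)

def header_at_or_after(medium: bytes, media_offset: int):
    """Scan aligned slots from media_offset to the next header, or None."""
    pos = media_offset - media_offset % ALIGN
    while pos + HEADER_SIZE <= len(medium):
        if medium[pos:pos + len(SIGNATURE)] == SIGNATURE:
            _, start, length = struct.unpack_from(HEADER_FMT, medium, pos)
            return pos, start, length
        pos += ALIGN
    return None

def locate(medium: bytes, target: int) -> tuple[int, int]:
    """Find the header of the segment holding data offset `target`.
    Returns (header media offset, number of probes); raises KeyError."""
    lo, hi = 0, -(-len(medium) // ALIGN)  # candidate header slots [lo, hi)
    probes = 0
    while lo < hi:
        guess = (lo + hi) // 2  # next estimated slot
        found = header_at_or_after(medium, guess * ALIGN)
        probes += 1
        if found is None or found[0] >= hi * ALIGN:
            hi = guess  # no header in [guess, hi): move backward
            continue
        pos, start, length = found
        if target < start:
            hi = guess  # target precedes this segment: move backward
        elif target < start + length:
            return pos, probes  # this segment contains the target
        else:
            lo = pos // ALIGN + 1  # target follows this segment: move forward
    raise KeyError(target)
```

For seven segments holding 410 bytes of data, `locate(medium, 200)` returns the header at media offset 256. On a long medium the number of probes grows roughly with the logarithm of the number of alignment slots, although each probe may scan forward across one segment's data to reach the next header.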
A system for accessing session data on a sequential access storage medium is also provided. When storing session data to a sequential access storage medium, a buffer receives and stores the session data. An allocation module allocates the session data to variable-length data segments in the buffer. Each variable-length data segment includes a header having a predetermined signature field. An alignment module aligns each header with a predetermined alignment interval on the sequential access storage medium. A recording module records each variable-length data segment to the sequential access storage medium such that no session data that matches the predetermined signature field is aligned with the predetermined alignment interval.
When retrieving recorded data from a specified data offset on the sequential access storage medium including variable-length data segments, an estimation module iteratively estimates one or more media offsets associated with data segments. A reader is configured to read data from the sequential access storage medium in accordance with a provided media offset. An evaluation module evaluates each estimated media offset on the sequential access storage medium to determine whether the data segment located at that media offset includes the specified data offset. A positioning module positions the reader on the sequential access storage medium to the data segment that includes the specified data offset. An input module receives from the reader the recorded data located in the data segment at the specified data offset.