Some current word processing systems use a very simple format for storing and locating documents on their diskettes. There are always exactly 32 data sets ("jobs") available on a diskette. Allocation is by diskette track. The entire diskette index used for locating the data sets on the diskette is small enough to be kept in system memory at one time. The diskette index is recorded redundantly on the diskette and is copied to system memory when the diskette is inserted into the drive. Retrieving and searching the index thus does not pose a performance problem.
One of the main problems with this approach is the lack of flexibility with respect to the number of jobs or documents available on a single storage volume (diskette). In general, a word processing system needs to store a variable number of data sets on a diskette volume, and more than 32 data sets should be available to the system operator. This is especially true on word processing systems in which one of the storage volumes is a high-capacity internal hard disk.
Another problem with this approach is that space on the diskette is allocated to a particular data set on a track basis. Thus, the average wasted space on the diskette (allocated but not actually used to store data) is one-half track for each job that is actually in use. With all 32 jobs in use, this amounts to as much as 16 tracks out of the 70 tracks available on a diskette.
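The wasted-space figure can be checked with a short calculation. The following is a minimal sketch, not part of any described system: it assumes that the last track of each job is, on average, half full, and it takes the 32-job and 70-track figures from the text. The function names are hypothetical.

```python
# Figures taken from the text above.
MAX_JOBS = 32
TRACKS_PER_DISKETTE = 70


def expected_wasted_tracks(jobs_in_use: int) -> float:
    """Expected slack, in tracks, under track-granular allocation.

    Assumption: the final track of each job in use is half full on average,
    so each job wastes one-half track.
    """
    if not 0 <= jobs_in_use <= MAX_JOBS:
        raise ValueError("jobs_in_use must be between 0 and 32")
    return 0.5 * jobs_in_use


def wasted_fraction(jobs_in_use: int) -> float:
    """Expected slack as a fraction of the whole diskette."""
    return expected_wasted_tracks(jobs_in_use) / TRACKS_PER_DISKETTE


if __name__ == "__main__":
    print(expected_wasted_tracks(MAX_JOBS))     # 16.0 tracks
    print(round(wasted_fraction(MAX_JOBS), 3))  # 0.229 of the diskette
```

Under these assumptions, a full diskette loses 16 of its 70 tracks, or roughly 23 percent of its capacity, to allocation slack.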
Other current word processing systems employ a two-level diskette index, consisting of the diskette index, which shows the location of data sets on the diskette, and one data set index for each existing data set. The diskette index is of fixed size; the size of the data set index depends on the size of the data set. Locating a page of a document requires searching the data set index from the beginning up to the entry that shows where the page is located on the diskette.
The problem with this approach is that it is very vulnerable to media errors in the diskette index or data set index areas on the diskette. If a media sector containing the diskette index cannot be read successfully, all data sets accessed via that diskette index block, and all data sets accessed by subsequent diskette index blocks, are lost and no longer accessible to the operator. With respect to the data set index, a similar problem exists: if a data set index sector cannot be read from the diskette, that data set index block and all subsequent ones are lost, which means that all records accessed from those lost sectors are likewise lost to the operator.
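The cascading loss described above can be illustrated with a small model. This is a sketch under a stated assumption: the text does not say how index blocks are linked, so the model assumes each index block stores the sector number of the next one, which is one arrangement that would produce the described behavior. The sector numbers, data set names, and function name are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical layout: sector number -> (data set names listed in this
# index block, sector of the next index block or None at the end).
IndexBlock = Tuple[List[str], Optional[int]]

EXAMPLE_INDEX: Dict[int, IndexBlock] = {
    10: (["A", "B"], 11),
    11: (["C"], 12),
    12: (["D"], None),
}


def reachable_data_sets(
    blocks: Dict[int, IndexBlock], first_sector: int, bad_sectors: Set[int]
) -> List[str]:
    """Walk the chain of index blocks, stopping at the first unreadable one."""
    found: List[str] = []
    sector: Optional[int] = first_sector
    while sector is not None:
        if sector in bad_sectors:
            # The read fails, so the link to every later block is lost too.
            break
        names, sector = blocks[sector]
        found.extend(names)
    return found


if __name__ == "__main__":
    print(reachable_data_sets(EXAMPLE_INDEX, 10, set()))  # ['A', 'B', 'C', 'D']
    print(reachable_data_sets(EXAMPLE_INDEX, 10, {11}))   # ['A', 'B']
```

In this model a single bad sector in the middle of the chain hides not only the data sets listed in that block but also every data set listed after it, even though the later sectors themselves are intact.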
Another problem with both of these approaches is that error-free operation on certain physical sectors on the diskettes is essential for the use of the diskettes. This applies to cylinder 0 (the diskette track or tracks accessed with the read/record head in the home position), where certain information structured according to standard architectures is required to interpret the contents of the rest of the volume. In other words, if an error is detected on track 0 of a diskette, the diskette normally may not be used further.
Another approach in current use has the data set index distributed with the data in the working (non-permanent) storage. Each data block has a control area containing, among other information, the location of the prior and the next block. This means that sequential access is very fast, since the current record always defines the location of the next (and prior) record. The problem with this approach is that random access (going directly to records in the middle of the data set) is slow, since all prior records must be read. Improving random-access performance would require an additional data set index organized for random access.
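The access-cost trade-off of this distributed index can be sketched as follows. The model is illustrative only: the `Block` fields, the block addresses, and the helper names are hypothetical, and each dictionary lookup stands in for one block read from storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Block:
    data: str
    prev: Optional[int]  # address of the prior block, None at the start
    next: Optional[int]  # address of the next block, None at the end


def build_chain(records: List[str]) -> Tuple[Dict[int, Block], int]:
    """Store a non-empty list of records as linked blocks; return (storage, first address)."""
    addrs = [100 + 3 * i for i in range(len(records))]  # arbitrary addresses
    storage: Dict[int, Block] = {}
    for i, rec in enumerate(records):
        prev_addr = addrs[i - 1] if i > 0 else None
        next_addr = addrs[i + 1] if i < len(records) - 1 else None
        storage[addrs[i]] = Block(rec, prev_addr, next_addr)
    return storage, addrs[0]


def read_nth(storage: Dict[int, Block], first: int, n: int) -> Tuple[str, int]:
    """Return (data, block reads used) for the record at 0-based position n."""
    if n < 0:
        raise IndexError("n must be non-negative")
    addr = first
    reads = 0
    for _ in range(n):
        block = storage[addr]  # every prior block must be read for its link
        reads += 1
        if block.next is None:
            raise IndexError("record position past end of data set")
        addr = block.next
    reads += 1
    return storage[addr].data, reads


if __name__ == "__main__":
    storage, first = build_chain([f"rec{i}" for i in range(50)])
    print(read_nth(storage, first, 0))   # ('rec0', 1)
    print(read_nth(storage, first, 49))  # ('rec49', 50)
```

Stepping from one record to its neighbor costs a single read, but reaching the record at position n from the start costs n + 1 reads, which is the random-access penalty the text describes.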