1. Field of the Invention
The present invention relates to a method for efficiently utilizing data recording media in a data processing system. More particularly, the invention relates to a method for improving the ability of a recording media to be copied and for reducing recording media spanning.
2. Description of the Related Art
Modern computers require a host processor including one or more central processing units and a memory facility. The processor manipulates data stored in the memory according to instructions provided to it. The memory must therefore be capable of storing the data required by the processor and of transferring that data to the processor at a rate that makes the overall operation of the computer feasible. The cost and performance of computer memory are thus critical to the commercial success of the computer system.
Because today's computers require large quantities of data storage capacity, computer memory is available in many forms. A fast but expensive form of memory is main memory, typically comprised of microchips. Other available forms of memory are known as peripheral storage devices and include magnetic direct access storage devices (DASD), magnetic tape storage devices, optical recording devices, and magnetic or optical mass storage libraries. Each of these other types of memory has a greater storage density and thus lower cost than main memory. However, these other memory devices do not provide the performance provided by main memory. For example, the time required to mount a tape or disk in a tape drive, DASD, or optical disk drive and the time required to properly position the tape or disk beneath the read/write mechanism of the drive cannot compare with the rapid, purely electronic data transfer rate of main memory. It is inefficient to store all of the data in a computer system on but a single type of memory device. Storing all of the data in main memory is too costly and storing all of the data on one of the peripheral storage devices reduces performance.
A typical computer system includes both main memory and one or more types of peripheral storage devices arranged in a data storage hierarchy. The data storage hierarchy arrangement is tailored to the performance and cost requirements of the user. In such a hierarchy, main memory is often referred to as primary data storage, the next level of the hierarchy is often referred to as secondary data storage, and so on. Generally, the highest level of the hierarchy has the lowest storage density capability, highest performance and highest cost. As one proceeds down through the levels of the hierarchy, storage density generally increases, performance generally decreases, and cost generally decreases. By transferring data between different levels of the hierarchy as required, the cost of memory is minimized and performance is maximized. Data is thus stored in main memory only so long as it is expected to be required by the processor. The hierarchy may take many forms, include any number of data storage or memory levels, and may be able to transfer data directly between any two distinct memory levels. The transfer of data may employ I/O channels, controllers, or cache memories, as are well known in the art.
A variety of techniques are known for improving the efficiency of use of one or more components of a data storage hierarchy. One set of such techniques is known as data "compaction" and similar names. The term compaction has been used in many ways to refer to methods of storing and transmitting data efficiently. One type of compaction improves data transmission by using the minimum number of bits to represent the most commonly coded characters. Less commonly coded characters may be represented by more than the minimum number of bits. Overall, this compaction technique allows a given amount of information to be coded using a minimum number of bits.
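The variable-length coding idea described above can be illustrated with a short sketch. It uses Huffman coding, which is one well-known way of assigning shorter codes to more frequent characters; the text does not name a specific algorithm, so the choice of Huffman coding, the function name `huffman_code`, and the sample string are all assumptions made for illustration.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix code in which frequent characters get shorter bit strings."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


sample = "aaaaaaaabbbc"  # hypothetical input: 'a' is by far the most common
code = huffman_code(sample)
encoded_bits = sum(len(code[ch]) for ch in sample)
fixed_bits = 8 * len(sample)  # cost with a fixed 8 bits per character
print(code, encoded_bits, fixed_bits)
```

In this sample the most common character receives a one-bit code while the rarer characters receive two bits each, so the whole string needs far fewer bits than a fixed-width encoding.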
Another type of compaction which is frequently used is the coding of data in such a manner as to remove non-changing bits. Sometimes referred to as run length coding, this type of compaction replaces strings of the same bit with a simple binary representation of the number of bits to be repeated. An example of such a technique is disclosed in U.S. Pat. No. 4,675,750. The patent discloses a video compression system including the removal of superfluous bits, as stored on magnetic tape.
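The replacement of repeated bits by a repeat count, a scheme commonly known as run-length encoding, can be sketched as follows. This is a minimal illustration only and is not the method of the cited patent; the function names and the sample bit string are hypothetical.

```python
from itertools import groupby


def rle_encode(bits: str) -> list[tuple[str, int]]:
    """Replace each run of identical bits with a (bit, run length) pair."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (bit, run length) pairs back into the original bit string."""
    return "".join(bit * count for bit, count in runs)


original = "0000011100"  # hypothetical sample data
runs = rle_encode(original)
print(runs)
```

The encoding saves space only when runs are long; data that alternates frequently can grow rather than shrink under this scheme.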
Another technique for data compaction is the elimination of invalid data. Because recorded data may include invalid data subsequently corrected using error correction codes, more data storage space may be required to store the data than that required if no errors existed therein. In the IBM Technical Disclosure Bulletin Vol. 24, No. 9, February, 1982, page 4483, a technique is disclosed for eliminating invalid data from data sets. The technique includes copying only the valid data of a data set when the size of that data set reaches a certain threshold, ignoring the invalid data. The amount of storage space required to store such data is thus reduced.
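The threshold-triggered copy described above might look like the following sketch. The details of the cited bulletin are not reproduced here; the representation of a data set as a list of (payload, is_valid) pairs, the measurement of size as a record count, and the name `copy_valid_if_large` are assumptions made for illustration.

```python
def copy_valid_if_large(records, threshold):
    """Return the data set to keep: unchanged if small, valid records only if large.

    records   -- list of (payload, is_valid) pairs (hypothetical layout)
    threshold -- record count at which the data set is copied without invalid data
    """
    if len(records) < threshold:
        return list(records)  # below the threshold: leave the data set alone
    return [(payload, ok) for payload, ok in records if ok]


data_set = [("r1", True), ("r1-bad", False), ("r2", True), ("r3-bad", False)]
small = copy_valid_if_large(data_set, threshold=5)  # not yet at the threshold
large = copy_valid_if_large(data_set, threshold=4)  # threshold reached
print(len(small), len(large))
```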
Yet another compaction technique saves storage space by using fragmented storage space. Fragmentation refers to the unused portions of a recording media which result from frequent accesses to the data sets thereon. During the course of use, various areas of a recording media may be erased or otherwise eliminated from use. However, each contiguous unused recording space on the recording media may be so small as to make it difficult to record an entire data set therein. Compaction techniques are known for copying data sets from one recording media to another to permit the accumulation of several unused recording areas into a single large contiguous recording space. In addition, U.S. Pat. No. 3,787,827 discloses a data recording system in which a recording media is cyclically checked to locate unused spaces therein. Such checking ensures that unused areas in the recording media are eventually used.
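The effect of copying data sets to another recording media in order to consolidate free space can be shown with a small sketch. The extent layout of (name, start, length) tuples, the capacity of 100 units, and both function names are hypothetical.

```python
def largest_free_gap(extents, capacity):
    """Return the size of the largest contiguous unused area on a medium."""
    gaps = []
    cursor = 0
    for _, start, length in sorted(extents, key=lambda e: e[1]):
        gaps.append(start - cursor)  # unused space before this data set
        cursor = start + length
    gaps.append(capacity - cursor)   # unused space after the last data set
    return max(gaps)


def copy_compacted(extents):
    """Lay the same data sets out back to back, as on a freshly written medium."""
    packed = []
    cursor = 0
    for name, _, length in sorted(extents, key=lambda e: e[1]):
        packed.append((name, cursor, length))
        cursor += length
    return packed


capacity = 100
fragmented = [("A", 0, 20), ("B", 30, 20), ("C", 60, 30)]
before = largest_free_gap(fragmented, capacity)
after = largest_free_gap(copy_compacted(fragmented), capacity)
print(before, after)
```

In this example the source media has 30 units free in total, but no single gap larger than 10 units; after the copy, all 30 units form one contiguous area.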
Yet another compaction technique is blocking. Blocking is the combination of two or more logical records into a single transferable or recordable entity. The single entity is typically referred to as a block. Blocking reduces the number of inter-record or inter-block gaps which exist between records to permit them to be distinguished from one another. Blocking sacrifices the ability to access logical records individually to achieve a greater recording density. An example of such a blocking technique is shown in U.S. Pat. No. 3,821,703.
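The space saved by blocking can be estimated with simple arithmetic, sketched below. The 80-byte record length, the 600-byte gap size, and the assumption of exactly one gap per recorded entity are hypothetical values chosen only to make the effect visible.

```python
def block_records(records, blocking_factor):
    """Group logical records into blocks of at most blocking_factor records."""
    return [records[i:i + blocking_factor]
            for i in range(0, len(records), blocking_factor)]


def space_used(blocks, gap_bytes):
    """Bytes of media consumed: all record data plus one gap per block."""
    data = sum(len(record) for block in blocks for record in block)
    return data + gap_bytes * len(blocks)


records = [b"x" * 80 for _ in range(10)]  # ten hypothetical 80-byte records
GAP = 600                                 # assumed inter-block gap, in bytes

unblocked = space_used(block_records(records, 1), GAP)
blocked = space_used(block_records(records, 5), GAP)
print(unblocked, blocked)
```

With a blocking factor of five, the ten records are separated by two gaps instead of ten, at the cost of having to read a whole block to reach any one record in it.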
The aforementioned data compaction techniques are all directed toward reducing the amount of data storage space required to record a particular amount of information. In addition, the transfer of data in compacted form may improve data transfer rates. Because the term compaction is loosely used to represent any of the aforementioned techniques, the term "compression" will hereinafter be used to refer to any technique that saves data storage space by, for example, eliminating gaps, empty fields, redundancies, or unnecessary data to shorten the length of records or blocks. The penalty for using data compression is the overhead required to convert the data from uncompressed to compressed form and vice versa. The logic required to compress and decompress data may be provided in the host processor. Unfortunately, the compression and decompression of data at the level of a host processor detracts from the ability of the host processor to perform its normal responsibilities. Thus, the logic required to compress and decompress data is sometimes provided in the control units of peripheral storage devices, thereby offloading the responsibility for data compression and decompression from the host processor to the peripheral storage device. Data processing systems having the responsibility for data compression and decompression residing outside of the host processor are shown in IBM Technical Disclosure Bulletin Vol. 22, No. 9, February 1980, pp. 4191-4193 and IBM Technical Disclosure Bulletin Vol. 26, No. 3A, August 1983, page 1281.
Two problems arise when data compression is offloaded to the control unit of a peripheral storage device. The first problem is associated with the ability of a recording media to be copied onto another recording media. For example, consider the IBM 3480 magnetic tape drive, in which the listed storage capacity of a tape cartridge is 200 megabytes. Due to the nature of the tape cartridge production process, the exact length of tape wound in a tape cartridge can only be specified to within a particular tolerance. Thus, the actual storage capacity of a tape cartridge may be slightly greater than 200 megabytes. If the ability to copy the data from one cartridge to another single cartridge is to be guaranteed, the total data recorded on a tape cartridge must be limited to the minimum capacity guaranteed for any cartridge. If data were instead recorded until the actual capacity of the cartridge was exhausted (i.e., no tape remained), it would be possible to record more than 200 megabytes on a cartridge, and it would in turn be impossible to copy the entire contents of that cartridge to another cartridge having a capacity of merely 200 megabytes. Similar problems can occur with other types of data recording media.
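The copy problem can be made concrete with a small sketch. The 200 megabyte figure comes from the example above; the actual capacities of 206 and 201 megabytes, and the function names, are hypothetical values chosen for illustration.

```python
GUARANTEED_MB = 200  # listed (minimum guaranteed) capacity from the example


def recorded_amount(actual_capacity_mb, limit_mb=None):
    """Data recorded if writing stops at limit_mb, or at end of tape if None."""
    if limit_mb is None:
        return actual_capacity_mb
    return min(actual_capacity_mb, limit_mb)


def can_copy(recorded_mb, target_actual_capacity_mb):
    """A copy succeeds only if the target cartridge can hold everything."""
    return recorded_mb <= target_actual_capacity_mb


source_actual = 206  # hypothetical: this cartridge happens to hold more tape
target_actual = 201  # hypothetical: this one is close to the minimum

filled_to_end = recorded_amount(source_actual)
filled_to_limit = recorded_amount(source_actual, GUARANTEED_MB)

print(can_copy(filled_to_end, target_actual))    # False
print(can_copy(filled_to_limit, target_actual))  # True
```

Stopping at the guaranteed minimum leaves some tape unused on the longer cartridge, but it is the only limit that holds for every possible target cartridge.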
Two techniques can be used to ensure that the amount of data recorded on a recording media does not exceed the minimum amount of data storage capacity guaranteed thereon. The first technique is to physically check how much of the recording media has been used throughout recording. Such a technique may come at the expense of heavy overhead or of imprecision. For example, in a tape drive it is known to use tachometers and the like to control tape motion and to track the length of tape on a particular tape reel. Examples of techniques for physically checking how much of a recording media has been used are disclosed in U.S. Pat. Nos. 4,125,881 and 4,811,132. Unfortunately, techniques for physically determining how much of a data recording media has been recorded are not accurate enough to be relied upon for all applications.
The other method for ensuring that no more data than the minimum capacity for a particular recording media is recorded includes monitoring the data as it is recorded. In data processing systems in which data is transferred or stored in uncompressed form, such techniques are reliable. As the data is written to the recording media, it is monitored to keep track of the total amount of data that has been recorded on each media. Because the data is not compressed, the amount of data recorded correlates to the amount of data seen by both the host processor and the storage device control unit. However, in data processing systems which compress data, it is necessary to know the amount of data recorded in compressed form. If the data is compressed within the host processor, there is no problem. Storage management software which runs in the host processor will have access to the data in compressed form and thus have the ability to monitor the amount of data stored in such compressed form. In many of today's data processing systems however, the overhead associated with compressing the data at the level of the host processor has proved too costly. As previously mentioned, the performance of the host processor has been upgraded by offloading the responsibility for compressing the data from the host processor to the peripheral storage device control units. Such offloading not only improves the performance of the host processor, but also permits data compression and decompression to be transparent to the host processor. Different compression algorithms may be used by each peripheral storage device connected to a single host processor so long as that device returns data to the host processor in uncompressed form.
In data processing systems in which compression is done in storage device control units it is impossible for the storage management software operating in the host processor to be aware of the amount of data stored on a recording media in the storage device in compressed form. Although the storage management software still "sees" the data in uncompressed form in the host processor, it is impossible for it to determine the exact amount of recording media space required to store the data when it is compressed. Merely recording until a particular amount of uncompressed data has been recorded could result in the minimum tape capacity being exceeded because the assumed amount of compression was not in fact accurate. Using counters in the storage device control unit, it is possible to monitor the amount of data that is recorded in compressed form. However, constant retrieval of such compressed data information from counters in the storage device control unit to the host processor for access by storage management software again results in costly overhead. There is thus a need for a method of accurately monitoring the amount of compressed data that is stored on a recording media with a minimum of host processor overhead.
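The gap between what the host sees and what is actually recorded can be demonstrated with a short sketch. The `ControlUnit` class is a hypothetical stand-in for a storage device control unit, `zlib` stands in for whatever compression algorithm such a unit might use, and the assumed compression ratio of 3.0 is an arbitrary illustrative value.

```python
import random
import zlib


class ControlUnit:
    """Hypothetical control unit that compresses blocks and counts the result."""

    def __init__(self):
        self.compressed_bytes = 0  # counter kept inside the control unit

    def write(self, block: bytes) -> None:
        # The host hands over uncompressed data; only the unit sees this size.
        self.compressed_bytes += len(zlib.compress(block))


def host_estimate(uncompressed_bytes: int, assumed_ratio: float) -> float:
    """Host-side guess at media usage based on an assumed compression ratio."""
    return uncompressed_bytes / assumed_ratio


rng = random.Random(0)
repetitive = b"ABCD" * 25_000                                  # compresses well
noisy = bytes(rng.getrandbits(8) for _ in range(100_000))     # barely compresses

unit = ControlUnit()
for block in (repetitive, noisy):
    unit.write(block)

host_seen = len(repetitive) + len(noisy)  # uncompressed bytes seen by the host
estimate = host_estimate(host_seen, assumed_ratio=3.0)
print(host_seen, round(estimate), unit.compressed_bytes)
```

Because half of this data is nearly incompressible, the amount actually recorded exceeds the host's estimate, which is the situation in which a media's minimum capacity could be exceeded. The counter in the control unit holds the true figure, but the host learns it only by asking.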
The other problem associated with data compression is recording media spanning. It is generally desirable to avoid spanning a data set across multiple recording media because recall of that data set will require the mounting of more than one recording media, or if all required recording media are already mounted, more than one seek of data on those recording media. It is known to simply write data to the end of a recording media and span a data set across multiple recording media if so required when the end of a recording media is reached. However, as libraries of data recording media have grown in modern times, the need to avoid recording media spanning has become more important. Again, as it has become practice to compress data at the level of a storage device control unit it has become more difficult to predict the likelihood that a data set will be required to span across multiple recording media prior to its recording and with a minimum amount of host processor overhead.
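The trade-off between writing to the end of each recording media and avoiding spanning can be sketched as follows. The sketch assumes that the recorded size of every data set is known in advance and does not exceed the capacity of one volume, which is exactly the knowledge that becomes hard to obtain when compression is performed in the control unit. The function names and the sizes used are hypothetical.

```python
def spanned_when_filling(data_sets, capacity):
    """Write each volume to its end; return the names of data sets that span."""
    spanned, used = [], 0
    for name, size in data_sets:
        if used == capacity:          # volume exactly full: start a fresh one
            used = 0
        remaining = capacity - used
        if size > remaining:
            spanned.append(name)      # data set continues on the next volume
            used = size - remaining
        else:
            used += size
    return spanned


def volumes_when_avoiding(data_sets, capacity):
    """Start a new volume whenever the next data set would not fit whole."""
    volumes, used = 1, 0
    for _, size in data_sets:
        if size > capacity - used:
            volumes += 1
            used = 0
        used += size
    return volumes


data_sets = [("A", 120), ("B", 120), ("C", 120)]  # sizes in megabytes
print(spanned_when_filling(data_sets, 200))
print(volumes_when_avoiding(data_sets, 200))
```

In this example, filling each volume causes data set B to span two volumes, whereas avoiding spanning uses three volumes and leaves part of each one unused.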