1. Field of the Invention
The present invention relates to hierarchical storage systems and, more particularly, to addressing techniques used within such systems.
2. Discussion of the Prior Art
Addressing of large-capacity memories has used so-called "hashing" techniques for years, particularly in the main memory area. Generally, a hashing operation includes generating an index indicator for a so-called hash class. The index indicator directs the addressing mechanism to a so-called scatter index table (SIT), which contains the address of a memory-address directory entry supposedly relating to the area of memory to be accessed. The directory entry is linked to other directory entries of the same hash class by a singly linked list. Accordingly, to access a given item within a memory, the index indicator is generated, the SIT is read to obtain the address of a directory entry, and that directory entry is accessed. If no match between the desired memory address and the memory address stored in the directory entry is found, then the succeeding directory entries of the hash class are examined in turn to see whether the directory has an address indicating that the memory contains the data or has space allocated for receiving data. In the event such an area is identified in the directory, a so-called "hit" is made and access to the memory can proceed. If the area is not identified through the hashing technique, then a "miss" occurs. Following a miss in hierarchical systems, either data is transferred from a backing store to the memory or space is allocated within the memory to receive data for recording.
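The lookup sequence just described can be sketched as follows. This is a minimal illustration only, not the structure of any particular prior-art system: the names `DirectoryEntry`, `HashedDirectory`, `lookup` and `allocate` are hypothetical, and the modulo hash is a placeholder assumption standing in for whatever index-generating function a given system uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DirectoryEntry:
    address: int                         # memory address this entry describes
    cache_slot: int                      # where the data (or allocated space) lives
    next_in_class: Optional[int] = None  # singly linked list within one hash class


class HashedDirectory:
    def __init__(self, num_classes: int) -> None:
        # Scatter index table (SIT): one head pointer per hash class.
        self.sit: List[Optional[int]] = [None] * num_classes
        self.entries: List[DirectoryEntry] = []

    def hash_class(self, address: int) -> int:
        # Placeholder hash (an assumption for illustration only).
        return address % len(self.sit)

    def lookup(self, address: int) -> Tuple[Optional[int], int]:
        """Return (cache_slot, entries_examined); cache_slot is None on a miss."""
        idx = self.sit[self.hash_class(address)]
        examined = 0
        while idx is not None:
            examined += 1
            entry = self.entries[idx]
            if entry.address == address:
                return entry.cache_slot, examined   # hit
            idx = entry.next_in_class               # follow the hash-class chain
        return None, examined                       # miss

    def allocate(self, address: int, cache_slot: int) -> None:
        """After a miss, link a new directory entry at the head of its hash class."""
        cls = self.hash_class(address)
        self.entries.append(DirectoryEntry(address, cache_slot, self.sit[cls]))
        self.sit[cls] = len(self.entries) - 1
```

The second value returned by `lookup` counts the directory entries examined, which makes the cost of a long chain within one hash class visible.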
When the size of a hash class is large, many items are mapped into that class. This plural mapping is often referred to as "collisions" in that multiple data items collide into the same hash class. Searching a hash class having a large number of collisions can greatly increase the access time to a memory, particularly when the directory is not content addressable. Accordingly, in many memory applications it is desired to keep the size of the hash class to a minimum to reduce the searching time of the directory. In contrast, when a content addressable memory is used for the directory, all entries are searched in one cycle. Unfortunately, content addressable memories are expensive; therefore, in many applications such a memory is not feasible.
The problem becomes particularly acute in relatively large memories. For example, when a large cache is to act as a buffer for disk storage apparatus (DASD) and the cache has a capacity of 8 megabytes or greater, there is a conflict between reducing the number of collisions and controlling costs of the storage system. A further problem occurs in that disk storage apparatus exhibits several access delay boundaries. A first delay boundary, called latency, is based upon the rotational characteristics of the disk storage apparatus. One or two transducers are positioned with respect to a rotating disk surface such that access to a given point on the surface depends upon the latency of rotation. Further, in most disk storage apparatus, a single transducer is provided for a single recording surface. This means the transducer must be moved radially from track to track (in a multi-surface disk storage apparatus the move is from cylinder to cylinder, a cylinder being all tracks on the same radius); this movement, called a cylinder seek, is a second delay boundary. Both of these delays in addressing and accessing are due to the mechanical characteristics of the disk storage apparatus. Accordingly, misses in a cache that does not accommodate such mechanical delays can greatly increase access times to data areas. It is therefore desired to provide access to a cache which minimizes the effect of such mechanical delays in the backing store on total system operation.
Many prior hashing techniques employ random distribution of the addresses such that the number of collisions tends to be reduced. A corollary is that the addresses should be evenly distributed across the address space of the memory being accessed. Such principles are set forth in several articles published in the IBM Technical Disclosure Bulletin. For example, in the May, 1977 issue, J. L. Carter, et al., in "Class of Fast Hash Functions Using Exclusive OR" on pages 4822-4823 and in "Method of Extending Hash Functions for Long Keys" on page 4826, teach that a pair-wise random hashing function produces an average running time which is linear in the number of transactions. While this is true for random access memories, such as those employed for main memories, it is not necessarily true where access delay boundaries exist. Accordingly, the so-called "Constant of Proportionality" discussed in these articles does not validly apply to all situations, particularly where access delay boundaries exist.
Prime numbers have also been used in hashing techniques. For example, see the article by R. P. Brent "Modified Linear Scatter Storage Technique" found on page 3489 of the April, 1972 edition of the IBM Technical Disclosure Bulletin. Again, this article relates to a hashing technique suitable for random access memories not having significant access delay boundaries.
Another aspect of hashing is reducing the hash time, i.e., the time it takes to generate an address. Such reduction has been achieved by judiciously selecting names for data which are convertible to an address. For example, in the IBM Technical Disclosure Bulletin, June, 1975 issue, on pages 38-39, L. J. Waguespack in "Predistributed Logical Name Generation" shows a hashing technique wherein a single-level Exclusive-OR hash is driven by predistributed logical names for accessing random access memories. A similar technique is shown in the article by D. C. Bossen, et al., "Generating Unique Names for Virtual Segments", published in the IBM Technical Disclosure Bulletin, August, 1975, pages 880-881. This article is similar to Waguespack's article in that address predistributions and Exclusive-OR functions result in hash table addressing.
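For orientation, one common general form of an Exclusive-OR hash is folding: the key is cut into fixed-width fields which are combined by modulo-two addition. The sketch below shows only that generic form; it is not the specific scheme of any of the cited articles, and the name `xor_fold_hash` and its parameters are hypothetical.

```python
def xor_fold_hash(key: int, index_bits: int) -> int:
    """Fold a non-negative integer key into index_bits bits using Exclusive-OR."""
    if key < 0 or index_bits <= 0:
        raise ValueError("key must be non-negative and index_bits positive")
    mask = (1 << index_bits) - 1
    index = 0
    while key:
        index ^= key & mask   # combine the low-order field by modulo-two addition
        key >>= index_bits    # move on to the next field of the key
    return index
```

Because no carries or borrows propagate between bit positions, such a function uses only shifts, masks and Exclusive-OR operations, which is consistent with the goal of a short hash time.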
In an installed data processing system, memories can be changed in size. Accordingly, the hashing technique should be easily altered. This situation was addressed in one of the articles mentioned above and is also set forth in U.S. Pat. No. 4,215,402, where the SIT and hash size are matched to main memory size. Again, the hashing was for a pure random access memory not exhibiting significant access delay boundaries.
A summary of desirable hashing techniques is set forth in the IBM Technical Disclosure Bulletin article by R. F. Arnold, et al., "Uniform Hashing Algorithm", pages 2214-2216 of the December, 1973 issue. This article relates to mapping a virtual address space into a real address space. Desirable properties of the hashing algorithm used to map the address spaces are uniformity of distribution, random distribution of the sequential virtual addresses, and, below the granularity of hashing, sequential virtual addresses that map to real addresses. All addresses should match one for one from virtual to real, minimum remapping should be required in the hash for memory changes, and computation should be rapid (short delay) and repeatable. A portion of the hashing algorithm described in this article requires iterative processing when a hit does not occur immediately. The algorithm employs arithmetic techniques including carries and borrows, as opposed to modulo-two addition (such as Exclusive-OR functions). While this article shows hashing procedures desirable for a random access memory not having significant access delay boundaries, it is not seen how the teaching can be applied to backing stores exhibiting various access delay boundaries.
In addition to all of the above, a hierarchical storage system can have a plurality of disk storage apparatus. A single cache should provide the caching function for all of the apparatus. Therefore, in addition to the internal access delay boundaries of such apparatus, hashing should accommodate unique characteristics of a plurality of such disk storage apparatus. For example, so-called cylinder "0" of each disk storage apparatus is usually used as an index to the contents of the data stored in the apparatus. Cylinder "0" is usually the radially outward-most cylinder of tracks. Accordingly, it is to be expected that cylinder "0" may be accessed more frequently than other cylinders of the disk storage apparatus; therefore, there should be no collisions between cylinder "0" of one disk storage apparatus and cylinder "0" of another disk storage apparatus. Any random distribution, even though it be uniform, implies a possible collision of any relative address with another relative address. Accordingly, random distribution of hashing should be avoided when disk storage apparatus of usual design are employed.
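The statement that a uniform random distribution still implies possible collisions can be quantified with a short calculation. The sketch below assumes, purely for illustration, that cylinder "0" of each disk storage apparatus is hashed independently and uniformly into one of a fixed number of hash classes; the function name `collision_probability` and both parameters are hypothetical.

```python
from fractions import Fraction


def collision_probability(num_devices: int, num_classes: int) -> Fraction:
    """Probability that at least two of num_devices independently and uniformly
    hashed items (e.g. cylinder "0" of each device) fall in the same hash class."""
    if num_devices > num_classes:
        return Fraction(1)  # more items than classes: a collision is certain
    p_all_distinct = Fraction(1)
    for i in range(num_devices):
        # The i-th item must avoid the i classes already occupied.
        p_all_distinct *= Fraction(num_classes - i, num_classes)
    return 1 - p_all_distinct
```

Under these assumptions, eight devices hashed into 1024 classes would give a probability of roughly 2.7 percent that two cylinder "0" addresses share a hash class. The value is small but never zero for two or more devices, which is the point made above against purely random distribution.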