In computerized data processing, it is common practice to store like data items as multiple records within a named data file. A portion of each record, referred to as the key, is used to reference a specific record. The keys are assumed to be unique throughout the file. Fundamental to the processing of the data file is the search for the data record associated with a specific key. A number of techniques have been developed to perform this function. One class of these techniques is referred to as hashing access methods.
A hashing access method is commonly used when the number of actual keys is a small percentage of the total number of possible keys. This generally occurs when the key data is represented as ASCII character codes. An example is a 6-digit part number ranging from 000000 to 999999, which requires a 6-byte field (48 bits) in which each byte takes only ten valid values out of a possible 256. Another example is the use of a person's name as the key. In this case a fixed-length field (say 20 bytes) is allocated for key data. Since not all names contain 20 characters, and certain combinations of letters do not realistically represent a name, a high percentage of the possible bit configurations will never be used as valid keys.
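The sparseness in the part-number example can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the variable names are assumptions introduced here and do not appear in the original text.

```python
# Fraction of the 48-bit key space occupied by valid 6-digit part numbers.
FIELD_BYTES = 6                          # one byte per ASCII digit

valid_keys = 10 ** FIELD_BYTES           # 000000 through 999999
possible_patterns = 256 ** FIELD_BYTES   # every bit pattern of a 48-bit field

density = valid_keys / possible_patterns

print(f"valid keys:        {valid_keys}")
print(f"possible patterns: {possible_patterns}")
print(f"fraction in use:   {density:.3e}")
```

Under these assumptions, only a few billionths of the possible bit patterns ever occur as keys, which is why a table indexed directly by the raw key field would be impractical.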
A distinguishing property of hashing methods is that they do not map keys uniquely to record storage locations. Instead, they allow more than one key to map to a specific table entry, which contains the location of one or more records. The object of an effective hashing method is to distribute the keys uniformly across the table entries (starting pointers), thus minimizing the search time for any randomly selected key.
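A minimal sketch of this many-to-one mapping follows, assuming a small table whose entries each hold a chain of records. The table size, the function names, and in particular the placeholder hash (sum of the ASCII codes modulo the table size) are assumptions chosen for illustration; they are not the method of the present invention.

```python
TABLE_SIZE = 8  # hypothetical number of table entries


def bucket_index(key: str) -> int:
    """Placeholder hash: sum of the key's ASCII codes modulo the table size."""
    return sum(key.encode("ascii")) % TABLE_SIZE


def build_table(records):
    """Place each (key, record) pair in the chain of its table entry."""
    table = [[] for _ in range(TABLE_SIZE)]
    for key, record in records:
        table[bucket_index(key)].append((key, record))
    return table


def find(table, key):
    """Search only the chain that the key maps to; return None if absent."""
    for stored_key, record in table[bucket_index(key)]:
        if stored_key == key:
            return record
    return None


def chain_lengths(table):
    """Number of keys mapped to each table entry; uniform is the goal."""
    return [len(chain) for chain in table]


records = [("000123", "bolt"), ("000132", "nut"), ("999999", "gear")]
table = build_table(records)
print(find(table, "000132"))
print(chain_lengths(table))
```

With this placeholder hash, the keys 000123 and 000132 contain the same digits and therefore land in the same table entry, so a search for either must walk a chain of two. Longer chains mean longer searches, which is why a uniform distribution of keys over the entries is the goal.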
Research on hashing algorithms has produced a variety of methods, each one tailored to a specific set of properties possessed by the keys, for example alpha keys, alpha-numeric keys, numeric keys, closeness of adjacent keys, or the number of repeated characters in the keys. For software-implemented hashing techniques, it may be acceptable to support several methods and allow the user to choose the most efficient one based on an analysis of the key set to be used. The present invention has as an object the elimination of the need to support a variety of hashing methods by randomizing the data within the key such that all original properties of closeness, adjacency, and orderliness are removed.
The following patents are representative of the state of the art as known by applicants: U.S. Pat. Nos. 3,651,483, 3,742,460, 4,042,913, 4,064,489, 4,068,300, 4,086,628, 4,099,242.
The following three publications represent some teachings related to hashing techniques: The Art of Computer Programming, Vol. 3, pages 506-549, by D. E. Knuth, published by Addison-Wesley Publishing Company; Assembler Language for FORTRAN, COBOL, and PL/I Programmers, IBM 370/360, pages 69-70, by S. S. Kuo, published by Addison-Wesley Publishing Company; and Hash Table Methods, by W. D. Maurer and T. G. Lewis, published by the Association for Computing Machinery.