Contained herein is material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction of the patent disclosure by any person as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all rights to the copyright whatsoever.
1. Field of the Invention
The invention relates generally to hashing and searching techniques. More particularly, the invention relates to a method and apparatus for locating data utilizing a stored hash values derived from the data.
2. Description of the Related Art
FIG. 1 illustrates a general directory hierarchy that may exist in a prior art computer file system, such as a file system utilized under any of the Windows(copyright) platforms, and the progression through the directory hierarchy traversed in performing a search for a file or directory. A user may enter a file path character string such as indicated by file path 105 when using the find function provided with the operating system, or the user may click on a shortcut resident on the computer desktop which will invoke the operating system""s search function to find the file according to a preset file path 105. FIG. 2 is a flow diagram illustrating prior art search processing that may be followed.
After a file path search string 105 is received in block 205, the search processing begins in the Root Directory page 110. In block 210, the first entry in the Root Directory page, xe2x80x9cProgram_Filesxe2x80x9d 111 is read. The character string, xe2x80x9cMy_Documents,xe2x80x9d from the file path 105 is compared to the first Root Directory page entry in block 220. Since the two character strings do not match, the processing proceeds, as shown in decision block 230, back to block 210 where the next character string 112 in the Root Directory page 110 is read and the comparison processing of block 220 is repeated for the second entry in the Root Directory page 110. Since xe2x80x9cMy_Documentsxe2x80x9d 112 in the Root Directory page 110 matches the Root Directory file path character string, the processing for this directory page is complete as indicated by block 240. The search processing proceeds to the My Documents Directory page 120 where FIG. 2 processing is repeated by comparing data elements 121, 122 and 123 with the search character string xe2x80x9cWord_Filesxe2x80x9d until a match is found between the file path directory name, xe2x80x9cWord_Filesxe2x80x9d and the third entry 123. Next, the same processing is repeated for both the Word Files Directory page 130 and the Reports Directory page 140 until the xe2x80x9cIntel_Persistant_Storage.docxe2x80x9d file name 141 is located. Presumably, the filename contains a pointer to the actual file permitting it to be retrieved.
One draw back of searching using character strings is the time and processing capacity necessary to perform the multiple comparisons. For instance, using a 32 bit processor only 4 characters of a character string may be transferred over a processor bus every processor cycle. Accordingly, even if it assumed that the comparison between the character strings require no extra cycles for comparison processing, a total of 7 cycles would be required just to access the character string, xe2x80x9cIntel_Persistent_Storage.docxe2x80x9d 141 from the Reports directory page 140. This problem is compounded as the numbers of comparisons performed to find a match are increased. For example, the file path search discussed above and illustrated in, FIG. 1 would require 41 cycles using a 32 bit processor just to access all necessary character strings. The problem is compounded in that character strings may be very long. For instance, file names of up to 512 characters are supported in the Windows(copyright) environment.
Hashing is a common and well known method of storing data that facilitates quick access of the data. Hashing generally involves the transformation of data, such as a character string that has meaning to a user, into a value or number that identifies the location of the corresponding data in a structure called a hash table. For example, a hash value of 116 might signify the 116th row in the related hash table. In order to find the data associated with hash value 116, it is not necessary to search through a list of hash values until the hash value 116 is located, rather the 116th row in the table need only be accessed to find the desired data element or a pointer thereto. In certain instances, two data elements may have the same hash value requiring a mechanism whereby a compare is performed to distinguish between the data elements to determine which data element is desired.
The method of using the hash value to look up data in a hash table requires the creation of a hash table with as many rows as there are possible hash values and the reservation of physical memory space related to the operation and management thereof. Therefore, this method of accessing data may not be practical in situations where the number of distinct data elements and the hash values associated therewith are extremely large or practically unbounded requiring the reservation of a significant amount of memory space.