Data conversion tables frequently take the form of large tables of data having ordered index or "key" entries, each associated with a respective data entry or entries. Each key entry is for storing a key value, and each data entry is for storing a data value. Such a table is illustrated in FIG. 1, wherein the table is comprised of 1 to n key entries, each having an associated data entry. Thus, key entry 5 has a key value of A458 and an associated data entry has the data value of 1005 (key and data values being given in hexadecimal format unless otherwise noted).
A search value is used to return a desired data value from the table. The search value is compared to the key values in a specified order, and the desired data value is retrieved when a key value is located which correlates to the search value.
Tables such as illustrated in FIG. 1 are typically provided from a single physical memory element to simplify data retrieval and table maintenance. In this case, the key entries and the data entries are said to be "unified." Operations affecting the table such as insertions and deletions are referred to "atomic operations" since both a key value and a data value or values are affected at the same time. Such tables have found wide use, such as for data unit address conversion in a variety of settings and data management in database applications.
Sequential processing of a unified data structure is possible, where a search begins at an upper or lower bound of the table and progresses until the desired key value is located or until the opposite bound is reached. The associated data value(s) is then retrieved if a match occurs. This approach is commonly referred to as a linear search.
Linear searching is impractical for large tables (i.e. tables having a relatively large number of entries) due to the length of time necessary to compare or "probe" a search value with the key value at each location. It is possible to suspend linear searching of an ordered table should the key values being compared go beyond the search value, indicating that the searched for value is not in the list. However, given an even chance of successfully finding any one of the key values, on average, half of the key values must be analyzed in performing a linear search.
To improve upon the performance of a linear search, ordered lists are commonly probed using a technique referred to as a binary search, a dichotomizing search in which the table to be searched is divided at each iteration into two equal, or nearly equal, parts. For instance, in an ordered list of twelve entries, a search for a particular search value begins at the sixth key entry. If the key value associated with the sixth key entry of the table is larger than the search value, the lower half of the table contains the searched for key value, if the key value is in the table at all. Otherwise, the key value is in the upper half of the table, if at all. Obviously, the search terminates if the value associated with the sixth key entry equals the search value. The divide by two and compare approach is repeated until the search value is matched to a key value, or it is established that there is no key value equivalent to the search value and the search is aborted.
Unified data structures such as in FIG. 1 are rarely static. Insertions in an ordered table can be performed by overwriting an existing set of key and data values with new values, or by executing a "shuffle up" operation. Deletions are performed by overwriting an existing set of key and data values with a new, dummy set of key and data values, or by executing a "shuffle down" operation. "Up" and "down" are relative directions based upon the ordering of the table.
The problem with inserting by overwriting lies in that it may be desired to keep the overwritten set of key and data values in the table. Thus, the overwritten pair must be rewritten in the next, higher set of entries. The process is repeated until a key/data value set is identified which does not need to be maintained in the table.
Alternatively, the shuffle up approach illustrated in FIG. 3 is employed for key and data value insertion. Here, it is desired to insert key value A457 and associated data value B321 into the first illustrated data structure. Key value FFFF may be regarded as a key value which is undesired, having any associated data value xxxx. In the second illustration, unified key and data values BCA2/0105 are copied into the next higher entries, formerly retaining "FFFF/xxxx". In the third illustration, unified key and data values A458/1005 are copied into the entries formerly occupied by BCA2/0105. Finally, the new values A457/B321 are written into the entries formerly for A458/1005. The proper insertion point is assumed to have been identified by a binary search such as described previously.
A shuffle down operation for the purpose of removing a key/data value pair from the unified data structure is illustrated in FIG. 4. Here, it is desired to remove key value A457 and associated data value B321. In essentially the reverse of the shuffle up process described with regard to FIG. 3, key/data values A458/1005 are overwritten into the entries formerly occupied by the values to be removed. BCA2/0105 take the place of A458/1005, and a new key/data value pair, FFFF/xxxx, is inserted in the place of BCA2/0105. FFFF/xxxx may be dummy values, or new values which were being held for an opportunity to be inserted into the table. The shuffle up and shuffle down operations maintain the integrity of the data in the table so that other searches can be performed during insert and delete operations.
The number of entries in the key table can be indicated by a "limit" value or by the key values stored in the table. If a limit value is not used, unused key entries at the "top" of the table should be filled with a value which is greater than the highest currently used key value to enable the binary search. The unused entries can all have the same, greater value, or can have increasing values. Either way, the binary search process will search for desired key values in the correct manner.
A limitation to the typical binary search approach is that it only works for a table such as illustrated in FIG. 1 comprised of contiguous, ordered key values. If a key value is not ordered with respect to its neighbors, a binary search may be skewed, leading to the failure of the search.
A further drawback associated with unified data tables such as in FIGS. 1, 3 and 4 lies in the amount of memory required if plural data entries are associated with each key entry, or if the data entry or entries are relatively long, or both. Larger unified memory implies higher cost. Further, while memory densities continue to increase with time, a practical limitation will always exist on the size of the data entry or entries for a given key entry size, particularly if ASIC based memory arrays are used.
One proposed resolution to the limitations associated with a unified data table is the provision of a distributed data structure, whereby key values are resident in key entries in a discrete memory element, and respective data entries are located in a separate, discrete memory element. Key and data entries are associated through the use of a pointer field stored in conjunction with each key entry.
Shuffle up and down processes are performed on such distributed data structures as well, though atomic updates are still required in order for the data structure to be searched contemporaneously with table updates. Otherwise, it would be possible for a data entry to temporarily have a value not properly associated with a respective key value, for the search process to identify that key value as matching a search value, and for an incorrect data value to be retrieved.
The requirement for atomic updates for a distributed data structure utilizing pointers to associate respective key and data entries introduces problems in synchronizing the key and data value updates. This typically results in the lowering of update speeds for such structures. Lower update speeds necessarily limit the speed at which search values can be processed using the distributed data structure since the search function must wait for the update function to avoid retrieving incorrect data values. In the context of data communications, such speed limitations result in the need to provide input buffering due to the delays at the distributed data structure. Such buffering means increased cost, space requirements, and design complexity, and introduces the possibility of data loss due to buffer overflow. Additional overhead is consumed in maintaining the key entry-associated pointers.
It is thus desired to have a distributed data structure to avoid the data entry size limitations associated with a unified data structure, and to have the distributed data structure capable of being searched and updated simultaneously without the need for buffering incoming data due to delays associated with the update processes.