The present invention generally relates to implementation of an associative memory, particularly to implementation of an associative memory based on a digital trie structure. The solution in accordance with the invention is intended for use primarily in connection with central memory databases, and it can be used in conjunction with all memories based on a digital trie structure.
The prior art unidimensional directory structure termed digital trie (the word "trie" is derived from the English word "retrieval") is the underlying basis of the principle of the present invention. Digital tries can be implemented in two types: bucket tries, and tries having no buckets.
A digital bucket trie structure is a tree-shaped structure composed of two types of nodes: buckets and trie nodes. A bucket is a data structure containing a number of data units or a number of pointers to data units or a number of search key/pointer pairs (the number may include only one data unit, one pointer or one key/pointer pair). A trie node, on the other hand, is an array guiding the retrieval, having a size of two to the power of k (2^k) elements. If an element in a trie node is in use, it refers either to a trie node at the next level in the directory tree or to a bucket. In other cases, the element is free (empty).
Search in the database proceeds by examining the search key (which in the case of a subscriber database in a mobile telephone network or a telephone exchange, for instance, is typically the binary numeral corresponding to the telephone number of the subscriber) k bits at a time. The bits to be searched are selected in such a way that at the root level of the structure (in the first trie node), the k leftmost bits are searched; at the second level of the structure, the k bits next to the leftmost bits are searched, etc. The bits to be searched are interpreted as an unsigned binary integer that is employed directly to index the element array contained in the trie node, the index indicating a given element in the array. If the element indicated by the index is free, the search will terminate as unsuccessful. If the element refers to a trie node at the next level, the next k bits extracted from the search key are searched at that level in the manner described above. Depending on the element thus indexed, the routine branches off in the trie node either to a trie node at the next level or to a bucket. If the element refers to a bucket containing a key, the key stored therein is compared with the search key. The entire search key is thus compared only after the search has encountered a bucket. Where the keys are equal, the search is successful, and the desired data unit is obtained at the storage address indicated by the pointer of the bucket. Where the keys differ, the search terminates as unsuccessful.
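The search procedure described above can be illustrated with the following minimal sketch. It is not taken from the source: the class names `Bucket` and `TrieNode`, the function `search`, and the simplifying assumptions that each bucket holds exactly one key/pointer pair and that the key length is a multiple of k are all hypothetical choices made for illustration.

```python
class Bucket:
    """Hypothetical bucket holding one search key/pointer pair."""
    def __init__(self, key, pointer):
        self.key = key          # full search key stored in the bucket
        self.pointer = pointer  # stands in for the storage address of the data unit


class TrieNode:
    """Hypothetical trie node: an array of 2**k elements, all initially free."""
    def __init__(self, k):
        self.elements = [None] * (1 << k)


def search(root, key, key_bits, k):
    """Examine the key k bits at a time, starting from the leftmost bits."""
    node = root
    shift = key_bits - k
    while shift >= 0:
        # Interpret the current k bits as an unsigned integer index.
        index = (key >> shift) & ((1 << k) - 1)
        element = node.elements[index]
        if element is None:
            return None                      # free element: unsuccessful
        if isinstance(element, Bucket):
            # The entire key is compared only once a bucket is reached.
            return element.pointer if element.key == key else None
        node = element                       # descend to the next level
        shift -= k
    return None


# A small structure with 4-bit keys and k = 2 (assumed example data).
root = TrieNode(2)
child = TrieNode(2)
root.elements[0b01] = child
child.elements[0b11] = Bucket(0b0111, "record H")
root.elements[0b00] = Bucket(0b0010, "record C")

found = search(root, 0b0111, key_bits=4, k=2)
```

In this sketch a lookup for key 0001 reaches the bucket storing 0010 and fails on the full-key comparison, which corresponds to the case where the keys differ.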
A bucketless trie structure has no buckets, but reference to a data unit is effected from a trie node at the lowest level of a tree-shaped hierarchy, called a leaf node. Unlike buckets, the leaf nodes in a bucketless structure cannot contain data units but only pointers to data units. A bucket structure also has leaf nodes: trie nodes containing at least one pointer to a bucket (bucket structure) or to a data unit (bucketless structure) are leaf nodes. The other nodes in the trie are internal nodes. Trie nodes may thus be either internal nodes or leaf nodes. By means of buckets, the need for reorganizing the directory structure can be postponed, as a large number of pointers/data units can be accommodated in the buckets until a time when the need for reorganization arises.
The solution in accordance with the invention can be applied to a bucket structure as well as a bucketless structure. In the following, bucket structures will nevertheless be used as examples.
FIG. 1 illustrates an example of a digital trie structure in which the key has a length of 4 bits and k=2, and thus each trie node has 2^2=4 elements, and two bits extracted from the key are searched at each level. Buckets are denoted with references A, B, C, D . . . H . . . M, N, O and P. Thus a bucket is a node that does not point to a lower level in the tree. Trie nodes are denoted with references IN1 . . . IN5 and elements in the trie node with reference NE in FIG. 1.
In the exemplary case of FIG. 1, the search keys for the buckets shown are as follows: A=0000, B=0001, C=0010, . . . , H=0111, . . . and P=1111. In this case, a pointer is stored in each bucket to that storage location in the database SD at which the actual data, e.g. the telephone number of the pertinent subscriber and other information relating to that subscriber, is to be found. The actual subscriber data may be stored in the database for instance as a sequential file of the type shown in the figure. The search is performed on the basis of the search key of record H, for example, by first extracting from the search key the two leftmost bits (01) and interpreting them, which delivers the second element of node IN1, containing a pointer to node IN3 at the next level. At this level, the two next bits (11) are extracted from the search key, thus yielding the fourth element of that node, pointing to record H.
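The index computation for record H can be reproduced with a short sketch. The function name `element_indices` is hypothetical, and the indices it returns are zero-based, so that index 1 denotes the second element of a node and index 3 the fourth.

```python
def element_indices(key, key_bits, k):
    """Split a key into successive k-bit groups, leftmost group first."""
    mask = (1 << k) - 1
    return [(key >> shift) & mask for shift in range(key_bits - k, -1, -k)]


# Search key of record H in FIG. 1: 0111, with k = 2.
indices = element_indices(0b0111, key_bits=4, k=2)
print(indices)  # [1, 3]
```

The first index selects the element of node IN1 that points to node IN3, and the second selects the element of IN3 that points to record H.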
Instead of a pointer, a bucket may contain (besides a search key) an actual data file (also called by the more generic term data unit). Thus for example the data relating to subscriber A (FIG. 1) may be located in bucket A, the data relating to subscriber B in bucket B, etc. Thus in the first embodiment of an associative memory, a key-pointer pair is stored in the bucket, and in the second embodiment a key and actual data are stored, even though the key is not indispensable.
The search key may also be multidimensional. In other words, it may comprise a number of attributes (for example the family name and one or more forenames of a subscriber). Such a multidimensional trie structure is disclosed in international application No. PCT/FI95/00319 (published under number WO 95/34155). In said structure, address computation is performed in such a way that a given predetermined number of bits at a time is selected from each dimension independently of the other dimensions. Hence, a fixed limit independent of the other dimensions is set for each dimension in any individual node of the trie structure, by predetermining the number of search key bits to be searched in each dimension. With such a structure, the memory circuit requirement can be curbed when the distribution of the values of the search keys is known in advance, in which case the structure can be implemented in a static form.
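One possible reading of such address computation is sketched below. The passage above states only that a predetermined number of bits is selected from each dimension independently; how the selected bits are combined into an element index is not stated here, so the concatenation order used in the sketch, as well as the function name `node_index` and its parameters, are assumptions made purely for illustration.

```python
def node_index(keys, consumed, bits_per_dim, key_bits):
    """Form an element index from a fixed number of bits of each dimension.

    keys         -- one integer search key per dimension
    consumed     -- bits of each key already examined at higher levels
    bits_per_dim -- bits this node takes from each dimension (fixed per node)
    key_bits     -- total length of each key in bits
    """
    index = 0
    for key, used, n, width in zip(keys, consumed, bits_per_dim, key_bits):
        shift = width - used - n
        chunk = (key >> shift) & ((1 << n) - 1)
        # Assumption: chunks are concatenated in dimension order.
        index = (index << n) | chunk
    return index


# Two 4-bit dimensions; each node takes 2 bits from the first dimension and
# 1 bit from the second, so a node would hold 2**(2 + 1) = 8 elements.
keys = (0b1011, 0b0110)
first = node_index(keys, consumed=(0, 0), bits_per_dim=(2, 1), key_bits=(4, 4))
second = node_index(keys, consumed=(2, 1), bits_per_dim=(2, 1), key_bits=(4, 4))
print(first, second)  # 4 7
```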
If it is desired that the structure can be reorganized in accordance with the current key distribution so as to remain optimal in terms of efficiency and storage space occupancy, the size of the nodes must vary dynamically as the key distribution changes. When the key distribution is uniform, the node size may be increased to make the structure flatter. With non-uniform key distributions, on the other hand, for which storage space occupancy presents a problem in memory structures employing dynamic node size, the node size can be kept small, which locally yields a more uniform key distribution and thereby smaller storage space occupancy. Dynamic changes in node size presuppose that address computation is implemented in such a way that in each node of the tree-shaped hierarchy constituted by the digital trie structure, a node-specific number of bits is selected from the bit string constituted by the search keys employed.
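Node-specific address computation of this kind may be sketched as follows. The sketch assumes, hypothetically, that every node records its own bit count in a field `k` and that a leaf is represented as a plain (key, data) pair; neither detail is taken from the source.

```python
class Node:
    """Hypothetical trie node whose size 2**k is chosen per node."""
    def __init__(self, k):
        self.k = k
        self.elements = [None] * (1 << k)


def search_variable(root, key, key_bits):
    """Descend the trie, taking a node-specific number of bits at each node."""
    node, remaining = root, key_bits
    while isinstance(node, Node):
        remaining -= node.k
        if remaining < 0:
            return None                  # key exhausted before reaching a leaf
        index = (key >> remaining) & ((1 << node.k) - 1)
        node = node.elements[index]
    # node is now either free (None) or a (key, data) leaf.
    if node is not None and node[0] == key:
        return node[1]
    return None


# A wide root (k = 3) followed by a narrow child (k = 1), 4-bit keys.
root = Node(3)
child = Node(1)
root.elements[0b011] = child
child.elements[0b1] = (0b0111, "H")
root.elements[0b111] = (0b1110, "O")

result = search_variable(root, 0b0111, key_bits=4)
```

Because the bit count is read from each node during the descent, a node can be enlarged or reduced during reorganization without changing the search routine.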
The choice between a fixed node size and a dynamically changing node size depends, for example, on the kind of application for which the memory is intended, for instance on the number of retrievals, insertions and deletions to be made in the database and on the proportions of these operations.
Irrespective of whether a fixed or changing node size is used in the memory, memories based on the digital trie structure are nevertheless attended by the problem of how the empty space inevitably created in the structure can be modelled in such a way that storage space occupancy will be as low as possible and memory efficiency (speed of memory operations) as good as possible.
It is an objective of the present invention to provide a solution to the above problem. This objective is achieved with the method defined in the independent claims. The first of these discloses a structure employing buckets and the second a structure not employing buckets.
The basic idea of the invention is to compress such nodes in a digital trie structure that provide only a single path downward in a tree-shaped hierarchy. The data needed to proceed in the structure and for reorganization of nodes is stored in such a compressed node, without any storage space being required for (an) element array(s).
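The idea can be illustrated with the sketch below. The layout of a compressed node is not specified in this passage, so the fields chosen here (the bit count `k`, the single occupied `index` and the `child` reference), together with the names `compress`, `expand` and `step`, are assumptions made for illustration only.

```python
class TrieNode:
    """Ordinary trie node with an element array of 2**k entries."""
    def __init__(self, k):
        self.k = k
        self.elements = [None] * (1 << k)


class CompressedNode:
    """Hypothetical compressed node: no element array is allocated."""
    def __init__(self, k, index, child):
        self.k = k          # number of key bits the original node examined
        self.index = index  # the only element that was in use
        self.child = child  # where that element pointed


def compress(node):
    """Replace a node offering a single path downward by a compressed node."""
    used = [(i, e) for i, e in enumerate(node.elements) if e is not None]
    if len(used) == 1:
        index, child = used[0]
        return CompressedNode(node.k, index, child)
    return node             # zero or several paths: leave the node as it is


def expand(cnode):
    """Rebuild the ordinary node, e.g. when a second path must be inserted."""
    node = TrieNode(cnode.k)
    node.elements[cnode.index] = cnode.child
    return node


def step(node, index):
    """Follow one index through either kind of node."""
    if isinstance(node, CompressedNode):
        return node.child if index == node.index else None
    return node.elements[index]


single = TrieNode(2)
single.elements[2] = "child"
compact = compress(single)
```

In this sketch the stored bit count and index are sufficient both to continue a search and to restore the element array, while the array itself occupies no storage for as long as the node stays compressed.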
On account of the solution of the invention, the empty space present in the trie structure can be modelled in such a way that storage space occupancy in the structure will remain small with uniform as well as non-uniform key distributions. Furthermore, the solution enables the number of memory references requiring computation time to be minimized, thus making the efficiency (speed) of the memory as good as possible.
In accordance with a preferred embodiment of the invention, each chain made up by successive compressed nodes is replaced with a single collecting node. This enables elimination of chains made up by successive compressed nodes as a result of limited word length. Elimination of chains will further improve memory efficiency and curb the need for storage space.
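A rough sketch of replacing such a chain is given below. The internal layout of a collecting node is not described in this passage; the sketch simply assumes that it stores the concatenated bits of the whole chain together with their total count. The names `CollectingNode`, `collect` and `follow` are hypothetical, and Python's unbounded integers are used so that the word-length limit does not arise in the example.

```python
class CompressedNode:
    """Hypothetical compressed node (single path, no element array)."""
    def __init__(self, k, index, child):
        self.k = k
        self.index = index
        self.child = child


class CollectingNode:
    """Hypothetical node standing in for a whole chain of compressed nodes."""
    def __init__(self, bit_count, bits, child):
        self.bit_count = bit_count  # total key bits covered by the chain
        self.bits = bits            # concatenated indices of the chain
        self.child = child          # what the last compressed node pointed to


def collect(head):
    """Fold the chain starting at a compressed node into one collecting node."""
    bit_count, bits, node = 0, 0, head
    while isinstance(node, CompressedNode):
        bits = (bits << node.k) | node.index
        bit_count += node.k
        node = node.child
    return CollectingNode(bit_count, bits, node)


def follow(cnode, key, remaining):
    """Compare the next bit_count key bits in one step.

    remaining is the number of key bits not yet examined.
    """
    shift = remaining - cnode.bit_count
    if shift < 0:
        return None
    chunk = (key >> shift) & ((1 << cnode.bit_count) - 1)
    return cnode.child if chunk == cnode.bits else None


# A chain covering the bits 01, 11 and 0, i.e. 01110 in total.
chain = CompressedNode(2, 0b01, CompressedNode(2, 0b11, CompressedNode(1, 0b0, "bucket")))
merged = collect(chain)
```

With the chain merged, a search passes the whole single path with one comparison instead of one memory reference per compressed node.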
The solution in accordance with the invention also ensures effective performance of set operations, as the structure is an order-preserving digital trie.