The present invention generally relates to implementation of a memory. More specifically, the invention is intended to be used in connection with memories which are based on a digital trie structure and in which width-compressed nodes are used. The solution is mainly intended for central memory databases and is suited for both conventional overwriting memories and functional memories. The former denotes a memory in which updates are made directly on the existing data and the latter, on the other hand, a memory in which the path from the root of the structure to the point of addition is first copied, and the addition is thereafter made to the copied data (the addition is not made directly to the existing data). The former updating procedure is also called "update-in-place" and the latter "copy-on-write".
The prior art unidimensional directory structure termed digital trie (the word "trie" is derived from the English word "retrieval") is the underlying basis of the principle of the present invention. Digital trie structures come in two types: bucket tries, and tries having no buckets.
A bucket digital trie structure is a tree-shaped structure composed of two types of nodes: buckets and trie nodes. A bucket is a data structure that can accommodate a number of data units or a number of pointers to data units or a number of search key/data unit pairs or a number of search key/pointer pairs. A maximum size greater than one has been defined for said number. However, a bucket can contain a smaller number than said maximum number of data units, pointers, or key/pointer pairs, in which case the bucket is not full. A trie node, on the other hand, is an array guiding the retrieval, having a size of two to the power of k (2^k) elements. If an element in a trie node is in use, it refers either to a trie node at the next level in the directory tree or to a bucket. In other cases, the element is free (empty).
Search in the database proceeds by examining the search key (which in the case of a subscriber database in a mobile telephone network or a telephone exchange, for instance, is typically the binary numeral corresponding to the telephone number of the subscriber) k bits at a time. The bits to be searched are selected in such a way that at the root level of the structure (in the first trie node), k leftmost bits are searched; at the second level of the structure, k bits next to the leftmost bits are searched, etc. The bits to be searched are interpreted as an unsigned binary integer that is employed directly to index the element table contained in the trie node, the index indicating a given element in the table. If the element indicated by the index is free, the search will terminate as unsuccessful. If the element refers to a trie node at the next level, k next bits extracted from the search key are searched at that level in the manner described above. As a result of comparison, the routine branches off in the trie node either to a trie node at the next level or to a bucket. If the element refers to a bucket containing a key, the key stored therein is compared with the search key. The entire search key is thus compared only after the search has encountered a bucket. Where the keys are equal, the search is successful, and the desired data unit is obtained at the storage address indicated by the pointer of the bucket. Where the keys differ, the search terminates as unsuccessful.
A bucketless trie structure has no buckets, but a leaf node containing only one element that can be a data unit, a pointer to a data unit, a search key/data unit pair or a search key/pointer pair corresponds to a bucket. In the present context, the nodes above the leaf nodes in the bucketless trie structure are called internal nodes; these correspond to trie nodes in a bucket structure (i.e., they comprise a similar table as trie nodes). In a bucketless digital trie structure, the nodes are thus either internal nodes or leaf nodes. By means of buckets, the need for reorganizing the directory structure can be postponed, as a large number of pointers, data units, search key/data unit pairs or search key/pointer pairs can be accommodated in the buckets until a time when such a need arises.
FIG. 1 shows an example of a digital trie structure in which the key has a length of 4 bits and k=2, and thus each trie node has 2^2=4 elements, and two bits extracted from the key are searched at each level. Leaves are denoted with references A, B, C, D . . . H . . . M, N, O and P. Thus a leaf is a node that does not point to a lower level in the tree. Internal nodes are denoted with references IN1 . . . IN5 and elements in the internal node with reference NE in FIG. 1.
In the exemplary case of FIG. 1, the search keys for the leaves shown are as follows: A=0000, B=0001, C=0010, . . . , H=0111, . . . and P=1111. In this case, a pointer is stored in each bucket to that storage location in the database SD at which the actual data, e.g. the telephone number of the pertinent subscriber and other information relating to that subscriber, is to be found. The actual subscriber data may be stored in the database for instance as a sequential file of the type shown in the figure. The search is performed on the basis of the search key of record H, for example, by first extracting from the search key the two leftmost bits (01) and interpreting them, which delivers the second element of node IN1, containing a pointer to node IN3 at the next level. At this level, the two next bits (11) are extracted from the search key, thus yielding the fourth element of that node, pointing to record H.
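The search of FIG. 1 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of the invention: a bucketless trie with 4-bit keys and k=2 is assumed, and the names `Leaf`, `new_node`, `insert` and `search` are hypothetical helpers introduced for the example.

```python
K = 2          # bits examined per level, as in FIG. 1
KEY_BITS = 4   # total key length, as in FIG. 1
LEVELS = KEY_BITS // K


class Leaf:
    """Hypothetical leaf holding the full search key and a data unit."""

    def __init__(self, key, data):
        self.key = key
        self.data = data


def new_node():
    """An internal node is an element table of 2^k entries; None means free."""
    return [None] * (1 << K)


def element_index(key, level):
    """Extract the k bits for the given level, leftmost bits first."""
    shift = KEY_BITS - K * (level + 1)
    return (key >> shift) & ((1 << K) - 1)


def insert(root, key, data):
    node = root
    for level in range(LEVELS):
        index = element_index(key, level)
        if level == LEVELS - 1:
            node[index] = Leaf(key, data)
        else:
            if node[index] is None:
                node[index] = new_node()
            node = node[index]


def search(root, key):
    node = root
    for level in range(LEVELS):
        element = node[element_index(key, level)]
        if element is None:
            return None                      # free element: unsuccessful
        if isinstance(element, Leaf):
            # The entire key is compared only once a leaf is reached.
            return element.data if element.key == key else None
        node = element                       # descend to the next level
    return None
```

With key 0111 (record H) inserted, `search` first uses bits 01 to select the second element of the root and then bits 11 to select the fourth element of the next node, as in the description above.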
Instead of a pointer, a leaf may contain (besides a search key) an actual data file (also called by the more generic term data unit). Thus for example the data relating to subscriber A (FIG. 1) may be located in leaf A, the data relating to subscriber B in leaf B, etc.
The search key may also be multidimensional. In other words, the search key may comprise a number of attributes (for example the family name and one or more forenames of a subscriber). Such a multidimensional trie structure is disclosed in international application No. PCT/FI95/00319 (published under number WO 95/34155). In said structure, address computation is performed in such a way that a given predetermined number of bits at a time is selected from each dimension independently of the other dimensions. Hence, a fixed limit independent of the other dimensions is set for each dimension in any individual node of the trie structure, by predetermining the number of search key bits to be searched in each dimension. With such a structure, the memory circuit requirement can be curbed when the distribution of the values of the search keys is known in advance, in which case the structure can be implemented in a static form.
If the structure is to be reorganized in accordance with the current key distribution so as to remain optimal in terms of efficiency and storage space occupancy, the size of the nodes must vary dynamically as the key distribution changes. When the key distribution is even, the node size may be increased to make the structure flatter (a flatter structure entails faster retrievals). On the other hand, with uneven key distributions, in connection with which storage space occupancy presents a problem in memory structures employing dynamic node size, the node size can be kept small, which locally yields a more even key distribution and thereby smaller storage space occupancy. Dynamic changes of node size require the address computation to be implemented in such a way that in each node of the tree-shaped hierarchy constituted by the digital trie structure, a node-specific number of bits is selected from the bit string constituted by the search keys used. Dynamic reorganizing of the nodes naturally consumes part of the processing capacity.
The choice between a fixed node size and a dynamically changing node size is dependent for example on what type of application the memory is intended for, e.g. what the number of database searches, additions and deletions is and what the proportions of said operations are.
The efficiency and performance of the memory are thus influenced, among other things, by the storage space required by the trie structure and the depth of the trie structure. Both of these can be influenced by performing width compression in the nodes. Width compression means that the size (width) of the node is diminished by physically storing in the node only those pointers whose value deviates from zero. This will be described in brief in the following.
FIG. 2 illustrates a non-compressed node N20 having a (logical) element table of 16 elements. In this exemplary case, the node has, in addition to twelve nil pointers, four non-nil pointers (A . . . D) pointing downward in the tree, which in this case are located in elements corresponding to element table indices 1, 7, 8 and 13. Width compression is carried out by storing only those pointers that differ from nil. In addition to the non-nil pointers, a bit pattern or chart BP1 is stored in connection with the node, on the basis of which it can be determined whether the pointer corresponding to the logical index of the element table of the node is a nil pointer or not, and if not, where the pointer corresponding to said logical index is physically located in the node. When compression is used, the fixed length element table (16 elements) of the node is represented by means of the bit pattern as a table of physical storage locations the length of which varies according to how many nil pointers the node contains in each case. It is to be noted, therefore, that in connection with width compression the logical size of the node (i.e. the size of the element table) does not change, but the physical size of the node diminishes instead, since in a compressed node the nil pointers do not occupy any storage space. As a result, a width-compressed node N30 in accordance with FIG. 3, in which all non-nil pointers are in succession, is obtained from the node of FIG. 2. The node contains only four physical elements (pointers A . . . D), and in addition a bit pattern BP1 is stored in the node, indicating the physical location of the pointer therein corresponding to the element table index formed from the search key. The bit pattern has one bit for each element (logical index) of the element table, and each bit indicates whether the corresponding element contains a non-nil pointer or a nil pointer. 
In the exemplary case shown in the figure, one denotes a non-nil pointer and zero denotes a nil pointer. Since the pointers are stored in the compressed node preserving the order (and no space is reserved for nil pointers), it is known for the compressed node of FIG. 3 that a nil pointer corresponds to element table index 0, a non-nil pointer corresponds to element table index 1, its physical index being zero, nil pointers correspond to element table indices 2 . . . 6, non-nil pointers correspond to element table indices 7 and 8, their physical indices being one and two, nil pointers correspond to element table indices 9 . . . 12, a non-nil pointer corresponds to element table index 13, its physical index being three, and nil pointers correspond to element table indices 14 and 15. Thus, the pointer corresponding to the logical index formed from the search key bits is found in the node.
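The step from the node of FIG. 2 to the compressed node of FIG. 3 can be sketched as follows. The function name `compress` is hypothetical, and the sketch assumes that logical index 0 corresponds to the leftmost (most significant) bit of the bit pattern; the figures may use a different bit order.

```python
def compress(elements):
    """Return (bit_pattern, pointers) for a logical element table.

    A 1-bit marks a non-nil pointer, a 0-bit a nil pointer (None).
    Assumption: logical index 0 is the leftmost bit of the pattern.
    Non-nil pointers are stored in succession, preserving their order.
    """
    size = len(elements)
    pattern = 0
    pointers = []
    for index, element in enumerate(elements):
        if element is not None:
            pattern |= 1 << (size - 1 - index)
            pointers.append(element)
    return pattern, pointers


# Node of FIG. 2: non-nil pointers A..D at logical indices 1, 7, 8 and 13.
table = [None] * 16
for index, pointer in zip((1, 7, 8, 13), "ABCD"):
    table[index] = pointer

pattern, pointers = compress(table)
```

Under the bit-order assumption above, the pattern reads 0100000110000100 and the physical table holds the four pointers A, B, C and D.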
Address computation for the compressed node is performed in such a way that a (logical) element table index is first formed from the bits extracted from the search key in the normal way. Thereafter, the bit corresponding to this index is read from the bit pattern. If the bit indicates that a nil pointer is concerned, the search is terminated as unsuccessful. If the bit indicates that a non-nil pointer is concerned, the physical location (physical index) of the pointer corresponding to said element table index is determined by means of the bit pattern.
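A straightforward way of carrying out this address computation is to count the 1-bits one at a time. The sketch below is an assumption-laden illustration: the name `physical_index_by_counting` is hypothetical, logical index 0 is taken to be the leftmost bit of a 16-bit pattern, and the physical index is obtained by counting the 1-bits strictly before the bit in question (equivalently, the count up to and including that bit, minus one).

```python
PATTERN_BITS = 16
PATTERN = 0b0100000110000100   # node of FIG. 3 under the assumed bit order


def physical_index_by_counting(pattern, logical_index):
    """Return the physical index for a logical index, or None for a nil pointer."""
    if not (pattern >> (PATTERN_BITS - 1 - logical_index)) & 1:
        return None                          # nil pointer: search fails
    count = 0
    for index in range(logical_index):       # up to 15 single-bit tests
        if (pattern >> (PATTERN_BITS - 1 - index)) & 1:
            count += 1
    return count
```

For the pattern above, logical indices 1, 7, 8 and 13 map to physical indices 0, 1, 2 and 3, and every other logical index yields a nil result.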
Width compression of the kind described above, in which the bit pattern of the compressed node has one bit for each element (logical index) of the element table, is known per se. Such a solution is referred to for example in U.S. Pat. No. 5,276,868. As is stated in this U.S. Patent, the physical index has previously been determined from the bit pattern by two different methods. In the first method, the physical index is directly obtained by counting the number of 1-bits starting from the beginning of the bit pattern up to the bit corresponding to the element table index. In the second method, the bit pattern is combined with a four-bit search character and this code is used as an index of an existing search table. The drawback of the former method is its slowness, since the search of the physical index can require as many as 16 discrete calculating operations. The drawback of the latter method, on the other hand, is the high storage space consumption. The U.S. Patent referred to discloses a method whose aim is to provide a rapid conversion from a logical index to a physical index. This system supports only two node sizes: quad nodes and nodes of size 16. In the method, some of the nil pointers are removed from the nodes in such a way that nodes having less than five non-nil pointers are converted to compressed nodes, whereas the remaining nodes are maintained uncompressed. The structure thus includes both compressed nodes and non-compressed nodes of size 16. The compressed nodes can be of fifteen different types according to the logical indices of the pointers. Dedicated conversion means wherewith the physical index is found are associated with each of said types. The drawback of the structure is, among other things, the fact that it only supports two node sizes and the fact that a large number of nil pointers must still be stored in the structure. 
On account of this, the structure does not provide a very good result in view of storage space requirement and copying costs. Also, since only two node sizes are possible, the structure is not well suited to functional memories or other memories having efficient memory management that is capable of allocating memory for use a single word at a time.
It is an object of the present invention to alleviate the drawbacks described above by providing a method ensuring rapid conversion from a logical index to a physical index and furthermore enabling width compression of all nodes. This object is achieved with the method defined in the independent claims.
The invention makes use of a bit pattern of the kind described above, in which the number of 1-bits (or of zero bits, if a non-nil pointer is indicated by a zero bit) from the beginning of the bit pattern up to the bit corresponding to the element table index indicates the physical index. The idea is to divide the counting of 1-bits in the bit pattern into parts, preferably two parts, which allows a very small search table to be used. The counting is divided into parts by setting to zero all those bits in the bit pattern whose index is greater than the logical index formed from the search key, and by reading the number of ones in the bit pattern in one or more steps from the search table, in which the numbers of 1-bits in all different bit combinations of a word whose total number of bits is a predetermined portion of the number of bits in said bit pattern have been calculated in advance. The length of said word is preferably half of the length of the bit pattern (the latter being 16 bits), in which case the table retrieval is performed once if the logical index (starting from zero) is smaller than the length of the word, and twice otherwise.
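The divided counting can be illustrated with the following sketch for a 16-bit pattern and an 8-bit word, which gives a search table of 256 entries. The names `ONES` and `physical_index_by_table` are hypothetical, and two conventions are assumptions of the sketch rather than statements of the source: logical index 0 is the leftmost bit of the pattern, and the physical index is taken as the inclusive count of 1-bits minus one.

```python
WORD_BITS = 8
PATTERN_BITS = 16
FULL_MASK = (1 << PATTERN_BITS) - 1

# Search table: number of 1-bits in every possible 8-bit word (256 entries).
ONES = [bin(word).count("1") for word in range(1 << WORD_BITS)]


def physical_index_by_table(pattern, logical_index):
    """Return the physical index for a logical index, or None for a nil pointer."""
    shift = PATTERN_BITS - 1 - logical_index
    if not (pattern >> shift) & 1:
        return None                            # nil pointer: search fails
    # Set to zero all bits whose index is greater than the logical index.
    masked = pattern & (FULL_MASK << shift) & FULL_MASK
    count = ONES[masked >> WORD_BITS]          # first word: indices 0..7
    if logical_index >= WORD_BITS:
        count += ONES[masked & 0xFF]           # second word: indices 8..15
    return count - 1                           # inclusive count minus one


PATTERN = 0b0100000110000100   # node of FIG. 3 under the assumed bit order
```

For logical index 13, for example, the masked pattern is unchanged, the first word 01000001 contributes two 1-bits and the second word 10000100 another two, giving physical index 3. A logical index below 8 needs only the first table retrieval.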
The search table necessary in the solution in accordance with the invention is in practice so small that it can be located in the cache of the processor, which will afford a rapid search.
Besides enabling a rapid conversion, the solution in accordance with the invention supports width-compressed nodes of all sizes, thus allowing all nodes in the structure to be width-compressed. The same rapid conversion method can be used for all nodes. Furthermore, the solution is suitable for both conventional overwriting structures and functional structures.