In modern microprocessor systems, the speed of the main memory tends to be substantially slower than the speed of the processor core. A typical DRAM main store coupled to a high-frequency microprocessor takes several hundred processor cycles to access. In the future, problems resulting from the mismatch between memory speed and processor speed will, in all likelihood, become ever more acute.
One major cause of these problems is memory access latency. For example, the time between the issue of a LOAD instruction to main memory and the actual transfer of a first word from main memory is usually very long, and can impose many stall cycles on the processor core. However, once the first word has been transmitted, consecutive words can be transferred quickly. This rapid transfer of consecutive words is generally referred to as the “burst-mode effect.”
Typically, a microprocessor system employs a local store, such as a cache, to take advantage of the burst-mode effect. It does so by transferring a whole cache line (that is, the minimum number of bytes that is loaded when data in the local store or cache is replaced) from main memory and storing the whole cache line in the local store, instead of transferring only the smaller words that are requested directly from the main memory.
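The cache-line transfer described above can be sketched as follows. This is a minimal illustration, not any particular processor's implementation; the line size, function names, and addresses are illustrative assumptions (real systems commonly use line sizes of 32 to 128 bytes):

```c
#include <stdint.h>

/* Hypothetical cache-line size in bytes; illustrative only. */
#define CACHE_LINE_BYTES 64u

/* Given the address of a requested word, return the base address of the
 * whole cache line that a burst transfer would fetch from main memory. */
static uint64_t cache_line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(CACHE_LINE_BYTES - 1u);
}

/* Byte offset of the requested word within its cache line; the remaining
 * bytes of the line arrive "for free" under the burst-mode effect. */
static uint64_t cache_line_offset(uint64_t addr)
{
    return addr & (uint64_t)(CACHE_LINE_BYTES - 1u);
}
```

For example, a request for the word at address 0x1234 would cause the entire 64-byte line starting at 0x1200 to be loaded, so a subsequent access to any address in the range 0x1200 to 0x123F hits the local store.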
If the data likely to be read in the near future has sufficient spatial locality with the data now requested (that is, it is stored in a substantially contiguous area of main memory), and is therefore also stored in the local store, memory efficiency is improved. This is because the information that is likely to be needed is already held in the faster cache, thereby reducing memory access times. The same effect can be achieved if the microprocessor system features a memory architecture whereby memory blocks are transferred from the main memory to a local store. The local store is comparable to a cache, such as a software-managed cache. The size of the local storage can be a function of the memory burst access size, the bus transfer size, and the cache line size.
In conventional technology, methods exist to implement tree searches within a cache, whether the cache is hardware-managed or software-managed. During a search of a decision tree, such as a binary tree, after reaching a decision node, a subset of the tree nodes is accessed as the tree is traversed. This process continues until the appropriate leaf node, which contains the desired data, is reached. Only a few bytes are read during each tree node access.
Conventional tree search implementations use indirect pointers in each tree node to reference the parent and child nodes, and tree nodes are usually distributed across the whole address space. These approaches have at least two major drawbacks.
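The conventional pointer-based layout described above can be sketched as follows. This is an illustrative sketch of the prior-art approach, not a specific product's data structure; the node fields and function names are assumptions. Each step of the traversal dereferences an indirect pointer that may land anywhere in the address space, which is the behavior whose drawbacks are discussed below:

```c
#include <stddef.h>

/* Sketch of a conventional decision-tree node: each node carries indirect
 * pointers to its children, which consume memory and may reference
 * arbitrary, noncontiguous locations in the address space. */
struct tree_node {
    int               key;   /* decision value tested at this node  */
    const char       *data;  /* payload, meaningful at leaf nodes   */
    struct tree_node *left;  /* indirect pointer to the left child  */
    struct tree_node *right; /* indirect pointer to the right child */
};

/* Walk from the root toward a leaf; every iteration follows a pointer,
 * so consecutive accesses have little or no spatial locality. */
static const struct tree_node *tree_search(const struct tree_node *n, int key)
{
    while (n != NULL) {
        if (key == n->key)
            return n;                         /* matching node found */
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;                              /* key not present */
}
```

Note that on a 64-bit system each child pointer occupies eight bytes, so the two pointers in this sketch can easily exceed the size of the key itself, which illustrates the memory-consumption drawback discussed below.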
The first drawback of existing tree search implementations is that, since the nodes, both decision and leaf, are typically distributed across the whole address space, multiple memory accesses to random memory locations need to be performed. Spatial locality is low or nonexistent, which leads to long processor stalls on memory accesses because the needed information is stored in noncontiguous areas. The second drawback is that the indirect pointers within each tree node consume memory.
Therefore, there is a need for a method of accessing tree nodes in memory through a cache that overcomes the shortcomings of existing memory accessing methods.