Processing system designers continually seek new ways to improve device performance. While processing speeds continue to increase, memory access latency imposes operating delays. In systems-on-a-chip and other embedded systems, efforts to avoid such latency issues have included using local memory in the form of on-chip SRAM (static random access memory). However, cost and size limitations reduce the effectiveness of on-chip SRAM in some processing environments.
For example, in current network environments, network switches are being used to perform operations more complex than simple packet forwarding. Network processors are being developed to provide more complex processing in network routers while retaining the flexibility to accommodate changes and enhancements to router functionality as techniques and protocols evolve. As with virtually any processor, these network processors face challenges in memory utilization, particularly because they must handle a vast volume of network traffic.
In embedded processing systems, such as network processors, off-chip (external) DRAM (dynamic random access memory) is often chosen because it costs less than SRAM. However, while potentially the most cost-effective option, external DRAM introduces a performance penalty in the form of longer access latency (additional delay cycles for the first request for data) relative to other types of RAM. Further, the problem of longer access latency is felt more sharply with shared DRAM, which must support concurrent operations required by the system, such as reading in new data from a DMU (data management unit) while a search for data in the memory is being performed.
To facilitate quicker storage and retrieval of data from the DRAM, the stored data is often organized as a tree structure. For example, a typical tree structure may be from 12 to more than 23 levels deep. Such a large number of levels requires multiple requests to memory to obtain all of the necessary data, i.e., to reach and use the desired leaf of the tree. In addition, each successive level of the tree holds more (unsearched) data than the previous level. These factors further limit how quickly a tree structure can be traversed.
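The cost of a deep tree held in external DRAM can be illustrated with a minimal Python sketch. It assumes a hypothetical complete binary tree stored in heap order (the node at address a has children at 2a and 2a + 1), no on-chip caching, and an assumed per-access latency. The names CountingMemory, lookup, nodes_at_level, and HYPOTHETICAL_DRAM_LATENCY_CYCLES are illustrative only and are not part of the described system. Because each node address is computed from the contents of the previous node, the reads are dependent and cannot overlap, so total delay grows with the number of levels.

```python
# Illustrative assumption: latency of one external DRAM access, in cycles.
HYPOTHETICAL_DRAM_LATENCY_CYCLES = 20


class CountingMemory:
    """Hypothetical DRAM model holding a complete binary tree in heap order.

    The node at address a has children at 2a and 2a + 1; the root is at 1.
    Nodes are synthesized from their address (no storage needed), and every
    read is counted.
    """

    def __init__(self, levels: int) -> None:
        if levels < 1:
            raise ValueError("a tree needs at least one level")
        self.levels = levels
        self.key_bits = levels - 1  # one key bit is consumed per internal level
        self.reads = 0

    def read(self, addr: int) -> tuple[str, int]:
        self.reads += 1
        level = addr.bit_length() - 1
        if level == self.key_bits:
            # Leaf: its payload is its position among the leaves.
            return ("leaf", addr - (1 << self.key_bits))
        # Internal node: its payload is the key bit it tests.
        return ("node", self.key_bits - 1 - level)


def lookup(mem: CountingMemory, key: int) -> int:
    """Walk from root to leaf; each next address depends on the prior read."""
    if not 0 <= key < (1 << mem.key_bits):
        raise ValueError("key out of range for this tree")
    addr = 1
    kind, payload = mem.read(addr)
    while kind == "node":
        addr = 2 * addr + ((key >> payload) & 1)
        kind, payload = mem.read(addr)
    return payload


def nodes_at_level(level: int) -> int:
    """Number of nodes at a given level of a complete binary tree."""
    return 1 << level


for levels in (12, 23):
    mem = CountingMemory(levels)
    lookup(mem, key=(1 << (levels - 1)) - 1)  # walk to the rightmost leaf
    print(f"{levels} levels: {mem.reads} dependent reads, "
          f"~{mem.reads * HYPOTHETICAL_DRAM_LATENCY_CYCLES} cycles, "
          f"{nodes_at_level(levels - 1)} nodes at the leaf level")
```

Under these assumptions, a 12-level lookup issues 12 dependent reads and a 23-level lookup issues 23, and the leaf-level node count doubles with each added level, which mirrors the growth of unsearched data at deeper levels. Real trees may use wider fan-out or multi-bit strides; the binary layout is chosen here only for clarity.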
Accordingly, what is needed is a system and method for optimization of a tree structure for data stored in external DRAM of an embedded processing system. The present invention addresses such a need.