Tree searching and manipulation are widely used in information systems and in many data processing applications, such as hierarchical data manipulation (e.g. machine vision, Geographic Information Systems (GIS) maps, DNA trees, and databases), indexing and hashing (e.g. search engines), and others. As used herein, the term “manipulation” refers to traversing a tree, visiting all nodes of that tree for data processing purposes, such as a search or an update of values to ensure consistency with some criteria. A well-known example of tree manipulation is found in CBIR (Content-Based Image Retrieval) systems, which require the algorithm to visit nearly all tree data nodes of a very large dataset, possibly several million data nodes.
Tree data structures are typically represented as multi-branched linked lists, in which the links between parents and their children are stored as pointers or references. In such tree data structures, sub-trees are generally not contiguous in memory. From a memory access time perspective, trees are therefore generally categorized as random access structures.
Conventional tree search and manipulation systems are limited by memory latency because data nodes are not stored in memory in any specific sequence, which makes it difficult to benefit from cache systems (due to a high miss rate). The problem is worse on multi-core processors such as the Cell Broadband Engine (Cell/B.E.) (Cell Broadband Engine and Cell/B.E. are trademarks of Sony Computer Entertainment, Inc., in the United States), because the Synergistic Processing Elements (SPE) have a very limited amount of Local Store space and manually controlled bandwidth, which is handled through Direct Memory Access (DMA) calls. On parallel distributed machines such as computer clusters, network bandwidth is likewise limited and manually controlled, and transfer latency is higher.
Conventional tree data structures are accordingly not suitable for multi-core processors with a software-managed memory hierarchy, such as the Cell/B.E., where the Synergistic Processing Elements (SPE) depend on the data available in their Local Stores (LS). For the same reasons, conventional tree data structures are not suited to computer clusters with a network-connected distributed memory hierarchy.
Existing multi-core processor systems and computer cluster systems interact with their memories through specific system calls that transfer data chunks between cores or nodes. The more efficient these transfers are, the better these systems perform.
Solutions for tree searching and manipulation are known in the domain of “parallel algorithms for tree search”, in particular for game searching applications. Such solutions rely on Artificial Intelligence techniques such as the “min/max” or “alpha/beta” approaches, in which trees are generated in parallel. For example, the article entitled “Efficient implementations of search trees on parallel distributed memory architectures”, Computers and Digital Techniques, IEEE Proceedings, Colbrook A. and Smythe C., and the article entitled “Asynchronous parallel game-tree search”, Journal of Parallel and Distributed Computing (1994), Mark Gordon Brockington, T. Anthony Marsland, John Samson, and Murray Campbell, each provide a set of computing units in which each computing unit generates its own tree, searches it, and returns results. However, these solutions are not suited to searching a pre-existing (i.e. memory-resident) tree data structure.
Other solutions provide conventional tree data structures that are distributed over different processing nodes, such as the one described in the article entitled “A scalable distributed parallel breadth-first search algorithm on BlueGene/L”, Proceedings of the ACM/IEEE Supercomputing 2005 Conference, 25-35, Yoo A., Chow E., Henderson K. and McLendon W., 2005. In these solutions, the data are already distributed across the memories of different computer systems. These solutions are therefore not suitable for data that are resident in the memory of a multi-core processor chip or in the memory of a master computer in a master/slave environment. Such resident data must be distributed across several cores or computer systems to be processed efficiently. A known solution for resident data is described in the article entitled “Programming the Cell Broadband Engine Architecture: Examples and Best Practices”, IBM® Redbooks® publication. This solution, specific to Cell Broadband Engine (Cell/B.E.) multi-core processors, uses a Software Cache to narrow the memory latency gap. However, software caches perform poorly, and the Cell/B.E. multi-core processor remains inefficient when dealing with tree data structures.
In another solution, described in the article “Software and Algorithms for Graph Queries on Multithreaded Architectures”, Proc. IEEE Workshop on Multithreaded Architectures and Applications, 2007, Jonathan Berry, Bruce Hendrickson, Simon Kahan and Petr Konecny, graph data structures are identified as being dominated by memory latency. The solution provides a framework for handling graph structures, but only on shared memory architectures; it is not adapted to distributed memory architectures. A similar solution exists for CBIR (Content-Based Image Retrieval), but it is also limited to shared memory architectures.
The present invention overcomes the problems of conventional solutions, as will be described in greater detail below.