Today's database systems must execute many queries on large databases; Google, for example, has to answer about one billion search queries per day. To keep response times low, queries must be executed as fast as possible. Queries are the interface between database management systems, which hold the databases, and database applications, which rely on the data stored in those databases. Speeding up query execution therefore increases the performance of database systems, and database applications benefit from the lower response time.
Dedicated and reconfigurable hardware accelerators have a long history of improving database query processing, especially the throughput, latency, and power requirements of computations, as compared to conventional software-based processors. At the same time, the dedicated nature of these devices generally means that they can perform only relatively simple functions, so software-based processing is still required to complete a task, or at least to act as backup processing. Distributing the processing in this way can lead to synchronization and/or consistency issues between the two systems.
Field programmable gate arrays (FPGAs) and other hardware logic devices (e.g., application-specific integrated circuits (ASICs) and complex programmable gate arrays (CPGAs)) have been used to accelerate the storage and traversal of tree data structures. Current solutions take one of two general approaches: caching by means of a content-addressable memory structure, or FPGA accelerators for trees in which the tree nodes are laid out in dedicated register-transfer level (RTL) logic. The first approach has two drawbacks. First, it usually relies on an explicit caching policy, in which the host (e.g., a database management system (DBMS)) must decide when to store or update items in the cache. Second, the data structures most often used do not provide very high memory efficiency. The second approach also has two primary drawbacks. First, because entire trees are represented directly in hardware, they consume large amounts of on-chip memory and logic. Second, the implemented tree structure is very rigid, which makes dynamic updates to the tree difficult to handle quickly and economically.