Despite their different uses, applications, and workload characteristics, most systems run on a common Database Management System (DBMS) using a standard database programming language, such as Structured Query Language (SQL). Most modern DBMS implementations (Oracle, IBM DB2, Microsoft SQL, Sybase, MySQL, PostgreSQL, Ingress, etc.) are implemented on relational databases, which are well known to those skilled in the art.
Typically, a DBMS has a client side where applications or users submit their queries and a server side that executes the queries. On the server side, most enterprises employ one or more general purpose servers. However, although these platforms are flexible, general purpose servers are not optimized for many enterprise database applications. In a general purpose database server, all SQL queries and transactions are eventually mapped to low level software instructions called assembly instructions, which are then executed on a general purpose microprocessor (CPU). The CPU executes the instructions, and its logic is busy as long as the operand data are available, either in the register file or on-chip cache. To extract more parallelism from the assembly code and keep the CPU pipeline busy, known CPUs attempt to predict ahead the outcome of branch instructions and execute down the code path speculatively. Execution time is reduced if the speculation is correct; the success of this speculation, however, is data dependent. Other state-of-the-art CPUs attempt to increase performance by employing simultaneous multithreading (SMT) and/or multi-core chip multiprocessing (CMP). To take advantage of these, changes have to be made at the application or DBMS source code to manually create the process/thread parallelism for the SMT or CMP CPUs. This is generally considered highly as very complex to implement and not always applicable to general purpose CPUs because it is workload dependent.
Unfortunately, general purpose CPUs are not efficient for database applications. Branch prediction is generally not accurate because database processing involves tree traversing and link list or pointer chasing that is very data dependent. Known CPUs employ the well known instruction-flow (or Von Neumann) architecture, which uses a highly pipelined instruction flow (rather than a data-flow where operand data is pipelined) to operate on data stored in the CPUs tiny register files. Real database workloads, however, typically require processing Gigabytes to Terabytes of data, which overwhelms these tiny registers with loads and reloads. On-chip cache of a general purpose CPU is not effective since it's relatively too small for real database workloads. This requires that the database server frequently retrieve data from its relatively small memory or long latency disk storage. Accordingly, known database servers rely heavily on squeezing the utilization of their small system memory size and disk input/output (I/O) bandwidth. Those skilled in the art recognize that these bottlenecks between storage I/O, the CPU, and memory are very significant performance factors.
However, overcoming these bottlenecks is a complex task because typical database systems consist of several layers of hardware, software, etc., that influence the overall performance of the system. These layers comprise, for example, the application software, the DBMS software, operating system (OS), server processor systems, such as its CPU, memory, and disk I/O and infrastructure. Traditionally, performance has been optimized in a database system “horizontally,” i.e., within a particular layer. For example, many solutions attempt to optimize various solutions for the DBMS query processing, caching, the disk I/O, etc. These solutions employ a generic, narrow approach that still fails to truly optimize the large performance potentials of the database system, especially for relational database systems having complex read-intensive applications.
Accordingly, it would be very desirable to provide a more complete solution for database systems than what is currently available. As will be described herein, the present invention provides a relatively complete solution that utilizes hardware acceleration for query processing and a “vertical” solution approach.