A typical database (or data set) is an organized collection of data, in digital form, for one or more uses. In recent years, databases have become essential ingredients of almost every system. Digital databases are managed using a Database Management System (DBMS). The DBMS is software which operates a database by providing retrieval (query) services, manipulation and update services, management and administration services, storage, access, security, backup and other features. Typically, a DBMS is categorized based on the database model it supports, such as relational or eXtensible Markup Language (XML); the type of computer it supports, such as a server cluster or a mobile phone; the storage sets it uses; the memory type it uses, such as in-memory databases or flash memory databases; and the query language that accesses the database, such as Structured Query Language (SQL), XQuery, No-SQL query protocols or other data query protocols. Several known DBMSs cover more than one entry in these categories, commonly by supporting multiple query languages.
A query language is a high-level language which is specifically designed to handle the interactions with a database. Database queries are constructed using a query language, and are then used to access and retrieve the necessary information from the database. More specifically, a database query is a piece of code which is sent to a Central Processing Unit (CPU) for the purpose of retrieving information from the database. In other terms, a database “query” is a standard database “question” asked of the DBMS by a third party, namely, an application that uses the DBMS. The result of the query is the retrieved information which is returned by the DBMS.
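By way of a non-limiting illustration, the relationship between an application, a query and the DBMS may be sketched as follows. The sketch assumes Python's built-in sqlite3 module as the DBMS; the table `orders`, the function `run_query` and the sample rows are hypothetical and serve only to show a query being sent and its result being returned.

```python
import sqlite3

def run_query(rows, min_amount):
    """Load rows into an in-memory table, then ask the DBMS a "question".

    The application (this function's caller) sends an SQL query; the DBMS
    returns the retrieved information as a list of tuples.
    """
    conn = sqlite3.connect(":memory:")  # hypothetical throwaway database
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer HAVING SUM(amount) > ? ORDER BY customer",
            (min_amount,),
        )
        return cursor.fetchall()
    finally:
        conn.close()

# Hypothetical sample data: (customer, amount)
ORDERS = [("alice", 120.0), ("bob", 35.5), ("alice", 80.0), ("carol", 60.0)]
result = run_query(ORDERS, 50.0)
```

Here the application never touches the stored rows directly; it only formulates the query and consumes the returned result set.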
In recent years, companies have invested a huge amount of time and money in improving the performance of their software and hardware, particularly the efficiency with which they manage their databases. Performance improvement is the concept of modifying a particular process or system to increase its efficiency. In a DBMS, performance improvement reduces the query time, and therefore the anticipated results are received faster. For many commercial applications (for example, Algo-Trading, Complex Event Processing (CEP), Reporting and Business Intelligence (BI), etc.), receiving query results fast is a crucial factor in their operability. Generally, a 50 to 100 percent improvement in performance is considered a very significant improvement.
Parallel processing is the ability to carry out multiple operations or tasks simultaneously. One of the most common ways to achieve performance improvement is to employ parallel processing while executing a query. Parallel queries allow breaking a single query into multiple subtasks and executing them on multiple general-purpose Central Processing Units (CPUs) in a multiprocessor computer. While such a manner of operation increases speed, it requires more resources, suffers from excessive power consumption, and occupies a large storage space.
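The breaking of a single query into subtasks may be sketched as follows, under stated assumptions. The query is assumed to be a simple filter-and-sum over one column; the names `partial_sum` and `parallel_filtered_sum` are hypothetical. A thread pool from Python's standard library is used only to show the decomposition; in CPython, threads of this kind do not by themselves guarantee a speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk, threshold):
    """Subtask: filter one partition of the column and aggregate it."""
    return sum(v for v in chunk if v > threshold)

def parallel_filtered_sum(values, threshold, workers=4):
    """Split one filter-and-sum query into per-partition subtasks.

    Each partition is processed independently; the partial results are
    then combined into the final answer.
    """
    if not values:
        return 0
    size = -(-len(values) // workers)  # ceiling division: rows per partition
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks, [threshold] * len(chunks))
        return sum(partials)

# Hypothetical column of 100 values; the query sums those greater than 49.
total = parallel_filtered_sum(list(range(100)), 49)
```

The decomposition works because summation is associative, so the partial sums can be combined in any order; operators without this property would need a different merge step.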
Parallel processing of queries is common in the art, and such parallel processing is obtained by having the queries handled by several CPUs, each typically having multiple cores (a core being an instruction processing unit).
Moore's law describes a long-term trend in the history of computing hardware: the number of transistors that can be placed inexpensively on an Integrated Circuit (IC) doubles approximately every two years. The capabilities of many digital electronic devices are strongly linked to Moore's law, e.g., processing speed, memory capacity, sensors and even the number and size of pixels in digital cameras. All of these are improving at (roughly) exponential rates as well. The hardware limitation described by Moore's law imposes a limit on the number of CPUs that may participate in a parallel query. Parallel queries utilizing CPUs are also limited by the relatively low number of cores in each CPU.
A Graphics Processing Unit (GPU), also known as a Visual Processing Unit (VPU), is a specialized microprocessor that offloads and accelerates 3D or 2D graphics rendering from the microprocessor (i.e., the CPU). More specifically, the GPU has been particularly designed to process graphics and streams for visual aspects (such as movies). It is used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs, which comprise hundreds of cores that operate in parallel, are very efficient in manipulating computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms. More than 90% of new desktop and notebook computers have integrated GPUs. However, GPUs are usually designed to operate with a unique and limited input structure known as a “stream”, although they can also operate in other common paradigms. Stream processing typically allows high-throughput, high-latency processing. The GPU is typically connected through a PCI bus or its derivatives, for example, a PCIe bus. Still, there are GPUs that are provided within the CPU die (typically called on-die GPUs), such as AMD Fusion or Intel GMA. Furthermore, Intel recently introduced the KNIGHTS-FERRY/CORNER, which is a processor based on the MIC (Many Integrated Cores) architecture, and which is connected to a PCI bus. Although the KNIGHTS-FERRY/CORNER is specifically designed for stream processing, it still contains hundreds of cores, and it is capable of handling a large number of simultaneously running threads. Hereinafter, the term MCP (Multi Core Processor) refers to all the above versions of GPUs, to other multi-core processing units that are connected to the PCI bus, and to a plurality of such MCPs distributed on a network and connected by MPI (Message Passing Interface) or any other protocol that connects machines on the network.
Stream processing is a computer programming paradigm which allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the floating point units on a GPU or Field Programmable Gate Arrays (FPGAs), without explicitly managing allocation, synchronization, or communication among those units. The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation which is performed. Given a set of data having a stream format, a series of operations is applied to each element in the stream.
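The stream paradigm described above may be illustrated by the following minimal sketch, which is sequential Python and not GPU code. The names `kernel`, `run_stream` and `run_pipeline` are hypothetical. The point shown is the restriction itself: each operation reads only its own input element, so the per-element calls are independent and could be distributed over many computational units without explicit synchronization.

```python
def kernel(x):
    """Per-element operation; it reads only its own input element."""
    return x * x + 1

def run_stream(kernel_fn, stream):
    """Apply kernel_fn independently to every element of the stream.

    Because no call depends on another element, the iterations could run
    in any order, or simultaneously on separate computational units.
    """
    return [kernel_fn(x) for x in stream]

def run_pipeline(kernels, stream):
    """Apply a series of kernels, each over the whole stream in turn."""
    for k in kernels:
        stream = run_stream(k, stream)
    return stream

# Hypothetical stream of three elements passed through two kernels.
out = run_pipeline([kernel, lambda x: x * 2], [1, 2, 3])
```

An operation that needs to look at neighbouring or arbitrary elements, as many database operators do, does not fit this form directly, which is the limitation the paradigm trades for simplicity.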
In graphical operations requiring massive vector operations, stream processing yields performance several orders of magnitude faster than a general-purpose CPU. Stream processing is essentially a compromise, driven by a data-centric model that works very well for traditional GPU-type applications (such as image, video and digital signal processing) but less well for general-purpose processing with more randomized data access (such as databases). By sacrificing flexibility, GPUs allow easier, faster and more efficient execution. As noted, a typical GPU comprises several hundreds of cores, and therefore the parallel operation of the GPU is significantly accelerated for those applications that are adapted to operate in a stream structure. Unfortunately, thus far there are many applications that are not adapted for operation on a GPU, and one of these applications is the processing of database queries. Therefore, thus far database queries can operate in parallel only with CPUs, which, as said, employ at most very few cores in parallel. Parallel execution of database queries using a GPU could not be performed thus far, and as a result, the advantages of using hundreds of cores in parallel could not be obtained.
There have been attempts in the art to provide SQL-like data processing on GPUs. The main drawbacks of all said previous attempts are: (a) they require a separate off-line data storage in addition to the main data storage, for example, a separate copy of the data stored in the MSSQL Server. This is a very significant drawback. Maintaining two separate databases is very disadvantageous in terms of speed, because the synchronization between the two databases imposes very severe CPU and network usage, and typically during such a synchronization the DBMS is prevented from serving the users. More specifically, such an operation prevents working on real time data (OLAP, On Line Analytical Processing). The advantage of the approach of the present invention will become clear in the explanation relating to the virtual machine and compiler actions; (b) they lack non-intrusive operation; (c) they lack non-invasive operation; and (d) the proposed algorithms do not cover all standard SQL operators.
In the context of the present invention, the term “non intrusive” means that the system of the present invention does not require changes in the conventional third-party applications, specifically, in the application code, application architecture, application database structure, methodology of writing the application code and design, etc. The term “automatic” means that the system automatically adapts its operation to the architecture and database of the third-party application and to the architecture and type of the server database, and typically there is no need to manually change the technology code to do so.
The following prior art publications describe such prior art attempts to perform SQL primitives on a GPU. All said publications suffer from the abovementioned drawbacks:
1. Bakkum et al, Accelerating SQL Database Operations on a GPU with CUDA, GPGPU-3, Mar. 14, 2010, Pittsburgh, Pa., USA;
2. Tsakalozos et al, Using the Graphics Processor Unit to Realize Data Streaming Operations, MDS'09, Nov. 30, 2009, Urbana Champaign, Ill., USA;
3. Volk et al, GPU-Based Speculative Query Processing for Database Operations, (http://www.vldb.org/archives/workshop/2010/proceedings/files/vldb 2010_workshop/ADMS 2010/adms10-volk.pdf).
Engineers continue to search for new ways to design CPUs that settle a little quicker or use slightly less energy per transition, pushing back those limits and producing new CPUs that can run at slightly higher clock rates. However, the improvements of recent years hardly satisfy the speed requirements for receiving results from queries over large databases.
Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language for writing kernels (functions that execute on OpenCL devices), plus APIs that are used to define and control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. OpenCL enables any application to access a GPU even for non-graphical computing. Thus, OpenCL extends the power of the GPU beyond graphics, enabling general-purpose computing on a GPU having a plurality of cores. For the sake of brevity, reference is made hereinafter to “OpenCL”. However, the term “OpenCL”, when used herein, relates to OpenCL, CUDA or Microsoft C++AMP and their derivatives, for example, libraries like Thrust, Cudapp, Oclpp, etc. More generally, the term refers to any framework or programming language for working with parallel computational hardware connected through a PCI bus (like NVIDIA/ATI GPUs, Intel Knights Ferry/Knights Corner, or Intel's next generations), and to any platform which is capable of carrying out processing over both one or more CPUs and one or more MCPs.
It is therefore an object of the present invention to provide a system which is adapted to query a dataset by simultaneously utilizing a central processing unit (CPU), which typically applies several cores in parallel, and a Multi Core Processor (MCP), which typically applies hundreds of cores in parallel.
It is another object of the present invention to boost performance in large queries for databases and for data sets that do not reside on a DBMS, such as any type/format of SGML/XML object, any type/format of text data objects, proprietary and binary data objects, No-SQL and similar data sets, and search engines like Lucene.
It is still another object of the present invention to provide a system in which the throughput of the DBMS is significantly increased in terms of queries per hour and/or per specific query.
It is another object of the present invention to provide a system for querying a database which is a low-cost add-on, and which is based on a heterogeneous platform (a CPU plus any MCP hardware connected to the PCI bus).
It is another object of the present invention to increase the efficiency of querying a database in terms of time, power, volume, and hardware resources allocation.
It is still another object of the present invention to provide the above system, which may be formed as a non-intrusive and non-invasive system that does not require changing the way that a database-consuming application was written, is written or will be written.
It is still another object of the present invention to provide a database querying system which operates in parallel with both one or more CPUs, and one or more MCPs.
Other objects and advantages of the invention will become apparent as the description proceeds.