When a large number of queries are sent to a database management system (DBMS) simultaneously, a database management system uses a concurrency control to process each query. The search only transactions are called queries and each query is processed separately. Therefore query processing in a conventional database management system has the following problems for processing a large number of queries.
Problem (1):
For processing queries, data of a database is accessed, and when a large number of queries perform operations on a common relation of the database, same data is accessed from disk and loaded into main memory to be processed repeatedly. Therefore it is not efficient, and the query processing speed decreases.
Problem (2):
In query processing of a conventional database management system, when a task (operation) in a query is processed, a processing result of the task is created on disk every time as an intermediate result of the query. However since the processing result of each task has overlapped selections of records and attributes, many processing results having overlapped selections are created on disk. As a result, the number of disk access increases and the query processing speed decrease.
Problem (3):
For processing join operations in a query, the query processing of a conventional database management system uses the join method such as hash join to process each join operation.
However every time the join operation is processed, records in a relation used for retrievals of the join operation are accessed from disk, and a hash table which is used as an index to each record in the relation is created in main memory. Since each join operation uses a different relation for its retrievals, a separate hash table is created for each join operation. Therefore if many hash tables are created in main memory, there is not enough main memory space to process many queries, so the query processing speed decreases.
(Problems of Cashing Operations and Operation Results)
In order to reduce the number of disk accesses and to improve the query processing speed, the method of temporary storing (caching) frequently requested queries and query processing results, and operations in a query (relational algebra such as select operation, project operation, and join operation) and operation results (intermediate result of query) on disk is used.
When a query is requested from a user, a database management system checks whether a processing result of the query is already stored on disk. If it is stored, the stored query processing result is returned to the user as it is, instead of processing the query.
Also, even if the query is different, an operation in the query is same as the one already stored on disk, then the stored result of the operation is used to process the query instead of processing the operation.
The purpose of this method is to improve the query processing speed by increasing the number of query processing results and operations results to be stored, so that same queries and same operations are not processed repeatedly. However, this method requires that the queries and the operations must be same as the ones stored on disk, and it cannot utilize the stored results if the queries and the operations are different from the ones stored on disk.
Also in this method, if new records are added into the database and the database is updated, some stored processing results become old, so the database management system must dispose the processing results that became old. Therefore if database updates are performed frequently, the stored processing results that become old must be disposed every time, so that this method becomes no longer effective.
A database management system such as Microsoft SQL server does not immediately dispose the query processing results that become old in order to maintain the query processing speed; instead it returns old processing results to users for a while.
(Problems of Caching to Main Memory)
Another method for reducing the number of disk accesses and improving the query processing speed is the method of caching frequently accessed blocks from disk in main memory. In this method, when a query is requested from a user, it checks to see if the blocks to be accessed from disk for processing the query are already cached in main memory. If they are cached, the cached blocks are used.
The purpose of this method is to reduce the number of disk accesses and to improve the query processing speed by increasing the number of blocks to be cached in main memory.
However, this method requires that the block on disk and the cached block in main memory need be synchronized, so when a content of the block is changed, both the block on disk and the cached block in main memory need to be updated.
Also, when a linear search is performed for a relation of database having gigabytes of data to process a query, almost all blocks, except the blocks that are cached in main memory, need to be accessed from disk, so this method is no longer effective.
Also an operating system itself performs such memory management; so if a database management system increases the amount of blocks to be cached in main memory, the blocks that are cached in main memory are transferred to virtual memory (disk) by the operation system, so the query processing speed decreases.
(Problems of Pipelining Queries)
The method of pipelining queries is used as a method of reducing the number of disk accesses and improving the query processing speed by not creating intermediate results of queries on disk.
This method pipelines the operations in a query by using multiple microprocessors to process these operations in parallel on a computer system with multiple microprocessors.
In this method, the processing result of each operation in the query is streamed to the next operation in the query, so that the next operation does not wait for the previous operation to complete, and performs its operation on the partially finished result.
However, the operations that can be pipelined are limited to operations such as a select operation and a project operation, and cannot pipeline a join operation, because the join operation requires its previous operations to complete before processing.
This method also requires that the computer system have multiple microprocessors, so the computer system with a single processor is not effective.
(Problem of Multiple-Query Optimizations)
Multiple-query optimization is another method of processing queries which is different from a conventional method of processing queries where each query is processed separately. This method combines multiple queries to create a single query graph (for obtaining multiple query processing results, a query graph is used instead of a query tree), and executes the query graph to process multiple queries simultaneously. In order to do this, if multiple queries have same operations, a query graph is created to replace these same operations by a single operation, and the processing result of the single operation is shared.
Also, even if multiple queries have different operations, if one operation is a subset of another operation, a query graph is created to perform one operation on the processing result of another operation. The purpose of this method is to reduce the number of disk accesses and improve the query processing speed by utilizing processing results of the other operations.
However, if one operation is processed on the processing result of another operation, the indices of the relation in the database are not utilized, and records are searched linearly from the processing result of another operation. Therefore it rather increases the number of disk accesses and decreases the query processing speed.
Also there is a problem that when a number of queries increase, many different query graphs are created, and it takes time to find the best query graph with the lowest processing cost from these query graphs that are created.
In addition, Multiple-query optimizations does not immediately process a query requested from a user. Instead, queries which were given for a certain period of time are processed simultaneously. Therefore it cannot respond to each query quickly.