One of the key advantages of storing large amounts of data in a database is that a specific subset of the stored data can be retrieved in an organized manner. To learn about their customers, businesses collect various types of information, such as personal data, geographic/demographic data, purchasing habits, and so forth. Such customer data is stored in a database system, such as a relational database management system (RDBMS), where the data can be processed and sorted into a format suitable for reporting or analysis. An example of a database system in which such information is collected is a data warehouse, in which data is input from a variety of sources and organized into a format that is structured for query and analysis or reporting. The volume of data collected in a large data warehouse is typically in the gigabyte range and sometimes in the terabyte range or higher.
To handle the massive amount of data that is collected and processed in such data warehouses, sophisticated platforms are typically employed. The platforms include parallel processing systems, such as massively parallel processing (MPP) systems or symmetric multiprocessing (SMP) systems. An MPP system typically is a multi-node system having a plurality of physical nodes interconnected by a network. An SMP system typically is a single-node system having multiple processors. Collected data is stored in storage devices that are accessible by the various nodes or processors of such systems. In a parallel system, stored data portions are accessible in parallel to increase access speeds.
When a particular set of data is sought from a database, several database resources are activated in order to locate the desired data. The database resources that are activated comprise data servers, query execution units, and the like. Often, in order to locate a specific set of data, many storage areas of a database are searched. Searching many locations in the database often involves utilizing many of the data servers that support the database system, thereby consuming a considerable amount of database resources.
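The resource cost described above can be illustrated with a minimal sketch. It is a toy model and not any particular database system: the names DataServer, build_servers, query_all, query_by_customer, and servers_used are hypothetical, and the placement of rows by customer_id modulo the number of servers is an assumption made only for illustration. The sketch counts how many data servers are activated when every storage area must be searched, compared with a lookup that can be confined to a single storage area.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Hypothetical data server that owns one storage area."""
    name: str
    rows: list = field(default_factory=list)
    scans: int = 0  # how many times this server has been activated

    def scan(self, predicate):
        # Each scan activates this server and examines all of its rows.
        self.scans += 1
        return [row for row in self.rows if predicate(row)]


def build_servers(rows, n_servers):
    """Distribute rows across servers (assumed rule: customer_id modulo n)."""
    servers = [DataServer(f"server-{i}") for i in range(n_servers)]
    for row in rows:
        servers[row["customer_id"] % n_servers].rows.append(row)
    return servers


def query_all(servers, predicate):
    """Search every storage area, which activates every data server."""
    result = []
    for server in servers:
        result.extend(server.scan(predicate))
    return result


def query_by_customer(servers, customer_id):
    """Search only the one storage area that can hold the given customer."""
    server = servers[customer_id % len(servers)]
    return server.scan(lambda row: row["customer_id"] == customer_id)


def servers_used(servers):
    """Count the servers that have been activated at least once."""
    return sum(1 for server in servers if server.scans > 0)


if __name__ == "__main__":
    rows = [{"customer_id": i, "region": "west" if i % 2 else "east"}
            for i in range(20)]

    servers = build_servers(rows, 4)
    query_by_customer(servers, 5)
    print("servers used by targeted lookup:", servers_used(servers))

    servers = build_servers(rows, 4)
    query_all(servers, lambda row: row["region"] == "west")
    print("servers used by broad search:", servers_used(servers))
```

In this toy model, the broad search occupies every server for the duration of its scan, which is the kind of resource consumption that can interfere with other concurrently executing queries.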
Simultaneous queries may be invoked by one or more users who have access to a particular database. Therefore, any one query utilizing a large amount of resources to locate a particular set of data can interfere with the execution of other query requests, which may lead to inefficient usage of the database system.