1. Field of the Invention
The present invention relates to information processing environments and, more particularly, to snapshot isolation support for distributed query processing in a shared disk database cluster.
2. Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy data access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, information may be retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level.
In recent years, users have demanded that database systems be continuously available, with no downtime, as they are frequently running applications that are critical to business operations. In response, distributed database systems have been introduced. Architectures for building multi-processor, high performance transactional database systems include a Shared Disk Cluster (SDC), in which multiple computer systems, each with a private memory, share a common collection of disks. Each computer system in an SDC is also referred to as a node, and all nodes in the cluster communicate with each other, typically through private interconnects.
In general, SDC database systems provide for transparent, continuous availability of the applications running on the cluster with instantaneous failover amongst servers. More and more, mission-critical systems, which store information on database systems, such as data warehousing systems, are run from such clusters. Data warehouse systems represent a type of database system optimized as a decision support system by tracking and processing large amounts of aggregate database information, i.e., the data warehouse. Data warehouses contain a wide variety of data that could be used to present a coherent picture of business conditions at a single point in time. Products exist for building, managing, and using a data warehouse, such as Sybase IQ available from Sybase, Inc. of Dublin, Calif.
Although SDC database systems provide increased availability and reliability in such environments, they also introduce a number of new challenges. Among these challenges is achieving snapshot isolation.
In databases and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice, the last committed values that existed at the time the transaction started are read), and that the transaction itself will successfully commit only if no updates made by it conflict with any concurrent updates made since that snapshot. Snapshot isolation does not provide strict serialization but allows queries and updates to run with greater concurrency. In snapshot isolation, queries do not block for updates and vice versa. Database systems with heavy query workloads (e.g., data warehouses) benefit greatly from this property.
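By way of illustration only, the two guarantees described above (reads see the values last committed when the transaction started, and a commit succeeds only if no concurrent commit wrote the same data) can be sketched in a few lines of Python. The sketch is a single-process toy and is not drawn from any particular database product. The names Store, Transaction, and ConflictError, the logical commit counter, and the choice to keep a list of committed versions per key are all assumptions made for the example.

```python
class ConflictError(Exception):
    """Raised when a write conflicts with a commit made after the snapshot."""


class Store:
    """Hypothetical in-memory store keeping every committed version of a key."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical timestamp of the most recent commit

    def begin(self):
        # The snapshot is identified by the commit timestamp at start time.
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}    # private, uncommitted updates

    def read(self, key):
        # A transaction sees its own pending writes first.
        if key in self.writes:
            return self.writes[key]
        # Otherwise return the newest version committed at or before start_ts.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.start_ts:
                return value
        return None

    def write(self, key, value):
        # Writes are buffered, so readers are never blocked by them.
        self.writes[key] = value

    def commit(self):
        # Reject the commit if any written key was committed by another
        # transaction after this transaction's snapshot was taken.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.start_ts:
                raise ConflictError(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append(
                (self.store.clock, value))
        return self.store.clock
```

In this sketch, a query that began before an update committed continues to read the older value without waiting, while a second transaction that tries to commit a change to the same key after a concurrent commit is rejected with ConflictError.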
Database systems have been known to use multi-version concurrency control (MVCC) to achieve snapshot isolation. However, extending MVCC to support snapshot isolation in shared disk database clusters poses unique challenges. Normally, if users want snapshot isolation in an SDC, only non-distributed operations (including queries) are allowed. A possible approach to support snapshot isolation in an SDC would be to localize a transaction to the originating node in the cluster. This would require only trivial changes to work in the cluster environment, as it would be akin to the operation of a single-node database. However, this approach fails to support transactions involving queries or updates that need to be executed in part on more than one node in the SDC.
Accordingly, a need exists for an approach to support snapshot isolation in an SDC when queries are executed in a distributed manner. The present invention addresses these and other needs.