1. Field of the Invention
This invention relates in general to computer-implemented database systems, and, in particular, to identifying read claims in a database.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into tables which consist of rows and columns of data. The rows are formally called tuples or records. A database will typically have many tables and each table will typically have multiple tuples and multiple columns. Tables are assigned to table spaces. A table space is associated with direct access storage devices (DASD), and thus tables are stored on DASD, such as magnetic or optical disk drives, for semi-permanent storage.
A table space can be a system managed space (e.g., an operating system file system) or a database managed space. Each table space is physically divided into equal units called pages. Each page, which may contain, for example, 4K bytes, holds one or more rows of a table and is the unit of input/output (I/O). The rows of a table are physically stored as records on a page. A record is always fully contained within a page and is limited by page size. As users move towards working with image data and other large data objects, storing data in conventional records becomes difficult.
An index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file using a key and a record identifier (RID). A key is one of the fields of the record or one of the attributes of a row. The key ensures that a row is unique. The RID provides the physical location of a row (i.e., the page number and location within the page). Building an index for a large file can take a considerable amount of elapsed time. The process involves scanning all records in the file, extracting a key value and RID value from each of the records, sorting all of the key/RID values, and then building the index from the sorted key/RID values.
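The index-building steps described above (scan, extract key/RID pairs, sort, build) can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `RID`, `build_index`, and `lookup`, and the modeling of pages as lists of dictionaries, are assumptions made purely for illustration.

```python
import bisect
from typing import NamedTuple


class RID(NamedTuple):
    """Hypothetical record identifier: page number and location within the page."""
    page: int
    slot: int


def build_index(pages, key_field):
    """Scan every record, extract (key, RID) pairs, and sort them by key.

    `pages` is assumed to be a list of pages, each page a list of dict records.
    """
    entries = []
    for page_no, page in enumerate(pages):
        for slot, record in enumerate(page):
            entries.append((record[key_field], RID(page_no, slot)))
    entries.sort()  # the sorted key/RID list stands in for the index
    return entries


def lookup(index, key):
    """Binary-search the sorted entries; return the RID for `key` or None."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None
```

For example, given `pages = [[{"id": 30}, {"id": 10}], [{"id": 20}]]`, calling `lookup(build_index(pages, "id"), 20)` yields the RID for page 1, slot 0. A production index would normally be a tree structure rather than a flat sorted list; the flat list is used here only to keep the sketch short.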
Traditionally, an RDBMS stored simple data, such as numeric and text data. In a traditional RDBMS, the underlying storage management has been optimized for simple data. More specifically, the size of a record is limited by the size of a data page, which is a fixed number (e.g., 4K) defined by a computer developer. This restriction in turn poses a limitation on the length of columns of a table. To alleviate such a restriction, most computer developers today support a new built-in data type for storing large objects (LOBs). LOBs, such as image data, typically take up a great deal of storage space.
In a shared data environment, some DBMSs use a LOB manager sub-component that manages space in LOB table spaces (e.g., deallocates and allocates LOB table space). These table spaces use a shadow copy recovery scheme. Using such a scheme, when a LOB value is deleted, the pages storing the LOB value are deallocated, but must be protected from reallocation until the deleting transaction commits and no other transaction is still reading the value. Accordingly, before reallocating the pages, the LOB manager sub-component ensures that the deallocation of the space has been committed, and that no currently active transaction has a read interest in the deleted LOB. In conventional systems, both of these checks are performed using locking.
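The two conditions that guard reallocation can be sketched in Python as follows. This is an illustrative model only: the class `LobSpaceManager` and its methods are hypothetical, and the sketch tracks readers and commits in plain in-memory sets rather than with the locking used in conventional systems.

```python
class LobSpaceManager:
    """Toy model: deallocated pages stay pending until they are safe to reuse."""

    def __init__(self):
        self.free = set()        # pages available for allocation
        self.pending = {}        # deallocated page -> id of deleting transaction
        self.committed = set()   # ids of committed transactions
        self.readers = {}        # page -> set of transaction ids reading it

    def start_read(self, txn, pages):
        for page in pages:
            self.readers.setdefault(page, set()).add(txn)

    def end_read(self, txn):
        for txns in self.readers.values():
            txns.discard(txn)

    def delete_lob(self, txn, pages):
        # Deallocate the pages, but do not yet make them reusable.
        for page in pages:
            self.pending[page] = txn

    def commit(self, txn):
        self.committed.add(txn)

    def reclaim(self):
        """Move pages to the free set once both conditions hold."""
        for page, txn in list(self.pending.items()):
            deleter_committed = txn in self.committed
            no_readers = not self.readers.get(page)
            if deleter_committed and no_readers:
                del self.pending[page]
                self.free.add(page)
        return self.free
```

In this model, a page deleted by transaction `t1` while transaction `r1` is still reading it remains pending after `t1` commits, and becomes free only once `r1` has also finished.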
Locking is used in a shared data environment to prevent concurrent transactions from inconsistently modifying the same data. That is, one transaction can lock a portion of the table space to prevent other transactions from modifying the data while that one transaction is accessing the data in the locked portion of the table space. Thus, locking can guarantee for one transaction that shared data accessed by that transaction does not contain uncommitted updates of other transactions. In particular, locks ensure that a user does not access data that has been changed by another user and not yet committed or data that has been earmarked for change. However, locking is time consuming and slows the speed of page allocations.
Therefore, there is a need in the art for an improved technique of determining the age of reading transactions in a database.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for identifying read claims in a database.
In accordance with the present invention, the database is stored on at least one data storage device connected to a computer. A read identifier is stored for each reading transaction. The read identifier reflects a time at which the reading transaction first accesses an object stored in the database. The read identifiers are used to determine an age of an oldest active transaction.
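A minimal Python sketch of this idea follows. It is offered under stated assumptions and is not the claimed implementation: the class `ReadClaimRegistry`, its method names, and the use of a monotonically increasing logical counter in place of a real timestamp are all hypothetical choices made for illustration.

```python
import itertools


class ReadClaimRegistry:
    """Toy registry of read identifiers, one per active reading transaction."""

    def __init__(self):
        # Assumption: a logical clock stands in for "a time" in the text.
        self._clock = itertools.count(1)
        self._read_ids = {}  # transaction id -> read identifier

    def first_access(self, txn_id):
        """Record the read identifier on the transaction's first access only."""
        if txn_id not in self._read_ids:
            self._read_ids[txn_id] = next(self._clock)
        return self._read_ids[txn_id]

    def end(self, txn_id):
        """Drop the read identifier when the transaction finishes."""
        self._read_ids.pop(txn_id, None)

    def oldest_active(self):
        """Return the smallest read identifier, or None if no reader is active."""
        return min(self._read_ids.values(), default=None)
```

Because later accesses by the same transaction do not change its identifier, the minimum over all stored identifiers gives the age of the oldest active reading transaction without inspecting any locks.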