Computerized data processing systems rely on various types of storage to process and store data. For example, "main storage" is program-addressable storage from which data can be loaded directly into the registers of the central processing unit (CPU) for processing. "Auxiliary storage" is addressable storage other than main storage that can be accessed by means of input/output (I/O) channels, and includes direct access storage devices such as magnetic storage disks. "Expanded storage" is a high-speed, high-volume electronic extension of main storage that is accessed synchronously in multiple-byte increments, e.g., 4096 (4K) bytes, sometimes referred to as a "page".
Data on storage disks exists in sets of fixed-length records accessible in pages. The size of a page varies depending on the system in use; on some systems a page is 4096 (4K) bytes. Also, on some computers, e.g., virtual machines, the operating system keeps track of disk space in blocks of 256 pages called segments. This is necessary because the hardware requires that a control block be maintained for each segment for virtual address translation. When data is to be transferred from auxiliary storage to the CPU, pages of data are transferred to a storage buffer in segments.
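As an illustration of the page and segment arithmetic above, the following minimal sketch assumes 4096-byte pages and 256 pages per segment (the figures given in the text), so that one segment spans 1,048,576 bytes. The function name `split_address` is hypothetical and is not drawn from any particular operating system.

```python
PAGE_SIZE = 4096          # bytes per page (4K), as on the systems described above
PAGES_PER_SEGMENT = 256   # pages per segment
SEGMENT_SIZE = PAGE_SIZE * PAGES_PER_SEGMENT  # 1,048,576 bytes per segment


def split_address(virtual_address: int) -> tuple:
    """Split a virtual address into (segment, page within segment, byte offset)."""
    if virtual_address < 0:
        raise ValueError("address must be non-negative")
    # Which segment the address falls in, and how far into that segment it lies
    segment, within_segment = divmod(virtual_address, SEGMENT_SIZE)
    # Which page of the segment, and the byte offset inside that page
    page, offset = divmod(within_segment, PAGE_SIZE)
    return segment, page, offset


print(SEGMENT_SIZE)             # prints 1048576
print(split_address(0x123456))  # prints (1, 35, 1110)
```

Because both sizes are powers of two, the same decomposition could equally be done with shifts and masks; `divmod` is used here only for readability.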
In order for the CPU to process data, the data must normally be in main storage. Main storage, however, is limited and is therefore not used to store large amounts of data permanently. On the other hand, vast amounts of data may be stored on data disks. However, accessing data from disks is slow compared to the rate at which it can be processed in main storage. To compensate for the difference in access rates, a data buffer is used. A data buffer is a portion of storage used to hold input and output data temporarily. The data buffer can reside in main storage or expanded storage.
On multi-user computing systems, concurrent users time-share the resources of the computer system through "virtual machines". In a virtual machine, which is a functional simulation of the real machine, each user addresses the computer's main storage as though it were real. Addressable main storage in a virtual machine is called "virtual storage". The size of virtual storage is limited by the addressing scheme of the computer system and by the amount of real disk storage available. Data in virtual storage is mapped to real addresses when the CPU references the data. Mapping is the establishment of correspondences between physical storage locations and virtual storage locations. An address translation table is maintained by the operating system for this purpose.
On virtual machines, each user's reference to a memory address is referred to as a virtual address, and each range of addressable space available to a user is called an address space. When a user references a virtual storage location, the page containing that location may be on disk or in expanded storage, as indicated by a "flag" in the address translation table. When a page is to be copied to main storage, the operating system reads the page into an available real storage page location. When the read completes, the address translation table is updated to reflect the new page location. If no real storage space is available, the operating system frees main storage space by "paging out" the least recently used pages.
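The page-in and page-out behavior just described can be sketched as a toy translation table. This is a minimal illustration under stated assumptions, not any operating system's actual data structure: the class `TranslationTable`, the `Location` flag values, and the use of an `OrderedDict` to track least-recently-used order are all hypothetical choices, and this sketch always pages victims out to disk (a real system might use expanded storage instead) and does not copy page contents.

```python
from __future__ import annotations

from collections import OrderedDict
from enum import Enum


class Location(Enum):
    """Hypothetical flag recording where a virtual page currently resides."""
    MAIN = "main"
    EXPANDED = "expanded"
    DISK = "disk"


class TranslationTable:
    """Toy address translation table with LRU page-out (illustrative only)."""

    def __init__(self, real_frames: int) -> None:
        self.free_frames = list(range(real_frames))
        # virtual page -> (location flag, real frame number or None)
        self.entries: dict[int, tuple[Location, int | None]] = {}
        # resident virtual pages, least recently used first
        self.lru: OrderedDict[int, None] = OrderedDict()

    def add_page(self, vpage: int, where: Location) -> None:
        # A new page starts outside main storage (on disk or in expanded storage).
        self.entries[vpage] = (where, None)

    def reference(self, vpage: int) -> int:
        """Return the real frame holding vpage, paging it in if necessary."""
        flag, frame = self.entries[vpage]
        if flag is Location.MAIN:
            self.lru.move_to_end(vpage)  # mark as most recently used
            return frame
        if not self.free_frames:
            self._page_out_lru()         # no real storage left: evict a page
        frame = self.free_frames.pop()
        # (a real system would read the page contents into the frame here)
        self.entries[vpage] = (Location.MAIN, frame)
        self.lru[vpage] = None
        return frame

    def _page_out_lru(self) -> None:
        # Evict the least recently used resident page; this sketch sends it to disk.
        victim, _ = self.lru.popitem(last=False)
        _, frame = self.entries[victim]
        self.entries[victim] = (Location.DISK, None)
        self.free_frames.append(frame)
```

With two real frames and three pages, referencing pages 0, 1, 0 and then 2 forces page 1 (the least recently used) out to disk, and page 2 takes over its frame.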
A typical database storage system comprises a directory disk, one or more data disks, and one or two log disks similar to the data disks. The directory disk contains information on the mapping of the database pages from virtual storage to their real physical locations and other information describing the physical configuration of the database. The data disks store data, while the log disks record transactions against the database.
In a database system, users are assigned logical pages on a data disk to store data objects. A data object is a logical set of pages in the database. For example, in a relational database system, a data object may be viewed as a set of pages containing the rows of a table, where each row is a separate record and each column is a different data field. When a data object is created, entries are inserted in the database directory disk to indicate which data object pages contain data and their physical locations on a data disk. Initially, only directory space is taken, but as a user inserts data into a data object, pages are allocated on a data disk and the directory disk is updated to identify those pages.
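The allocate-on-insert behavior of the directory can be sketched as follows. This is a minimal illustration assuming a simple in-memory free list of physical page numbers; the classes `DataDisk` and `Directory` and their methods are hypothetical names, not part of any actual database product.

```python
from collections import deque


class DataDisk:
    """Hands out free physical page numbers on a data disk (hypothetical)."""

    def __init__(self, num_pages: int) -> None:
        self.free = deque(range(num_pages))

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("data disk full")
        return self.free.popleft()


class Directory:
    """Maps each data object's logical pages to physical data-disk pages."""

    def __init__(self, disk: DataDisk) -> None:
        self.disk = disk
        self.objects: dict = {}  # object name -> {logical page: physical page}

    def create_object(self, name: str) -> None:
        # Creating a data object consumes only directory space, no data pages.
        self.objects[name] = {}

    def write_page(self, name: str, logical_page: int) -> int:
        """Return the physical page for logical_page, allocating it on first use."""
        pages = self.objects[name]
        if logical_page not in pages:
            pages[logical_page] = self.disk.allocate()  # first insert into this page
        return pages[logical_page]
```

For example, after `create_object("t")` the directory entry for `"t"` is empty; the first write to logical page 5 is assigned physical page 0, a later write to logical page 0 is assigned physical page 1, and further writes to page 5 reuse physical page 0.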
In order to maintain data integrity, the database system has a mechanism to take "checkpoints" of the database at certain intervals to ensure that a consistent version of the database is saved. For example, when a database page is modified, a copy of the page as of the previous checkpoint is kept unchanged, the modified version of the page is written to a different location on disk, and the page directory is updated to point to the new location. Hence, at checkpoint time, the modified version of the database becomes the current copy of the database.
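The checkpoint scheme above resembles what is commonly called shadow paging. The sketch below is a simplified, hypothetical model of it (the class `ShadowStore` and its methods are illustrative names, and a dictionary stands in for the disk): the first modification of a page after a checkpoint goes to a fresh disk slot so that the checkpoint copy is never overwritten, and taking a checkpoint makes the working directory current and frees the superseded slots.

```python
from collections import deque


class ShadowStore:
    """Toy shadow-paging store: checkpoint copies are never overwritten."""

    def __init__(self, num_slots: int) -> None:
        self.disk: dict = {}                   # disk slot -> page contents
        self.free = deque(range(num_slots))    # unused disk slots
        self.checkpoint_dir: dict = {}         # page -> slot as of last checkpoint
        self.current_dir: dict = {}            # page -> slot including changes

    def write(self, page: int, data: bytes) -> None:
        slot = self.current_dir.get(page)
        if slot is None or slot == self.checkpoint_dir.get(page):
            # New page, or first change since the checkpoint: use a fresh slot
            # so the checkpoint version stays intact on disk.
            slot = self.free.popleft()
            self.current_dir[page] = slot
        self.disk[slot] = data

    def read(self, page: int) -> bytes:
        return self.disk[self.current_dir[page]]

    def read_checkpoint(self, page: int) -> bytes:
        return self.disk[self.checkpoint_dir[page]]

    def checkpoint(self) -> None:
        # Free slots that held checkpoint versions no longer referenced.
        live = set(self.current_dir.values())
        for slot in self.checkpoint_dir.values():
            if slot not in live:
                del self.disk[slot]
                self.free.append(slot)
        # The modified version now becomes the current consistent copy.
        self.checkpoint_dir = dict(self.current_dir)
```

Writing `b"v1"`, checkpointing, and then writing `b"v2"` leaves `read(0)` returning `b"v2"` while `read_checkpoint(0)` still returns `b"v1"`; after the next checkpoint both return `b"v2"` and the old slot is returned to the free list.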
On virtual machines, data space pages can be mapped to disk by various techniques. For example, contiguous virtual storage pages can be mapped to contiguous disk pages. This is referred to as a physical mapping. Alternatively, contiguous virtual pages can be mapped to non-contiguous disk pages. This is referred to as a logical mapping.
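The difference between the two mappings can be illustrated with a short sketch, assuming a mapping is represented as a list whose index is the virtual page number and whose value is the disk page number. The function names are hypothetical. A physical mapping is fully determined by its starting disk page, whereas a logical mapping must consult a per-page table such as a directory.

```python
def physical_mapping(first_disk_page: int, num_pages: int) -> list:
    """Contiguous virtual pages -> contiguous disk pages (only the start is needed)."""
    return [first_disk_page + i for i in range(num_pages)]


def logical_mapping(directory: dict, num_pages: int) -> list:
    """Contiguous virtual pages -> whatever disk pages the directory records."""
    return [directory[vpage] for vpage in range(num_pages)]


def is_physical(mapping: list) -> bool:
    """True if consecutive virtual pages map to consecutive disk pages."""
    return all(b == a + 1 for a, b in zip(mapping, mapping[1:]))


print(physical_mapping(100, 4))                                  # [100, 101, 102, 103]
print(logical_mapping({0: 100, 1: 57, 2: 203, 3: 58}, 4))        # [100, 57, 203, 58]
```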
On some types of virtual machines, users access multiple address spaces. Some of these address spaces, however, contain only data (not computer instructions) and are referred to as data spaces. Furthermore, data space pages can be mapped to a data disk in such a manner as to eliminate the need for the database program manager to execute page I/O operations in order to move data between a data disk and main storage. On these systems, the location of a data object page on a data disk is known to the operating system. When the page is referenced by a user, it is read from its data disk location by the operating system without requiring a specific disk operation from the database program manager. When a page is directly accessible by the operating system, the database system operates more efficiently, with fewer demands on CPU processing cycles.
On database systems, there is a continuing need to improve the overall efficiency of the system in handling large amounts of data. In particular, there is a need for faster responses to queries and data changes, efficient use of real storage, and improved efficiency in handling data objects. This, in turn, creates a need for database systems that optimize the use of data spaces, map only those pages of a data object that are referenced rather than the whole data object, and minimize database downtime at checkpoints.
In the prior art, various schemes are available for using data spaces. However, no method or means has been found that discloses a multi-user system using data spaces in virtual memory for handling data objects of various sizes. Examples of prior art involving virtual memory but not addressing this deficiency include the following.
U.S. Pat. No. 4,742,447 ("Method To Control I/O Accesses In Multi-tasking Virtual Memory Virtual Machine Type Data Processing System") discloses a method for accessing information in a page segmented virtual memory data processing system in which virtual machines running UNIX type operating systems are concurrently established, and in which a memory manager controls the transfer of information between primary storage and secondary storage devices in response to the occurrence of page faults.
U.S. Pat. No. 4,843,541 ("Logical Resource Partitioning of a Data Processing System") discloses a method and means for partitioning the resources in a data processing system into a plurality of logical partitions. The main storage, expanded storage, the channel and sub-channel resources of the system are assigned to the different logical partitions in the system to enable a plurality of preferred guest programming systems to run simultaneously in the different partitions.
U.S. Pat. No. 4,843,542 ("Virtual Memory Cache for Use in Multi Processing Systems") discloses a system for maintaining data consistency among distributed processors, each having an associated cache memory.
U.S. Pat. No. 4,922,415 ("Data Processing System for Converting Virtual to Real Addresses Without Requiring Instructions from the Central Processing") discloses a method in which a controller performs the translation functions for the inter-operability of the processor and memory and does so without requiring instructions from the processor.
U.S. Pat. No. 4,961,134 ("Method For Minimizing Locking and Reading in a Segmented Storage Space") discloses a page accessing method in a segmented table-space which eliminates unnecessary reading and locking. The table-space comprises data pages grouped into identically sized segments, each segment storing data for a single table. A status indicator for each data page of a segment is kept in a separate segment control block stored in a space map page.
Consequently, there is an unfulfilled need for a means to create and use data spaces to accommodate large data objects of different sizes, including data objects that may be much larger than a single real data space.
Also, there is an unfulfilled need for a means to make more efficient use of real storage by mapping to a data disk only those data object pages in a data space that are referenced by the CPU, rather than mapping the entire data object.
Yet another unfulfilled need is a means to efficiently save modified pages from a data space to a data disk so as to reduce database downtime at checkpoints.
A full understanding of how the present invention addresses the above unfulfilled needs may be had by referring to the following description and claims taken in conjunction with the accompanying drawings.