The present invention relates to data storage systems. One aspect of the present invention is directed to a method and apparatus for identifying changes to a logical object by examining information relating to the physical level in a data storage system wherein the logical object is stored. Another aspect of the present invention is directed to a method and apparatus for identifying a logical unit of data that belongs to a database by determining a group of identifiers that includes an identifier of the logical unit of data that uniquely specifies a location of the logical unit of data within the database, based upon information, concerning a structure of the database, that does not directly specify the group of identifiers.
Computer systems typically include one or more storage devices. FIG. 1 is a block diagram of such a typical computer system 100. The system 100 includes a host computer 110, having a processor 120 and a memory 130, and a storage system 140. The storage system 140 can be any one of a number of different types of storage devices (e.g., a tape storage device, a floppy diskette storage device, a disk storage device, etc.), or can include a combination of different types of storage devices.
Application programs, such as word-processing applications, desktop publishing applications, database software, etc., execute on the processor 120 and operate on logical objects (e.g., files, etc.) that are formed from one or more logically related blocks of data. When an application performs an operation on a logical object, the blocks of data forming the logical object are read from the storage system 140 and temporarily stored in the memory 130 of the host computer for more efficient processing. When the application is finished performing operations on the logical object, the data forming the logical object is read from memory 130 and written to the storage system 140.
For many applications, it is desirable to be able to determine a subset of the data stored on the storage system 140 that has changed since a particular point in time. An example of such a situation is an incremental backup. It should be appreciated that for fault tolerance reasons, the data stored on the storage system 140 for a particular application may be periodically backed up. For many applications, the amount of data stored on the storage system 140 can be quite large, such that the process of performing a full backup of all of the data stored on the storage system 140 can take a significant amount of time. It should be appreciated that when performing a system backup, the application program may be unavailable for other uses, or alternatively, the performance of the computer system 100, as perceived by that application, as well as other applications, may be significantly impaired or degraded, such that the computer system 100 is effectively unavailable for other uses. Therefore, it is desirable to minimize the amount of time taken to back up the data on the storage system 140. To address this concern, the concept of an incremental backup has been developed, wherein a backup is performed on a subset of the data on the storage system 140, the subset corresponding only to the portions of data that have changed (i.e., have been added, deleted or modified) subsequent to the last time a full backup was performed.
Many computer systems provide the capability of performing an incremental backup on all of the data stored on the storage system 140. However, it should be appreciated that the storage system 140 can be quite large, and can store a significant amount of data, such that the performance of an incremental backup on the entire storage system 140 can be a very time consuming process. Thus, it is desirable to provide an incremental backup capability that works on only the subset of data stored on the storage system 140 that relates to a particular application, and is therefore logically related. Many computer systems provide the capability of performing an incremental backup for a set of data that is logically related. This is done by identifying the changes that have been made to the logical objects that form the logically related data set since a particular reference point in time (e.g., a time that a last full backup for the set of logically related data was performed). One example of such an incremental backup facility is provided in an ORACLE relational database, and enables the data included in the database to be incrementally backed up relative to a particular reference point in time.
An ORACLE database is typically organized as a collection of tables, with each table including one or more rows of data. Rows are instances of a subject. For example, a table named "CITY" may include several different rows of data pertaining to different cities, such as Boston, Los Angeles, New York and Paris. Each row may include a number of columns that store attributes of the subject, such as population, median income, etc.
FIG. 2 is a structural diagram that illustrates the manner in which row data for a table is typically stored in an ORACLE database file. Each file 200 is typically an ordinary operating system file and includes a file header 210 and file data 220. The file data 220 is organized in data blocks 230, with each block having a block header 240 and block data 250. The block data 250 contains the actual row data that is associated with one or more tables in the database. Block header 240 includes a row directory that identifies each row of data within the respective data block 230, and identifies where each row of data begins and ends. The block header 240 also includes one or more change bits that identify whether information within the respective data block 230 has changed since a reference point in time (e.g., that point in time when the change bits were last reset). Any time a change is made to row data within the data block 230 after the reference point in time, one or more of the change bits is set by the database software so that the occurrence of this change can be later identified.
As noted above, an ORACLE database is capable of identifying that a change has been made to data blocks 230 of the database since a particular reference point in time. As the change bits in each data block 230 are typically reset by the database software after a backup of the database, this reference point is typically the time at which the most recent full or incremental backup of the database was performed. Because the database is capable of identifying those data blocks 230 that have been changed (i.e., added, deleted, or modified) since the last full or incremental backup, an incremental backup of the database can be performed by backing up only those changed data blocks. Since the incremental backup only backs up those data blocks whose data has changed, rather than all data blocks known to the database, the incremental backup generally takes much less time than a full database backup, especially with large databases. This time savings can be significant, as modifications to the database are typically prohibited during any form of backup. In the event of a catastrophic failure to the database, the database can be restored based on the last full backup and the most recent incremental backup(s).
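By way of illustration only, the change-bit scheme described above can be sketched as follows. This is a simplified model and not the actual ORACLE block format: the class `DataBlock`, its `changed` flag (standing in for the one or more change bits in the block header), and the functions `full_backup`, `incremental_backup` and `restore` are hypothetical names introduced solely for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    """Hypothetical stand-in for a data block: one change flag plus row data."""
    rows: dict = field(default_factory=dict)  # row slot -> row data
    changed: bool = False                     # models the header change bit(s)

    def write_row(self, slot, data):
        self.rows[slot] = data
        self.changed = True  # set whenever row data in the block is changed


def full_backup(blocks):
    """Copy every block and reset all change flags (the new reference point)."""
    image = {i: dict(b.rows) for i, b in enumerate(blocks)}
    for b in blocks:
        b.changed = False
    return image


def incremental_backup(blocks):
    """Copy only blocks whose change flag is set, then reset those flags.

    Every block is still visited in order to test its flag.
    """
    image = {}
    for i, b in enumerate(blocks):
        if b.changed:
            image[i] = dict(b.rows)
            b.changed = False
    return image


def restore(full_image, incrementals):
    """Rebuild block contents from a full backup plus incrementals, oldest first."""
    state = {i: dict(rows) for i, rows in full_image.items()}
    for inc in incrementals:
        for i, rows in inc.items():
            state[i] = dict(rows)
    return state


blocks = [DataBlock() for _ in range(4)]
blocks[0].write_row(0, "Boston")
blocks[1].write_row(0, "Paris")
full = full_backup(blocks)          # reference point: all flags reset
blocks[1].write_row(1, "New York")  # only block 1 changes afterwards
inc = incremental_backup(blocks)    # contains block 1 only
restored = restore(full, [inc])
```

In this sketch the incremental image holds a single block, yet `incremental_backup` must still test the flag of every block to find it.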
Although an ORACLE database is capable of identifying changes that have been made to the database since a particular reference point in time, the amount of time that it takes to determine which data blocks have changed is directly proportional to the size of the database. That is, to determine which data blocks have changed, the database must scan each block header in every file of the database. Accordingly, for large databases, the benefits of an incremental backup may be offset by the amount of time it takes for the database to determine which data blocks have changed. Furthermore, it should be appreciated that the database can only determine changes to those data blocks that the database itself controls.
It is an object of one aspect of the present invention to provide an improved method and apparatus for identifying changes over a particular period of time within a set of logically related data. It is an object of another aspect of the invention to identify a unit of data stored in a database at a level of granularity that is smaller than the smallest unit of data conventionally accessible from the database.
According to an aspect of the present invention, an application programming interface (API) is provided that allows changes to logical objects on a host computer to be identified based on physical changes in a storage device. The API can be called by any application program to identify which logical blocks of a logical object have been changed since a reference point in time.
According to one embodiment of the present invention, a method of determining changes to a logical object subsequent to a reference time is provided. The logical object belongs to an application layer of a host computer in a computer system that includes the host computer, a storage system, and at least one mapping layer that maps the logical object to a physical layer relating to physical storage locations on the storage system. The physical layer includes physical change information relating to changes made to the physical storage locations on the storage system subsequent to the reference time. The method includes steps of mapping the logical object from the application layer to the physical layer to identify which physical storage locations include data corresponding to the logical object, examining the physical change information to identify any of the physical storage locations identified in the step of mapping that include data that has changed subsequent to the reference time, and determining that changes have been made to the logical object when any physical storage locations are identified in the step of examining as including data that has changed subsequent to the reference time.
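By way of illustration only, the three steps of this method (mapping, examining, and determining) can be sketched as follows. The representation chosen here is an assumption made for the example: physical storage locations are modeled as (volume, track) pairs, the mapping layer as a dictionary of extents, and the physical change information as a set of locations changed since the reference time. The names `map_to_physical`, `changed_locations` and `has_changed` are hypothetical.

```python
def map_to_physical(logical_object, extent_map):
    """Mapping step: resolve a logical object to its physical storage locations.

    extent_map is a hypothetical mapping layer:
    logical object -> list of (volume, first_track, track_count) extents.
    """
    locations = set()
    for volume, first_track, track_count in extent_map[logical_object]:
        for track in range(first_track, first_track + track_count):
            locations.add((volume, track))
    return locations


def changed_locations(locations, physical_change_info):
    """Examining step: keep only locations flagged as changed since the reference time."""
    return {loc for loc in locations if loc in physical_change_info}


def has_changed(logical_object, extent_map, physical_change_info):
    """Determining step: the object changed if any of its locations changed."""
    locations = map_to_physical(logical_object, extent_map)
    return bool(changed_locations(locations, physical_change_info))


# Hypothetical mapping layer and physical change information.
extent_map = {
    "/db/city.dbf": [("vol0", 10, 3)],   # tracks 10, 11, 12
    "/db/other.dbf": [("vol0", 20, 2)],  # tracks 20, 21
}
change_info = {("vol0", 11), ("vol0", 40)}

city_changed = has_changed("/db/city.dbf", extent_map, change_info)
other_changed = has_changed("/db/other.dbf", extent_map, change_info)
city_tracks = changed_locations(
    map_to_physical("/db/city.dbf", extent_map), change_info
)
```

Under these assumptions, only the locations that belong to the logical object are examined, so the work scales with the size of the object rather than with the amount of data on the storage system.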
According to another embodiment of the present invention, a computer readable medium encoded with a computer program is provided for a host computer that is coupled to a storage system and includes at least one mapping layer that maps logical objects belonging to an application layer on the host computer to a physical layer relating to physical storage locations on the storage system. The physical layer includes physical change information relating to changes made to the physical storage locations on the storage system subsequent to a reference time. The computer program, when executed on the host computer, performs a method of determining changes to a logical object subsequent to the reference time that includes steps of mapping the logical object from the application layer to the physical layer to identify which physical storage locations include data corresponding to the logical object, examining the physical change information to identify any of the physical storage locations identified in the step of mapping that include data that has changed subsequent to the reference time, and determining that changes have been made to the logical object when any physical storage locations are identified in the step of examining as including data that has changed subsequent to the reference time.
According to another embodiment of the present invention, a host computer for use with a storage system having a plurality of physical storage locations is provided. The host computer includes at least one mapping layer that maps a logical object belonging to an application layer on the host computer to a physical layer relating to the plurality of physical storage locations on the storage system. The physical layer includes physical change information relating to changes made to the plurality of physical storage locations on the storage system subsequent to a reference time. The host computer also includes determining means for determining, from the at least one mapping layer, a mapping of the logical object from the application layer to the physical layer to identify which of the plurality of physical storage locations include data corresponding to the logical object, and means for identifying whether changes have been made to the logical object subsequent to the reference time by examining the physical change information corresponding to the plurality of physical storage locations identified by the determining means.
According to another embodiment of the present invention, a storage system for use with a host computer is provided. The host computer includes at least one mapping layer that maps a logical object belonging to an application layer on the host computer to a physical layer that includes at least one storage volume. The storage system includes at least one storage device that stores data included in the at least one storage volume, and a memory to store change information relating to the at least one storage volume on the storage system. The change information stored in the memory identifies whether changes have been made to the at least one storage volume subsequent to a reference time. The storage system further includes means for receiving, from the host computer, a mapping of the logical object from the application layer to the at least one storage volume that includes data corresponding to the logical object, and means for determining whether changes have been made to the logical object subsequent to the reference time by examining the change information relating to the at least one storage volume that includes data corresponding to the logical object.
According to another aspect of the present invention, a method and apparatus are provided for obtaining an identifier that uniquely identifies a location of a logical unit of data that belongs to a database. Advantageously, the method and apparatus do not require first accessing the logical unit of data from the database using a label in application space, as the method and apparatus determine the identifier based upon information concerning the structure of the database.
According to one embodiment of this aspect of the present invention, a method of obtaining a first identifier that uniquely identifies a location of a logical unit of data that belongs to a database is provided. The method includes a step of determining a group of identifiers that includes the first identifier based upon information, concerning a structure of the database, that does not directly specify the group of identifiers.
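By way of illustration only, one way such a group of identifiers might be derived is sketched below. The sketch rests on assumptions that are not stated in the passage above: each identifier is taken to be a (file number, block number, row slot) triple, and the structural information is taken to be the number of blocks in each file together with an upper bound on rows per block. That information does not list any identifier directly, yet the group of candidate identifiers can be enumerated from it. The function name `candidate_row_ids` is hypothetical.

```python
from itertools import product


def candidate_row_ids(blocks_per_file, max_rows_per_block):
    """Enumerate candidate (file, block, slot) identifiers from structure alone.

    blocks_per_file: hypothetical structural information, file number -> block count.
    max_rows_per_block: assumed upper bound on row slots in any block.
    """
    for file_no, n_blocks in sorted(blocks_per_file.items()):
        for block_no, slot in product(range(n_blocks), range(max_rows_per_block)):
            yield (file_no, block_no, slot)


# Two files: file 1 has two blocks, file 2 has one block; at most 3 rows per block.
group = list(candidate_row_ids({1: 2, 2: 1}, 3))
```

The resulting group contains the identifier of every row that can exist under the assumed structure, and may also contain identifiers that correspond to no stored row.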
According to another embodiment, a method of obtaining a first identifier of a logical unit of data that belongs to a database is provided. The first identifier uniquely identifies a location of the logical unit of data within the database, and the logical unit of data has an application space label which can be used by application programs to access the logical unit of data from the database. The method includes a step of requesting the database to provide the first identifier without first accessing the logical unit of data from the database using the application space label.
According to a further embodiment, a computer readable medium is provided that is encoded with a computer program for execution on a host computer that includes a database. The computer program, when executed on the host computer, performs a method of obtaining a first identifier of a logical unit of data that uniquely identifies a location of the logical unit of data within the database. The method includes a step of determining a group of identifiers that includes the first identifier based upon information concerning a structure of the database, wherein the information does not directly specify the group of identifiers.
According to a still further embodiment, a computer readable medium is provided that is encoded with a computer program for execution on a host computer that includes a database. The computer program, when executed on the host computer, performs a method of obtaining a first identifier of a logical unit of data that belongs to the database and uniquely identifies a location of the logical unit of data within the database. The logical unit of data has a label in application space by which the logical unit of data can be accessed from the database, and the method includes a step of requesting the database to provide the first identifier without first accessing the logical unit of data from the database using the label in application space.
According to another embodiment of the present invention, a computer is provided. The computer includes a processor, and a memory that is coupled to the processor having a database loaded thereon. The database has a logical unit of data that belongs to the database and a first identifier that uniquely identifies a location of the logical unit of data within the database. The computer includes means for determining a group of identifiers that includes the first identifier based upon information, concerning a structure of the database, that does not directly specify the group of identifiers.
According to a further embodiment of the present invention, a computer is provided that includes a processor and a memory that is coupled to the processor having a database loaded thereon. The database has a logical unit of data that belongs to the database and a first identifier that uniquely identifies a location of the logical unit of data within the database, and the logical unit of data has an application space label which can be used by applications executing on the processor to access the logical unit of data from the database. The computer includes means for requesting the database to provide the first identifier without first accessing the logical unit of data from the database using the application space label.