1. Field of the Invention
The present invention relates generally to replication in computer systems having two or more databases, and more particularly to design synchronization in such a system.
2. Description of the Related Art
A computer system may have two or more databases connected through a network, where at least one of the databases has data tables that are xe2x80x9creplicatedxe2x80x9d from data tables in one or more of the other databases. In general, replication involves the process of taking table changes stored in one database at a xe2x80x9csourcexe2x80x9d server and applying those changes to another database at a xe2x80x9ctargetxe2x80x9d server. Even when the network is not available, it is desirable that the databases be made continuously available for reading, updating, and maintenance. Such databases may be included in a portable computer, for example, which is only occasionally connected to the network.
The databases may include relational databases storing two-dimensional relations or tables. Here, a database management system (DBMS) stores data and retrieves or updates tables in response to Structured Query Language (SQL) statements. SQL has become the standard language used to interface to such DBMSs in computer systems.
In this replicated environment, there results in a need for appropriate (1) data synchronization, (2) design synchronization, (3) authorization synchronization, and (4) logic synchronization. Data synchronization involves propagating updates to columns in rows of tables in a source database to corresponding replicated columns in rows of tables in a target database. In authorization synchronization, one case exists where a user has access to all databases. Here, as users are granted access to database objects, or existing access privileges are revoked or changed, similar authorization changes on behalf of the user are made on the target database. Another case exists in authorization synchronization where a user has access to one or just a few target databases. Here, access privileges are allowed for the administration of the target database from a remote (and perhaps occasionally-connected) control point. With respect to logic synchronization, appropriate alterations are made to logic elements in a target database as database triggers, stored procedures, or declarative constraints are added, removed, or modified in a source database.
Design synchronization involves propagating updates to the xe2x80x9cschemaxe2x80x9d (design or structure), e.g., updates to the table design in one database as tables and/or table columns are added to or removed from another database. Design synchronization is an important concern for applications where the design of the database might change. Design changes do typically occur in computer systems, albeit perhaps relatively slowly over a period of time.
In some cases, data synchronization might fail if not preceded by design synchronization. For example, if a given column in a table in one database is made larger, then the corresponding column in the other databases might also need to be made larger to avoid resulting failures when trying to save a large value in too small of a space. As the number of databases in the replication network increases, so does the risk associated with not automating the design synchronization.
Conventional products, such as SQL Anywhere(trademark) available from Sybase, have addressed design synchronization using a xe2x80x9csave-and-replayxe2x80x9d approach. Such conventional products capture data design commands made to one database designated as a consolidation database, and later replay those data design commands on other databases. One advantage to this approach appears to be that few, if any, decisions need to be made as to whether a design change should be replayed at other databases. Remembering what design changes were made, and in what order, appears to be sufficient.
However, implicit in the save-and-replay approach is an assumption that it is actually appropriate to replay the design changes at the other databases. That is, the approach is based on the assumption that all of the same database objects actually exist (or do not exist) everywhere. Put another way, allowing propagation of a design change to table X at the server database is based on the assumption that the table X exists at the target database as well and that the object is in a known state, which may not always be the case.
The save-and-replay approach is also based on an assumption of DBMS homogeneity in the network. While there exists national and international standards for portions of SQL language pertaining to data manipulation (i.e., SELECT, INSERT, UPDATE, and DELETE), only a limited standard exists for data definition statements or commands. Different products utilize proprietary extensions (e.g., CREATE INDEX) which are not covered by the standard. Thus, the assumption that it is appropriate to replay data definition commands in a verbatim manner is based on an assumption that the commands are valid everywhere, which itself is based on the assumption that a homogeneous database network exists. Thus, if a DBMS is utilized in a network, a customer must enforce a de facto standard over the data definition extensions in SQL, a standard that does not exist outside of the customer""s controlled environment.
Furthermore, for the save-and-replay approach to work appropriately, controls need to be in place to ensure that database objects are not created or deleted in more than one instance. This is necessary because some SQL commands will fail if repeated after initial execution. For example, once a CREATE TABLE statement has been executed, repeating the same CREATE TABLE statement will fail since the DBMS will not allow an object to be created with an existing name. To avoid this particular problem, conventional products allow SQL data definition commands to be accepted only against the consolidation database; from there, they are replayed against the other databases. Because the replication is integrated with the DBMS, conventional products are able to selectively disable the SQL data definition. There is no such ability in a general, heterogeneous DBMS case.
Accordingly, there is an existing need for an improved method and apparatus for performing design synchronization, especially in pervasive computing scenarios.
A method of performing design synchronization in a computer system is described herein. At least a portion of a source schema and at least a portion of a target schema are compared. The target schema is modified based on the comparison. In a preferred fashion, design changes are made at the target using the minimum number of equivalent operations and target-specific syntax.