1. Field of the Invention
The invention relates generally to techniques for making frequently-referenced data more available to programs using the data and more specifically to techniques for making distributed queryable caches of data that is contained in large databases. The caches are transparent to the program that is querying the large data base in that the program generally need not be altered when a cache is added.
2. Description of Related Art
Local and Remote Copies of Data
A common problem in data processing systems is that frequently-used data cannot be quickly accessed, either because there is not enough capacity in the connection between the processor and the data storage or because too many program executions are contending for access to the data storage. One way of solving the problem is to make a copy of the data and put the copy in a location where it is more accessible to some of the program executions that are using the data. Those program executions can then reference the copy of the data in the more accessible location rather than the original data. In the following, the more accessible copy will be called the local copy, while the less-accessible original will be called the remote copy. It is to be understood, of course, that the terms remote and local represent degrees of accessibility and that both copies may be physically in the same location.
The local copy may be a complete copy of the data or it may be only a partial copy; further, the local copy may or may not be transparent to the executing program. When the local copy is transparent, the executing program can refer to a data item in exactly the same fashion as it did before the local copy existed. The reference goes first to the local copy. If the data is there (this is termed a hit), the data is retrieved from the local copy. If the data is not there (this is termed a miss), the reference goes automatically to the remote copy and the data is retrieved from the remote copy. When the local copy is not transparent, the program must be changed to take into account the fact that references are made to the local copy, instead of to the remote copy. In general, local copies of remote data that are both partial and transparent to the executing program are termed caches. Transparence is a relative term. Though the executing program need in general not be modified when the cache is added, the presence of the cache may result in some differences in the program""s behavior, and in some cases the differences in behavior may require modification of the program.
Whenever there are local and remote copies of data, the copies must be kept consistent with each other. This is easy only if the data never changes; if it does change, then changes in one of the copies must be propagated to others of the copies. In some cases, the local copy is read only; the programs using it cannot change it, but changes in the remote copy are propagated to the local copy. In other cases, the local copy is updateable, and when this is the case, the updates must be propagated to the remote copy. When the propagation is done depends on how consistent the copies must be. For example, if the copies can be inconsistent for a whole working day, changes need only be propagated once a day. On the other hand, if it is required that the copies are always consistent, any change must be propagated immediately.
Databases are organized collections of data in which sets of data items are accessed by queries. The query describes the set of data items to be accessed in terms of the database""s organization. The most common kind of organization for databases is relational: the data is organized into tables with named columns and rows. Each row has afield for each of its columns. A field for a column in a row contains that row""s value for the column. Values in the fields can serve to uniquely identify rows of the database and to relate rows in one table to rows in a different table. A set of data items contained in a relational database is specified in terms of rows and columns of the table the items are in and values in the row or rows that contain the items. The following Employees table offers a simple example:
This table has three columns and a row for each employee. the emp_id field of an employee""s row contains a unique identifier for the employee; the emp_name field contains the employee""s name; the emp_pay field contains the employee""s current pay.
Relational database systems use a language called SQL to specify data sets. An SQL request on a database describes an operation to be performed on a set of data in the database. The most common kind of request is a query, which describes an operation that reads the set of data. Other requests may modify the set of data or delete the set of data. For example, an SQL query to find out how much employee xe2x80x9cOwnerxe2x80x9d makes might look like this:
SELECT emp_pay FROM Employees
WHERE emp_name=xe2x80x9cOwnerxe2x80x9d
This instructs the data base system to return the value of emp_pay from the row of the table Employees in which the field Emp_name has the value xe2x80x9cOwnerxe2x80x9d. The result of the query will be xe2x80x9c$1,000,000xe2x80x9d. A database system contains not only the data, but also a description of the manner in which the data is organized. In the following, the description will be termed a data dictionary. Elsewhere, it may be termed the data base system""s metadata or its schema.
Many organizations keep huge amounts of information in large central databases. As networks have grown, so has the potential for these large databases to be bottlenecks. The bigger the network, the more users, and the greater the problems caused by contention for the data base. Moreover, the network""s carrying capacity is always less than the demands being made on it. As a result, it often takes a long time to transmit the query to the database, have the database perform the query, and have the results returned to the user. The development of the World Wide Web has of course increased all of these problems enormously. The database that contains the catalog of a Web merchant is accessible to literally everyone who has a connection to the Internet, and delays in getting a response are not only unpleasant for the user, but bad for the Web merchant""s business.
A common approach to solving bottlenecks caused by databases is by replicating the data base. As the name implies, a replicated data base has a copy at another location. When a set of replicated databases is set up, methods are specified for keeping the replications consistent. A replicated database may be a complete copy of the database it is a replication of, or a partial copy. In the latter case, the partial copy is termed a materialized view. A materialized view may be read only, in which case, it is kept consistent with the database it is a copy of, but cannot be updated by the user, or it may be updateable, in which case the data items in the materialized view may be updated and the updates are propagated to the database of which the materialized view is a copy. A replicated database system provided by Oracle Corporation, of Redwood Shores, Calif., USA, is described in detail in the document Oracle 8i Replication, available in May, 2001 at http://technet.oracle.com/doc/server.815/
A problem with materialized views is that they are not transparent to the application program. If the application program provides a query to the materialized view that specifies a set of data items that is not in the materialized view but is in the database the materialized view was replicated from, the query simply fails, and the application must attempt it again on the database the materialized view was replicated from. Materialized views are consequently typically used in situations where it is possible to completely predict what queries will be made from a given remote location. The database tables needed to answer those queries are included in the materialized view, and since a miss never occurs, the fact that the materialized view provides no support to an application that has experienced a miss is not a problem.
However, as networks expand and more and more people need access to databases to do their jobs or to purchase products, selecting the right materialized view becomes more and more difficult. Oracle Corporation has responded to this problem by developing the queryable database cache described in published PCT application WO 00/63800, Oracle Corporation, Web Servers with queryable dynamic caches, published 26 October 2000 and claiming priority from pending U.S. Ser. No. 09/294,656, Cusson, et al., Web servers with queryable dynamic caches, filed Apr. 19, 1999. The cache of WO 00/63800 is transparent. When an application program makes a query on the cache, the query goes to a data access layer, which provides the query to the cache. If the query misses, the data access layer redirects the query to a remote database. There is thus no need for the application to know anything about either the cache or the remote database.
Though the queryable cache described in WO 00/63800 does have the fundamental advantage of transparency to the application program, experience with the cache has revealed some areas where improvement is needed:
The queryable cache needs to be transparent not only to the application program, but also to the data access layer.
The queryable cache needs to handle SQL requests that reference stored procedures.
There needs to be an easy way of setting up the cache so that it has all of the copies of objects that it needs to function and also has the objects necessary to link the cache to the remote database system that it is a partial copy of.
There needs to be a way of making a user session to perform redirected operations in the remote database without requiring a password from the user to make the session.
The two-stage commit process used for transactions between distributed databases needs to be optimized for the situation where the transaction does not result in modifications to particular ones of the distributed databases.
It is an object of the invention disclosed herein to solve one or more of the foregoing problems. As will be described in more detail in the following, the problems have been solved in an implementation of the queryable cache which is based on Oracle""s techniques for making replicated databases that contain materialized views.
Techniques that are used in distributed database systems to select copies of objects in one database system that are to be copied to another database system and to ensure that the database systems involved have the necessary infrastructure to operate as a distributed database system.
In one aspect, the techniques simplify the selection of objects to be copied. Selection is done at the highest level by having a user provide simple descriptions of objects. A template creator creates a template from the descriptions and an object selector receives the template and makes a list of objects from the template. The objects described the template belong to a schema associated with a user of the database system and the list includes the objects described in the template and other objects that belong to the schema and are dependent from a table object in the list that belongs to the schema. The user thus need know nothing about low-level object dependencies when specifying the objects to be copied. Further simplification is achieved by the fact that the object descriptions from which the objects are made include only the object""s name, type, and schema. In particular, no DDL description of the object being copied is required.
In another aspect, the techniques simplify setting up the infrastructure required for a distributed database system. The object selector not only selects the objects to be copied, but also adds objects required for infrastructure such as links and authentication proxies to the database systems involved. With infrastructure for the database system that is receiving the copies, the object selector makes an object that has a DDL description of the infrastructure and includes it in the list of objects to be copied; with infrastructure for the data base system that is the source of the copies, the object selector adds the necessary objects directly to the source database system.
Other objects and advantages will be apparent to those skilled in the arts to which the invention pertains upon perusal of the following Detailed Description and drawing, wherein: