1. Field of the Invention
The invention relates generally to techniques for making frequently-referenced data more available to programs using the data and more specifically to techniques for making distributed queryable caches of data that is contained in large databases. The caches are transparent to the program that is querying the large data base in that the program generally need not be altered when a cache is added.
2. Description of Related Art
Local and Remote Copies of Data
A common problem in data processing systems is that frequently-used data cannot be quickly accessed, either because there is not enough capacity in the connection between the processor and the data storage or because too many program executions are contending for access to the data storage. One way of solving the problem is to make a copy of the data and put the copy in a location where it is more accessible to some of the program executions that are using the data. Those program executions can then reference the copy of the data in the more accessible location rather than the original data. In the following, the more accessible copy will be called the local copy, while the less-accessible original will be called the remote copy. It is to be understood, of course, that the terms remote and local represent degrees of accessibility and that both copies may be physically in the same location.
The local copy may be a complete copy of the data or it may be only a partial copy; further, the local copy may or may not be transparent to the executing program. When the local copy is transparent, the executing program can refer to a data item in exactly the same fashion as it did before the local copy existed. The reference goes first to the local copy. If the data is there (this is termed a hit), the data is retrieved from the local copy. If the data is not there (this is termed a miss), the reference goes automatically to the remote copy and the data is retrieved from the remote copy. When the local copy is not transparent, the program must be changed to take into account the fact that references are made to the local copy, instead of to the remote copy. In general, local copies of remote data that are both partial and transparent to the executing program are termed caches. Transparence is a relative term. Though the executing program need in general not be modified when the cache is added, the presence of the cache may result in some differences in the program's behavior, and in some cases the differences in behavior may require modification of the program.
Whenever there are local and remote copies of data, the copies must be kept consistent with each other. This is easy only if the data never changes; if it does change, then changes in one of the copies must be propagated to others of the copies. In some cases, the local copy is read only; the programs using it cannot change it, but changes in the remote copy are propagated to the local copy. In other cases, the local copy is updateable, and when this is the case, the updates must be propagated to the remote copy. When the propagation is done depends on how consistent the copies must be. For example, if the copies can be inconsistent for a whole working day, changes need only be propagated once a day. On the other hand, if it is required that the copies are always consistent, any change must be propagated immediately.
Local and Remote Copies of Databases
Databases are organized collections of data in which sets of data items are accessed by queries. The query describes the set of data items to be accessed in terms of the database's organization. The most common kind of organization for databases is relational: the data is organized into tables with named columns and rows. Each row has a field for each of its columns. A field for a column in a row contains that row's value for the column. Values in the fields can serve to uniquely identify rows of the database and to relate rows in one table to rows in a different table. A set of data items contained in a relational database is specified in terms of rows and columns of the table the items are in and values in the row or rows that contain the items. The following Employees table offers a simple example:
emp_idemp_nameemp_pay000001“Owner”$1,000,000. . .. . .. . .000999“Drudge”  $20,000This table has three columns and a row for each employee. the emp_id field of an employee's row contains a unique identifier for the employee; the emp_name field contains the employee's name; the emp_pay field contains the employee's current pay.
Relational database systems use a language called SQL to specify data sets. An SQL request on a database describes an operation to be performed on a set of data in the database. The most common kind of request is a query, which describes an operation that reads the set of data. Other requests may modify the set of data or delete the set of data. For example, an SQL query to find out how much employee “Owner” makes might look like this:                SELECT emp_pay FROM Employees        WHERE emp_name=“Owner”        
This instructs the data base system to return the value of emp_pay from the row of the table Employees in which the field Emp_name has the value “Owner”. The result of the query will be “$1,000,000”. A database system contains not only the data, but also a description of the manner in which the data is organized. In the following, the description will be termed a data dictionary. Elsewhere, it may be termed the data base system's metadata or its schema.
Many organizations keep huge amounts of information in large central databases. As networks have grown, so has the potential for these large databases to be bottlenecks. The bigger the network, the more users, and the greater the problems caused by contention for the data base. Moreover, the network's carrying capacity is always less than the demands being made on it. As a result, it often takes a long time to transmit the query to the database, have the database perform the query, and have the results returned to the user. The development of the World Wide Web has of course increased all of these problems enormously. The database that contains the catalog of a Web merchant is accessible to literally everyone who has a connection to the Internet, and delays in getting a response are not only unpleasant for the user, but bad for the Web merchant's business.
A common approach to solving bottlenecks caused by databases is by replicating the data base. As the name implies, a replicated data base has a copy at another location. When a set of replicated databases is set up, methods are specified for keeping the replications consistent. A replicated database may be a complete copy of the database it is a replication of, or a partial copy. In the latter case, the partial copy is termed a materialized view. A materialized view may be read only, in which case, it is kept consistent with the database it is a copy of, but cannot be updated by the user, or it may be updateable, in which case the data items in the materialized view may be updated and the updates are propagated to the database of which the materialized view is a copy. A replicated database system provided by Oracle Corporation, of Redwood Shores, Calif., USA, is described in detail in the document Oracle 8i Replication, available in May, 2001 at http://technet.oracle.com/doc/server.81.5/
A problem with materialized views is that they are not transparent to the application program. If the application program provides a query to the materialized view that specifies a set of data items that is not in the materialized view but is in the database the materialized view was replicated from, the query simply fails, and the application must attempt it again on the database the materialized view was replicated from. Materialized views are consequently typically used in situations where it is possible to completely predict what queries will be made from a given remote location. The database tables needed to answer those queries are included in the materialized view, and since a miss never occurs, the fact that the materialized view provides no support to an application that has experienced a miss is not a problem.
However, as networks expand and more and more people need access to databases to do their jobs or to purchase products, selecting the right materialized view becomes more and more difficult. Oracle Corporation has responded to this problem by developing the queryable database cache described in published PCT application WO 00/63800, Oracle Corporation, Web Servers with queryable dynamic caches, published 26 Oct. 2000 and claiming priority from U.S. Ser. No. 09/294,656, Cusson, et al., Web servers with queryable dynamic caches, filed Apr. 19, 1999. The cache of WO 00/63800 is transparent. When an application program makes a query on the cache, the query goes to a data access layer, which provides the query to the cache. If the query misses, the data access layer redirects the query to a remote database. There is thus no need for the application to know anything about either the cache or the remote database.
Though the queryable cache described in WO 00/63800 does have the fundamental advantage of transparency to the application program, experience with the cache has revealed some areas where improvement is needed:    The queryable cache needs to be transparent not only to the application program, but also to the data access layer.    The queryable cache needs to handle SQL requests that reference stored procedures.    There needs to be an easy way of setting up the cache so that it has all of the copies of objects that it needs to function and also has the objects necessary to link the cache to the remote database system that it is a partial copy of.    There needs to be a way of making a user session to perform redirected operations in the remote database without requiring a password from the user to make the session.    The two-stage commit process used for transactions between distributed databases needs to be optimized for the situation where the transaction does not result in modifications to particular ones of the distributed databases.
It is an object of the invention disclosed herein to solve one or more of the foregoing problems. As will be described in more detail in the following, the problems have been solved in an implementation of the queryable cache which is based on Oracle's techniques for making replicated databases that contain materialized views.