1. Field of the Invention
The present invention relates to the field of computer programming, and more particularly to a method, system, and computer program product for using working set hints and query signatures to optimize the selection and execution of fixed, static queries or services which are used to retrieve information from a database, object server, or similar data repository.
2. Description of the Related Art
Various caching and read-ahead strategies can dramatically improve the efficiency and flexibility of an executing application program. Caching is a technique known in the computer programming art for increasing the speed of data retrieval. It involves storing data in an easily-accessible location from which it can be quickly retrieved. Read-ahead is another technique known in the art, whereby a prediction is made as to which data will be needed by a software application: that data is then retrieved in advance. When the prediction has been accurately made, the data will be available at the time the application needs it and the application will not have to wait while a retrieval operation takes place. Typically, the data that is read ahead is the "working set", where a working set is the set of data that the application is using at a point in time (or is expected to need to use, in the case of read-ahead).
In object-oriented programming, the working set is the set of objects the application is using. An application typically consists of multiple tasks, and each task may have its own working set. For example, suppose an application uses Employee objects as well as Department objects and Project objects for those employees. An employee may change from one department to another, necessitating a change to his existing stored data. To perform this change-department task, a user of the application will typically cause the application to retrieve the employee object for this employee, and then retrieve the employee's department object. The working set for this task therefore comprises objects from the Employee and Department classes. Suppose an employee may be assigned to work on zero or more projects at any given time, and a manager wishes to obtain a list of all the projects to which his employees are assigned. This project-inquiry task involves retrieving each employee object and zero or more project objects for each one, but would not likely require any department objects to be accessed. Thus, for this task, the working set comprises objects from the Employee and Project classes.
When objects are persisted using a relational database, the various classes of objects typically correspond to separate tables in the database. For the example application discussed above, the database would contain tables for Employee, Department, and Project data. Each employee then has a row in the Employee table, and is associated with a row in the Department table (assuming each employee is assigned to a single department) and zero or more rows in the Project table. The application retrieves data from these tables by issuing a database query. It may take a considerable amount of time, relative to the overall processing time of a task, to complete a database query operation. The query operation involves multiple components of the computer system. After the application issues the query, the operating system may be involved, after which the database system receives the query (and possibly reformats it), locates the requested rows from the table or tables, formats the rows into a message to be returned to the application, and contacts the operating system with this result message. The message is then received by the requesting application, which can then begin to process the data. When the database is remotely located, such as in a network computing environment, the time required to complete the query is increased by the time required for the communication over the network to occur between the client machine and the database server (including the possibility of communications over intermediate connections between the client and database server). Thus, it can be seen that issuing a database query is an expensive operation in terms of elapsed time.
When a client machine and database server are connected in a local-area network (LAN) environment, it has been demonstrated that the amount of data sent from the server to the client in response to a database query has relatively limited influence on the overall processing cost of data retrieval. Instead, the access operation itself accounts for the majority of the processing time and thus forms the processing bottleneck. When the client and server are connected in a wide-area network (WAN), the amount of data transmitted does influence the data retrieval cost, but the access operation continues to account for a significant portion of the cost. In both environments, the overall efficiency of the system can be increased by retrieving as much of the working set as possible during each retrieval operation, with a larger efficiency gain being realized in the LAN environment. This is where the read-ahead operation comes into play: if a database retrieval is required for one object that an application requires access to, it is more efficient to retrieve additional objects at the same time (assuming, of course, that the objects retrieved in the read-ahead are those that will actually be used by the application in its subsequent operations).
Read-ahead and caching each contribute to efficiency and flexibility gains for an executing application, and when used together the gains are even more dramatic. The read-ahead operation retrieves data in advance of when the application is ready to access it, and caching stores the retrieved data in a location from which it can be quickly accessed when it is needed. In an application that does not use read-ahead and caching, the application is always starved for data, reading one object at a time from the data source as further data is needed. When the underlying object model of the application has many associations from one class to another (and therefore many relationships between tables in the database), traversing this model's associations as the application user navigates the model to perform various tasks will typically require access to many objects. When each object is retrieved from the database one at a time, a large number of expensive database round trips will likely be required. This may lead to processing delays that are unacceptable to the application user.
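The difference in round trips for the project-inquiry task can be illustrated with a minimal sketch. It is an illustration only and not part of the described technique: the table layout, the sample rows, and the function names are assumptions made for this example, and an in-memory SQLite database stands in for a remote database server. Each call to execute is counted as one round trip.

```python
import sqlite3

# Assumed schema: one table per class, as in the Employee/Department/Project example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT, emp_id INTEGER);
INSERT INTO department VALUES (1, 'Research');
INSERT INTO employee VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 1);
INSERT INTO project VALUES (1, 'Alpha', 1), (2, 'Beta', 1), (3, 'Gamma', 2);
""")


def projects_one_at_a_time(conn):
    """Retrieve employees first, then issue one project query per employee."""
    round_trips = 1
    employees = conn.execute("SELECT id, name FROM employee ORDER BY id").fetchall()
    result = {}
    for emp_id, name in employees:
        round_trips += 1
        rows = conn.execute(
            "SELECT title FROM project WHERE emp_id = ? ORDER BY id", (emp_id,)
        ).fetchall()
        result[name] = [title for (title,) in rows]
    return result, round_trips


def projects_read_ahead(conn):
    """Retrieve the whole Employee/Project working set in a single query."""
    rows = conn.execute(
        "SELECT e.name, p.title FROM employee e "
        "LEFT JOIN project p ON p.emp_id = e.id ORDER BY e.id, p.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, [])
        if title is not None:  # employees with zero projects yield a NULL title
            result[name].append(title)
    return result, 1
```

Both functions return the same mapping of employees to projects; the first needs one query per employee in addition to the initial query, whereas the second needs a single query regardless of the number of employees.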
A read-ahead scheme allows the application to minimize the number of database round trips, and therefore reduce the processing delays in the application, by retrieving large object graphs (i.e. multiple objects, having interrelationships that form a graph structure) within one query. In this approach, read-ahead preferably involves instantiating the requested objects and caching the data for their related objects, thereby making sure that the data is present for the objects that are most likely needed next by the application (but without the time and storage overhead that would be required if all retrieved objects were immediately instantiated).
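The combination of instantiating the requested object and caching the raw data of its related objects might be sketched as follows. This is a simplified illustration under stated assumptions: the Repository and Session classes, the DATA dictionary, and the hard-coded department key are hypothetical names invented for this example, and a dictionary lookup stands in for a database round trip.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a remote data repository, keyed by (class, id).
DATA = {
    ("Employee", 1): {"name": "Ann", "department_id": 10},
    ("Department", 10): {"name": "Research"},
}


@dataclass
class Employee:
    name: str
    department_id: int


class Repository:
    """Counts each fetch call as one simulated round trip."""

    def __init__(self, data):
        self._data = data
        self.round_trips = 0

    def fetch(self, keys):
        self.round_trips += 1
        return {key: self._data[key] for key in keys}


class Session:
    """Holds raw, uninstantiated rows retrieved by earlier fetches."""

    def __init__(self, repo):
        self.repo = repo
        self.cache = {}

    def load(self, key, read_ahead=()):
        # On a cache miss, fetch the requested row and the read-ahead rows together.
        if key not in self.cache:
            self.cache.update(self.repo.fetch([key, *read_ahead]))
        return self.cache[key]


def change_department_task(session, use_read_ahead):
    # Assumption: the related department key is known in advance for this sketch.
    related = [("Department", 10)] if use_read_ahead else []
    employee = Employee(**session.load(("Employee", 1), read_ahead=related))
    # The department row stays as cached data until the task actually needs it.
    department_row = session.load(("Department", employee.department_id))
    return employee, department_row
```

With read-ahead enabled, the task completes in one simulated round trip because the department row is already cached when it is requested; without it, two round trips are needed.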
For most object applications, the relationships (referred to equivalently herein as "associations") between object classes provide a semantically meaningful way for controlling retrieval of objects from the database. As the application traverses a particular relationship, the related objects can be retrieved accordingly (from the cache, if they have been retrieved and cached in a read-ahead operation, or from the data repository if they are not locally available). Using the project-inquiry task described earlier, the employees for the manager's department will typically have already been retrieved and instantiated before beginning the traversal of the employee to project relationship. The traversal then necessitates retrieving the appropriate project objects. How much of an object graph to read ahead depends on the application context. For example, a graphical user interface ("GUI") component of an application may need only a few levels of an object graph. A report writing batch subsystem, on the other hand, may need to load the entire graph.
Therefore, within the same application it is desirable to be able to dynamically define the depth of the object graph that will be loaded upon issuing a database retrieval query. In the GUI example above, the depth to be loaded will preferably be relatively shallow, whereas in the report writing example, the depth will preferably be relatively deep. In existing object-oriented systems, a relationship from one class to another is typically encapsulated within one of the classes and thus can only be accessed by invoking a method on an object in that class. This results in an object model with relationships that are tightly bound to particular hard-coded database queries, i.e. to those queries which the programmer has provided as encapsulated methods. This existing approach prevents loading object graphs of dynamically varying depths. It would be preferable if each relationship had access to a set of queries, each providing a different object graph load depth, where the query to perform in a particular situation could then somehow be determined based on the application context. This would be especially beneficial where the queries to be performed are static, pre-compiled queries (such as Structured Query Language, or SQL, queries) and in heterogeneous environments where one server may use services of another. For example, a client workstation may be connected to an object server which selects and retrieves instantiated objects (which the object server separately requests and receives from a database server). Furthermore, there may be several intervening systems in between the server connected to the client and the database server. In these heterogeneous environments, the application executing on the client workstation does not issue database queries: rather, database queries are typically only issued by the server connected to the database server.
In this type of distributed system, there may be many "generic" queries (implemented as service invocations, for example) available for use by many applications, where the queries are very similar in function but also vary somewhat. Typically, the client will invoke a query that is stored in the client to access a database, or will invoke one of these services that is stored in a remote server in order to request the remote system to issue the corresponding database query. (Hereinafter, references to queries are to be interpreted as referring equivalently to these types of service invocations, unless otherwise stated.) In this case, a technique is needed for determining which query to select in a particular situation. When the queries are selected for execution at a system remote from the system on which the application is executing, it becomes difficult for the application to influence the selection using the current application context. The selectable queries remain as fixed, hard-coded logic which cannot be dynamically optimized at run-time to account for the needs of a particular application.
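The notion of a context-dependent load depth can be made concrete with a small sketch. The association table and the function name below are assumptions made for illustration; the sketch merely computes which classes of an object graph fall within a given depth of a starting class, as a shallow GUI load and a deep report load would require.

```python
from collections import deque

# Assumed associations between classes (hypothetical object model).
ASSOCIATIONS = {
    "Department": ["Employee"],
    "Employee": ["Project"],
    "Project": [],
}


def classes_to_load(root, max_depth=None):
    """Return the classes reachable from root within max_depth associations.

    max_depth=None means the entire graph is loaded.
    """
    seen = [root]
    queue = deque([(root, 0)])
    while queue:
        cls, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not traverse associations beyond the requested depth
        for related in ASSOCIATIONS[cls]:
            if related not in seen:
                seen.append(related)
                queue.append((related, depth + 1))
    return seen
```

Under these assumptions, a GUI component might call classes_to_load("Department", 1) to obtain only the Department and Employee classes, while a report writing subsystem might call classes_to_load("Department") to obtain the entire graph.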
Accordingly, what is needed is a technique whereby the information to be retrieved in a query operation, and in particular a query operation intended as a read-ahead retrieval, can be efficiently selected and executed.
An object of the present invention is to provide a technique for efficiently selecting an optimal query to use for retrieving information with a data repository query operation.
Another object of the present invention is to provide a technique for efficiently selecting an optimal service to invoke for retrieving information from a data repository.
A further object of the present invention is to provide this technique such that data can be efficiently retrieved in a read-ahead operation.
Another object of the present invention is to provide this technique where the source of the retrieved data is a relational or a non-relational database, and where the destination of the data is an application written using an object-oriented programming language.
Still another object of the present invention is to provide this technique in a manner that does not require modification of an application.
Yet another object of the present invention is to provide this technique through use of task working set hints and query signatures.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a method, system, and computer-readable code for optimizing query selection and execution in a computing environment. This technique comprises: providing a plurality of working set hints corresponding to a plurality of finders associated with a plurality of executable tasks, wherein each of the tasks may have one or more of the finders and each of the finders may have zero or more of the hints; providing a query signature corresponding to each of a plurality of executable queries; locating a default query if a task to be executed or a finder to be used by the task has no corresponding hints, and locating the corresponding hints for the finder of the task otherwise; using the located hints to select one of the queries to execute; and executing the default query or the selected query.
Preferably, each of the hints describes data to be read ahead for use by the corresponding task, and each of the query signatures describes data to be retrieved by the corresponding query.
Using the located hints may further comprise comparing the located hints to the plurality of query signatures and selecting a particular one of the queries for which the corresponding query signature matches the located hints. Selecting a particular one may further comprise selecting a best-matching one of the queries based on a result of the comparison.
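The selection steps summarized above might be sketched as follows. This is a hypothetical illustration, not the claimed implementation: the task, finder, and query names are invented, hints and signatures are modeled simply as sets of class names, and the scoring rule (prefer a signature that covers every hinted class while retrieving the fewest unhinted classes, and fall back to the default query when no signature covers the hints) is an assumption, since the text does not define how a best match is computed.

```python
# Working set hints keyed by (task, finder); a finder may have zero hints.
HINTS = {
    ("project-inquiry", "findEmployeesByManager"): {"Employee", "Project"},
    ("change-department", "findEmployeeById"): {"Employee", "Department"},
    ("list-employees", "findEmployeesByManager"): set(),
}

# Query signatures: the classes whose data each fixed query retrieves.
SIGNATURES = {
    "q_employee": {"Employee"},
    "q_employee_department": {"Employee", "Department"},
    "q_employee_project": {"Employee", "Project"},
    "q_employee_all": {"Employee", "Department", "Project"},
}

DEFAULT_QUERY = "q_employee"


def select_query(task, finder):
    """Select a query name for the given task and finder."""
    hints = HINTS.get((task, finder))
    if not hints:
        # No hints for this task or finder: locate the default query.
        return DEFAULT_QUERY
    # Compare the hints to every signature; keep signatures covering all hints.
    candidates = [name for name, sig in SIGNATURES.items() if hints <= sig]
    if not candidates:
        return DEFAULT_QUERY  # assumption: fall back when nothing matches
    # Best match (assumed rule): fewest extra classes, ties broken by name.
    return min(candidates, key=lambda name: (len(SIGNATURES[name] - hints), name))
```

For example, select_query("project-inquiry", "findEmployeesByManager") selects q_employee_project under these assumptions, while a task or finder without hints receives the default query. Because the selection depends only on the hint and signature tables, the application code that invokes the finder does not need to change.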
The present invention will now be described with reference to the following drawings, in which like reference numbers denote the same element throughout.