1. Field of the Invention
The present invention relates to the field of computer programming, and more particularly to a method, system, and computer readable code for creating and using a structured cache to increase the efficiency of reading persistent objects from a database.
2. Description of the Related Art
Caching is a technique known in the computer programming art for increasing the speed of data retrieval. It involves storing data in an easily-accessible location from which it can be quickly retrieved. Read-ahead is another technique known in the art, whereby a prediction is made as to which data will be needed by a software application, and that data is then retrieved in advance. When the prediction has been accurately made, the data will be available at the time the application needs it and the application will not have to wait while a retrieval operation takes place. Typically, the data that is read ahead is the "working set", where a working set is the set of data that the application is using at a point in time.
In object-oriented programming, the working set is the set of objects the application is using. An application may consist of multiple tasks, and each task may have its own working set. For example, suppose an application uses Employee objects as well as Department objects and Project objects for those employees. An employee may change from one department to another, necessitating a change to his existing stored data. To perform this change-department task, a user of the application will typically retrieve the employee object for this employee, and then retrieve the employee's department object. The working set for this task therefore comprises objects from the Employee and Department classes. Suppose an employee may be assigned to work on zero or more projects at any given time, and a manager wishes to obtain a list of all the projects to which his employees are assigned. This project-inquiry task involves retrieving each employee object and zero or more project objects for each one, but would likely not require any department objects to be accessed. Thus, for this task, the working set comprises objects from the Employee and Project classes.
When objects are persisted using a relational database, the various classes of objects typically correspond to separate tables in the database. For the example application discussed above, the database would contain tables for Employee, Department, and Project data. Each employee then has a row in the Employee table, a row in the Department table (assuming each employee is assigned to a single department), and zero or more rows in the Project table. The application retrieves data from these tables by issuing a database query. It may take a considerable amount of time, relative to the overall processing time of a task, to complete a database query operation. The query operation involves multiple components of the computer system. After the application issues the query, the operating system may be involved, after which the database system receives the query (and possibly reformats it), locates the requested rows from the table or tables, formats the rows into a message to be returned to the application, and contacts the operating system with this result message. The message is then received by the requesting application, which can then begin to process the data. When the database is remotely located, such as in a network computing environment, the time required to complete the query is increased by the time required for the communication over the network to occur between the client machine and the database server (including the possibility of communications over intermediate connections between the client and database server). Thus, it can be seen that issuing a database query is an expensive operation in terms of elapsed time.
When a client machine and database server are connected in a local-area network (LAN) environment, it has been demonstrated that the amount of data sent from the server to the client in response to a database query has relatively limited influence on the overall processing cost of data retrieval. Instead, the access operation itself accounts for the majority of the processing time and thus forms the processing bottleneck. When the client and server are connected in a wide-area network (WAN), the amount of data transmitted does influence the data retrieval cost, but the access operation continues to account for a significant portion of the cost. In both environments, the overall efficiency of the system can be increased by retrieving as much of the working set as possible during each retrieval operation, with a larger efficiency gain being realized in the LAN environment. This is where the read-ahead operation comes into play: if a database retrieval is required for one object that an application requires access to, it is more efficient to retrieve additional objects at the same time (assuming, of course, that the objects retrieved in the read-ahead are those that will actually be used by the application in its subsequent operations).
When used together, read-ahead and caching techniques can dramatically improve the efficiency and flexibility of an executing application. The read-ahead operation retrieves data in advance of when the application is ready to access it, and caching stores the retrieved data in a location from which it can be quickly accessed when it is needed. In an application that does not use read-ahead and caching, the application is always starved for data, reading one object at a time from the data source as further data is needed. When the underlying object model of the application has many associations from one class to another (and therefore many relationships between tables in the database), traversing this model's associations as the application user navigates the model to perform various tasks will typically require access to many objects. When each object is retrieved from the database one at a time, a large number of expensive database round trips will likely be required. This may lead to processing delays that are unacceptable to the application user.
A read-ahead scheme allows the application to minimize the number of database round trips, and therefore reduce the processing delays in the application. How far to read ahead, and which objects to retrieve during a read-ahead operation, is determined by the requirements of a particular application and is the subject of ongoing research in the industry. The application developer will code the database query commands to retrieve the data that he expects will most likely be needed next as the application executes. A single query command can be used to retrieve large object composition trees (i.e. multiple objects, having interrelationships that form a hierarchically-structured tree).
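By way of illustration only, the difference in round trips between one-at-a-time retrieval and a read-ahead query can be sketched as follows for the project-inquiry task described above. The table layout, the column names, the `CountingDb` wrapper, and the two function names are assumptions made for this sketch and are not taken from the description; an in-memory SQLite database stands in for the relational database.

```python
import sqlite3


class CountingDb:
    """Hypothetical wrapper that counts each query as one database round trip."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.round_trips = 0
        # Assumed schema: each project row carries the key of its employee.
        self.conn.executescript("""
            CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
            CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT, emp_id INTEGER);
            INSERT INTO employee VALUES (1, 'Ann', 10);
            INSERT INTO employee VALUES (2, 'Bob', 10);
            INSERT INTO project VALUES (100, 'Cache', 1);
            INSERT INTO project VALUES (101, 'Parser', 1);
            INSERT INTO project VALUES (102, 'Editor', 2);
        """)

    def query(self, sql, params=()):
        self.round_trips += 1
        return self.conn.execute(sql, params).fetchall()


def projects_one_at_a_time(db, dept_id):
    """One query for the employees, then one further query per employee."""
    employees = db.query(
        "SELECT emp_id, name FROM employee WHERE dept_id = ? ORDER BY emp_id",
        (dept_id,))
    result = {}
    for emp_id, name in employees:
        rows = db.query(
            "SELECT title FROM project WHERE emp_id = ? ORDER BY proj_id",
            (emp_id,))
        result[name] = [title for (title,) in rows]
    return result


def projects_read_ahead(db, dept_id):
    """A single join query retrieves the employees and their projects together."""
    rows = db.query(
        "SELECT e.name, p.title FROM employee e "
        "LEFT JOIN project p ON p.emp_id = e.emp_id "
        "WHERE e.dept_id = ? ORDER BY e.emp_id, p.proj_id",
        (dept_id,))
    result = {}
    for name, title in rows:
        projects = result.setdefault(name, [])
        if title is not None:  # an employee with no projects yields a NULL title
            projects.append(title)
    return result
```

In this sketch both functions return the same mapping of employee names to project titles, but the first issues one query plus one per employee, whereas the second issues a single query regardless of the number of employees.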
Reading ahead may retrieve too much data. Suppose, for example, that the application user begins a task for which the database query retrieves a particular employee's information (from the Employee table) as well as information about the department he works in (from the Department table) and the project he works on (from the Project table). If the user ends up working with only the Employee object, perhaps to change the employee's marital status, then only a small percentage of the data from the read-ahead is actually used by the task. Reading ahead incurs the cost of storage space for the retrieved data and the processing cost of transforming the data from its relational database format (i.e. a row from a table) into an object format (that is, instantiating the retrieved data). In some situations, the data retrieved for a single object may result in multiple objects when instantiation occurs. Especially in component-based systems such as Enterprise JavaBeans, the ratio of instantiated objects to retrieved objects can be as high as 5 to 1, due to all the required helper objects that will be created. ("JavaBeans" is a trademark of Sun Microsystems, Inc.) When the data that is read ahead of the application's requirement for it is in fact used by the application, then these storage and instantiation costs are justified. If, however, the read-ahead prediction was not totally accurate (as is very likely), then the cost of storing and instantiating the extra data (i.e. the data that was retrieved but not actually required by the application) is wasted. As will be obvious, as the percentage of storage and instantiation costs attributable to this extra data increases, there is a corresponding decrease in the efficiency gain that can be achieved by reading ahead. At the same time, it is generally desirable to attempt to read as much data as possible with each query command, in order to reduce the number of round trips to the database that will be required.
In addition to the problem of reading too much data, reading ahead may retrieve data that is needed by a task, but in a form that prevents the task from accessing it. This occurs when a database query retrieves data according to a particular defined access path, but the user wants to navigate the data in a different order. For example, suppose the query retrieves data beginning from the Department table, which is linked to the Employee table, which is further linked to the Project table (such as retrieving all employees within a department, and the project each works on). The user, on the other hand, wishes to navigate this data beginning with information from the Project table (locating all the employees who work on a particular project, perhaps). When navigation of the working set is restricted to using the order defined in the query, any alternative navigation paths that the user wishes to use will require additional roundtrips to the database to retrieve the data in a different order that is appropriate for the alternative navigation path.
Accordingly, what is needed is a technique whereby the costs involved with read-ahead data retrieval can be minimized, enabling the cost of storing and processing unused data to also be minimized, and whereby the retrieval order used for a database access will not limit the navigation order of the corresponding objects. The present invention addresses this problem by providing a technique for creating and using a structured cache to increase the efficiency of reading persistent objects from a database.
An object of the present invention is to provide a technique for creating a structured cache to store data retrieved using read-ahead operations.
Another object of the present invention is to provide this technique where the source of the retrieved data is a relational database, and where the destination of the data is an application written using an object-oriented programming language.
A further object of the present invention is to provide this structured cache technique such that data can be efficiently retrieved therefrom when needed.
Yet another object of the present invention is to provide this technique in a manner that does not require cache consistency maintenance procedures.
Still another object of the present invention is to provide a technique that enables object navigation to be independent of the data access path used on the query that retrieves the data.
A further object of the present invention is to provide a technique that maintains the information about relationships between entities that are retrieved from the database.
Yet another object of the present invention is to provide a technique that stores cached entries in an optimized form, reducing storage space and resource requirements.
Still another object of the present invention is to provide a technique that restructures a result set of a database query according to a corresponding object model.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a method, system, and computer-readable code for increasing efficiency of reading persistent objects from a database by creating and using a structured cache in a computer system. In one aspect, this technique comprises: retrieving a result set from said database in response to a database query, said result set comprising one or more rows of data elements; creating a data cache from said retrieved result set; and creating an associations cache from said retrieved result set.
In another aspect, this technique comprises: retrieving a result set from said database in response to a database query, said result set comprising one or more rows of data elements; creating a data cache from said retrieved result set; and responding to a request for access to an object by an executing program in said computer system, comprising: locating an entry corresponding to said object in said data cache; instantiating and hydrating said object from said located entry; registering said instantiated and hydrated object in an object cache; and returning said instantiated and hydrated object to said executing program. This aspect may further comprise creating an associations cache from said retrieved result set. The responding to a request for access may operate only for an initial request for access to said object, where responding to subsequent requests for access to said object comprises: locating said requested object in said object cache using said register; and returning said located object to said executing program.
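The request-handling aspect just described may be sketched, purely as an illustration, as follows. The `Employee` class, the `StructuredCache` class, the `hydrations` counter, and the use of dictionaries keyed by primary key are assumptions made for this sketch; the description does not prescribe these names or data structures.

```python
class Employee:
    """Hypothetical application class; attributes are filled in on hydration."""

    def __init__(self):
        self.emp_id = None
        self.name = None


class StructuredCache:
    def __init__(self, data_cache):
        self.data_cache = data_cache   # primary key -> raw data from the result set
        self.object_cache = {}         # primary key -> instantiated object (the register)
        self.hydrations = 0            # counts how many objects were actually built

    def get(self, key):
        # Subsequent requests: the object is already registered, so return it.
        obj = self.object_cache.get(key)
        if obj is not None:
            return obj
        # Initial request: locate the entry in the data cache
        # (raises KeyError if the key was never read ahead).
        entry = self.data_cache[key]
        obj = Employee()               # instantiate
        obj.emp_id = key               # hydrate from the located entry
        obj.name = entry["name"]
        self.hydrations += 1
        self.object_cache[key] = obj   # register in the object cache
        return obj
```

Under these assumptions, data that was read ahead but never requested remains in the data cache in its raw form, so the instantiation cost is paid only for objects the executing program actually accesses.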
In yet another aspect, this technique comprises: retrieving a result set from said database in response to a database query, said result set comprising one or more rows of data elements; creating a data cache from said retrieved result set; and creating an object cache entry for an object corresponding to data stored in said data cache, comprising: locating an entry corresponding to said stored data in said data cache; instantiating and hydrating said object from said located entry; and registering said instantiated and hydrated object in said object cache by creating said object cache entry. This aspect may further comprise creating an associations cache from said retrieved result set. The retrieving a selected object from said structured cache may further comprise: searching said object cache for said selected object using a result of said registering; returning said selected object if searching said object cache locates said selected object; searching said data cache for said entry corresponding to said selected object if searching said object cache fails to locate said selected object; wherein said creating an object cache entry is invoked if searching said data cache locates said entry; and issuing a further database query if searching said data cache fails to locate said entry. 
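The search order recited above (object cache first, then data cache, then a further database query) may be sketched as follows. This is an illustrative sketch only: the `TieredLookup` class, its method names, and the `fetch_row` callable standing in for the further database query are hypothetical. Storing the row returned by the further query in the data cache is likewise an assumption, since the description does not state what is done with that result.

```python
from types import SimpleNamespace


class TieredLookup:
    def __init__(self, data_cache, fetch_row):
        self.data_cache = dict(data_cache)  # primary key -> raw data
        self.object_cache = {}              # primary key -> instantiated object
        self.fetch_row = fetch_row          # hypothetical stand-in for a database query
        self.queries = 0                    # number of further queries issued

    def create_object_cache_entry(self, key):
        entry = self.data_cache[key]                # locate the entry
        obj = SimpleNamespace(emp_id=key, **entry)  # instantiate and hydrate
        self.object_cache[key] = obj                # register
        return obj

    def retrieve(self, key):
        # 1. Search the object cache.
        obj = self.object_cache.get(key)
        if obj is not None:
            return obj
        # 2. Search the data cache; 3. fall back to a further query on a miss.
        if key not in self.data_cache:
            self.queries += 1
            row = self.fetch_row(key)
            if row is None:
                return None
            self.data_cache[key] = row  # assumption: keep the fetched row
        return self.create_object_cache_entry(key)
```

For example, constructing `TieredLookup({1: {"name": "Ann"}}, database.get)` with an ordinary dictionary `database` playing the role of the database lets the three outcomes (object cache hit, data cache hit, further query) be exercised without a real database connection.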
When said associations cache was created, this technique may further comprise navigating associations represented by said result set using said associations cache, in response to a request for one or more member objects of a selected association, comprising: searching said associations cache to locate a member key collection corresponding to said selected association, said member key collection comprised of one or more member primary keys; retrieving said member key collection from said database by issuing a second database query if said searching fails to locate said member key collection in said associations cache; locating each member object associated with said member key collection in said associations cache if said searching locates said member key collection; and returning said located member objects as a response to said first request. Preferably, creating a data cache further comprises: extracting a primary key for each object in each row of said result set; extracting corresponding data for each object from said row if said primary key does not already exist in said data cache; and storing said extracted primary key and said extracted corresponding data as said entry corresponding to said object in said data cache. Access to said data cache preferably uses a data cache look-up table, wherein said data cache look-up table is comprised of a collection of said stored primary keys and, for each of said stored primary keys, a corresponding pointer to said stored extracted corresponding data. 
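The preferred manner of creating the data cache may be sketched, for illustration only, as follows. The sample result set, the column names, and the `OBJECT_COLUMNS` mapping are hypothetical. Qualifying each primary key with a class name, so that keys from different tables cannot collide, is an assumption of this sketch rather than a stated requirement. A Python dictionary plays the role of the data cache look-up table, mapping each stored key to a reference to its stored data.

```python
# Hypothetical joined result set: each row carries columns for three objects.
RESULT_SET = [
    {"dept_id": 10, "dept_name": "Research", "emp_id": 1, "emp_name": "Ann",
     "proj_id": 100, "proj_title": "Cache"},
    {"dept_id": 10, "dept_name": "Research", "emp_id": 1, "emp_name": "Ann",
     "proj_id": 101, "proj_title": "Parser"},
    {"dept_id": 10, "dept_name": "Research", "emp_id": 2, "emp_name": "Bob",
     "proj_id": None, "proj_title": None},
]

# Hypothetical mapping: class name -> (primary key column, data columns).
OBJECT_COLUMNS = {
    "Department": ("dept_id", ["dept_name"]),
    "Employee": ("emp_id", ["emp_name"]),
    "Project": ("proj_id", ["proj_title"]),
}


def build_data_cache(result_set):
    data_cache = {}
    for row in result_set:
        for class_name, (pk_column, data_columns) in OBJECT_COLUMNS.items():
            pk = row[pk_column]              # extract the primary key
            if pk is None:
                continue                     # the row holds no object of this class
            key = (class_name, pk)
            if key in data_cache:
                continue                     # already stored: skip the duplicate
            # Extract the corresponding data and store it under the key.
            data_cache[key] = {column: row[column] for column in data_columns}
    return data_cache
```

In this sketch the department and the first employee each appear in more than one row of the result set but are stored only once, which is one way the cached entries can occupy less space than the result set from which they were derived.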
Creating an associations cache may further comprise: defining one or more association types for said associations cache; storing association cache entries corresponding to said defined association types; and populating said associations cache, further comprising: extracting zero or more foreign keys for each object in each row of said result set; storing each extracted foreign key as an owner key, said owner key being associated with a particular one of said stored association cache entries and being unique within said particular one; and storing a member primary key corresponding to each extracted foreign key, said member primary key being associated with a particular one of said stored owner keys. Access to said associations cache preferably uses an associations cache look-up table, wherein said associations cache look-up table is comprised of a collection of said stored association cache entries, said stored owner keys, and said stored member primary keys, organized according to said stored member primary keys within said stored owner keys within said stored association cache entries. Access to said object cache preferably uses an object cache look-up table, wherein said object cache look-up table is comprised of a collection of object identifiers, one of said object identifiers corresponding to each of said registered objects, wherein said object cache look-up table is organized according to said object identifiers.
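Populating the associations cache may be sketched, for illustration only, as a three-level look-up organized by association type, then owner key, then member primary keys. The association names, the column names, and the sample rows are hypothetical. Recording an owner key with an empty member list, to distinguish an owner known to have no members from an owner that was never retrieved, is an assumption of this sketch.

```python
# Hypothetical joined result set (only the key columns are shown).
RESULT_SET = [
    {"dept_id": 10, "emp_id": 1, "proj_id": 100},
    {"dept_id": 10, "emp_id": 1, "proj_id": 101},
    {"dept_id": 10, "emp_id": 2, "proj_id": None},
]

# Hypothetical association types:
# name -> (foreign key column used as owner key, member primary key column).
ASSOCIATION_TYPES = {
    "Department.employees": ("dept_id", "emp_id"),
    "Employee.projects": ("emp_id", "proj_id"),
}


def build_associations_cache(result_set):
    # One association cache entry per defined association type.
    cache = {name: {} for name in ASSOCIATION_TYPES}
    for row in result_set:
        for name, (owner_column, member_column) in ASSOCIATION_TYPES.items():
            owner_key = row[owner_column]    # extracted foreign key
            member_key = row[member_column]  # member primary key
            if owner_key is None:
                continue
            # Each owner key is stored once within its association entry.
            members = cache[name].setdefault(owner_key, [])
            if member_key is not None and member_key not in members:
                members.append(member_key)
    return cache
```

Because only keys are stored, the member objects themselves need not be instantiated until a member of the association is actually requested; the member keys can then be resolved through the data cache or the object cache.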
In a further aspect, the present invention provides a method, system, and computer-readable code for enabling object navigation to be independent of a data access path used on a query that retrieves data from a database, comprising: creating an associations cache, wherein said associations cache comprises an entry for each association in said retrieved data; and retrieving one or more member objects of a selected one of said associations from said created associations cache.
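As an illustrative sketch of this aspect, suppose the query retrieves data along the access path Department, then Employee, then Project, but the user wishes to begin navigation from a project. The sample rows and the names `build_associations` and `members` are hypothetical. The sketch further assumes that the associations cache records an entry for each direction in which navigation is desired, including the reverse association from a project to its employees.

```python
from collections import defaultdict

# Hypothetical rows (dept_id, emp_id, proj_id) returned by a query whose
# access path is Department -> Employee -> Project.
RESULT_SET = [
    (10, 1, 100),
    (10, 1, 101),
    (10, 2, 100),
    (20, 3, 101),
]

ASSOCIATIONS = ("Department.employees", "Employee.projects", "Project.employees")


def build_associations(result_set):
    assoc = {name: defaultdict(list) for name in ASSOCIATIONS}

    def link(name, owner_key, member_key):
        members = assoc[name][owner_key]
        if member_key not in members:
            members.append(member_key)

    for dept_id, emp_id, proj_id in result_set:
        link("Department.employees", dept_id, emp_id)
        link("Employee.projects", emp_id, proj_id)
        link("Project.employees", proj_id, emp_id)  # reverse of the query order
    return assoc


def members(assoc, name, owner_key):
    """Return the member primary keys of one association, without a new query."""
    return list(assoc[name].get(owner_key, []))
```

With these assumptions, a call such as `members(assoc, "Project.employees", 100)` is answered from the cache even though the query that populated it began at the Department table.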
The present invention will now be described with reference to the following drawings, in which like reference numbers denote the same element throughout.