Relational databases store information in tables, each of which has one or more columns and zero or more rows. Relationships may be defined between two or more tables to reflect real-world relationships between what is represented by the data contained in those tables. Database systems are very good at responding to queries that are written in terms of the relational model in which they store data.
Frequently, the data stored in a database corresponds to real-world items. For example, a database may store information about the purchase orders received by a company. The real-world items that are represented by the relational data are often constrained by limitations inherent in the real-world domain of those items. For example, it may be company policy that P.O. boxes cannot be used as the mailing address for purchase orders. Rules that apply to data items, such as the "no P.O. box mailing address" rule, are generally referred to as "domain logic".
Ideally, the database applications used to store information in a database implement the domain logic that applies to that information. For example, the database application used to enter purchase orders into a database should check the mailing address specified for each purchase order, and reject the address if it specifies a P.O. box.
Frequently, database application designers tie the domain logic that applies to a data item to the user interface component that is used to present the data item to a user. For example, the database application may present the user with a user interface that includes a user interface component (e.g. a text field) for receiving a mailing address. The application designer associates with that text field domain logic that checks to determine whether the text within the UI component represents a P.O. box. Using a database application design tool, the association of the "no P.O. box" domain logic with the text field may be accomplished by (1) selecting a user interface control associated with the text field, where the user interface control allows entry of validation logic for the text field, and (2) entering validation logic that verifies that the text in the text field does not specify a P.O. box.
Unfortunately, the domain logic thus created is tied to that particular UI component in the user interface, not to the mailing address information itself. Thus, if the application presents the mailing address information in another UI component of a second, different user interface, the "no P.O. box" domain logic would have to be repeated and tied to that other UI component as well. This would involve, once again, entry of validation logic that verifies that the text in the text field does not specify a P.O. box.
Tying the domain logic to user interface components tends to produce applications where the domain logic of the application is "scattered" among the various user interface components. The decentralized nature of the domain logic creates severe problems when, for example, the domain logic for a given type of data item must be modified. For example, if the company decided that mailing addresses can be P.O. boxes if the P.O. boxes are within a particular state, then the application programmer would have to make the appropriate revisions to the validation logic of each user interface component that allows entry of mailing addresses.
Object-oriented programming provides certain benefits that the relational model does not. Object-oriented programming allows programmers to model real-world things, such as purchase orders, in a way that they find intuitive. It also allows the domain logic that applies to those real-world things to be "encapsulated", and the code that implements that logic to be reusable. To achieve these benefits, applications that interact with relational databases are often written so that data is modeled using objects, even though the data for those objects is actually stored in a relational database.
Unfortunately, applications that model data using an object model frequently do so at the cost of sacrificing the efficiency of the relational model. For example, in the relational model, if a user only wants to see data from a particular column, data is only retrieved from that column. In an object model, if a user only wants to see data from a particular attribute of a particular object, the entire object may have to be instantiated before the one attribute value is supplied to the user. At the database level, instantiating the object may involve performing complex joins between multiple tables, and retrieving significantly more data than the user actually wants to see.
The following scenario illustrates the problem of systems that include components that must work together, but which model the same data differently. Specifically, a system for creating purchase orders will be described in which a database server that stores the data persistently uses a relational model, or "data model", the Java runtime classes that manipulate the data use an object model, and multiple user interfaces, representing distinct end-user applications, present the data to users.
The Data Model
FIG. 1 illustrates the data model for a Purchase Order that could be used by a relational database server. A Purchase Order is a contractual document that includes information describing the buyer, the seller, the goods and services being procured, delivery information and internal accounting.
In the schema shown in FIG. 1, the purchase order (PO) is modeled as a Header which has one or more Lines. Each Line has one or more Shipments, and each shipment has one or more Distributions.
The Header (PO_HEADERS table) stores information pertaining to the entire document (e.g. Buyer, Document Total, Approval Status, Supplier and so on). A purchase order has one and only one header.
The Line (PO_LINES table) stores information about what the buyer wants to order (e.g. Item Number and Description, Unit of Measure, Price, Order Quantity).
A purchase order must have at least one line, and may have more than one if the buyer wants to order different goods and services from the same supplier.
The Shipment (PO_SHIPMENTS table) stores information about where and when the order is to be delivered (e.g. Ship-To Location, Due Date, Shipment Quantity).
A line must have at least one shipment and may have more than one if the buyer wants a single order quantity to be shipped to multiple receiving docks, or shipped to the same dock on several different dates. If a line has more than one shipment, the total quantity of all shipments must equal the line quantity.
The Distribution (DISTRIBUTIONS table) stores information pertaining to the internal accounting for each shipment (e.g. Cost Center, Distribution Quantity).
Each shipment must have at least one distribution and may have more than one if the buyer needs to allocate procurement costs to multiple accounts. If a shipment has more than one distribution, the total quantity of all distributions must equal the shipment quantity.
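The two quantity rules above (shipments against their line, distributions against their shipment) share one shape, which can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name QuantityRule and the method isSatisfied are hypothetical and do not appear in FIG. 1, and quantities are assumed to be whole numbers.

```java
import java.util.List;

// Hypothetical helper: checks a parent quantity against its child quantities.
// It applies equally to line/shipment and shipment/distribution pairs.
final class QuantityRule {
    private QuantityRule() {}

    static boolean isSatisfied(int parentQuantity, List<Integer> childQuantities) {
        // A parent must have at least one child.
        if (childQuantities.isEmpty()) {
            return false;
        }
        // The text states the sum rule only for more than one child,
        // so a single child is accepted without comparing quantities.
        if (childQuantities.size() == 1) {
            return true;
        }
        int total = 0;
        for (int quantity : childQuantities) {
            total += quantity;
        }
        return total == parentQuantity;
    }
}
```

For example, a line quantity of 100 split into shipments of 60 and 40 satisfies the rule, while shipments of 60 and 30 do not.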
In accordance with standard relational design, each of the Purchase Order tables stores IDs (foreign keys) for referenced entities. In this example, the Purchase Order Header table (PO_HEADERS) stores a buyer ID and a supplier ID. The Line table (PO_LINES) stores an item ID. To obtain the Buyer Name, Supplier Name, Item Number and Item Description, these foreign keys must be resolved using a join to their corresponding primary key in the referenced tables (BUYERS, SUPPLIERS and ITEMS respectively).
Finally, within the Purchase Order itself, each child entity stores the unique identifier of its parent entity.
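The foreign-key resolution described above can be pictured with a small in-memory Java sketch. It is not how a database server performs a join; the maps merely stand in for the BUYERS and SUPPLIERS tables keyed by primary key, and the class names HeaderRow, HeaderView and ForeignKeyResolver are hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for one PO_HEADERS row: it holds only IDs.
class HeaderRow {
    final long headerId;
    final long buyerId;
    final long supplierId;

    HeaderRow(long headerId, long buyerId, long supplierId) {
        this.headerId = headerId;
        this.buyerId = buyerId;
        this.supplierId = supplierId;
    }
}

// The header as a user would want to see it, with names instead of IDs.
class HeaderView {
    final long headerId;
    final String buyerName;
    final String supplierName;

    HeaderView(long headerId, String buyerName, String supplierName) {
        this.headerId = headerId;
        this.buyerName = buyerName;
        this.supplierName = supplierName;
    }
}

class ForeignKeyResolver {
    // Looks up each foreign key in the map that stands in for the referenced table.
    static HeaderView resolve(HeaderRow header,
                              Map<Long, String> buyers,
                              Map<Long, String> suppliers) {
        return new HeaderView(
                header.headerId,
                buyers.get(header.buyerId),
                suppliers.get(header.supplierId));
    }
}
```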
The Object Model
While the data model is optimized for quick data storage and retrieval, the object model is optimized for effective domain logic implementation at runtime and code reuse. Fundamental to the object model is the concept of domain objects. Domain objects are objects that represent items of significance in the real world that can be modeled and implemented in software. Domain objects can exist in any domain: business, scientific, natural, etc. Domain objects encapsulate the application behavior or "domain logic" that is relevant for the domain of the items they represent.
For example, complicated domain logic may be encapsulated within domain objects that are implemented in reusable bundles of Java classes. FIG. 2 is a block diagram that illustrates an object model, implemented in Java classes, for four Domain objects: Purchase Order, Supplier, Buyer and Item. Note that domain objects can include calculated values that are not stored in the database.
To illustrate the benefit of encapsulated logic, assume that a company has a policy against shipping to P.O. box addresses. In the object model illustrated in FIG. 2, the ship-to address corresponds to the shipToLocation attribute of the POShipment object. Every time a shipToLocation attribute is retrieved, it is retrieved by calling the getShipToLocation() method of the POShipment object. Every time a shipToLocation attribute is stored, it is stored by calling the setShipToLocation() method of the POShipment object. Thus, the "no P.O. box" rule can be implemented by adding logic to a single location: the setShipToLocation() method. This logic will be executed regardless of the UI components through which a user attempts to set the ship-to address, since the logic is tied to the shipToLocation attribute and not to the UI components themselves.
Consequently, if the company decides to accept ship-to P.O. boxes for a particular state, the company may do so simply by modifying the logic of the setShipToLocation() method. The modifications will take effect everywhere that the ship-to address attribute is manipulated because the logic is tied to the attribute, not the user interface.
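A minimal Java sketch of this encapsulation follows. The class and method names come from the description of FIG. 2, but the body is an assumption: the regular expression used to recognize a P.O. box and the choice of IllegalArgumentException are illustrative only.

```java
import java.util.regex.Pattern;

// Sketch of the POShipment domain object; only the ship-to attribute is shown.
class POShipment {
    // Assumed pattern covering spellings such as "P.O. Box", "PO Box" and "P O Box".
    private static final Pattern PO_BOX =
            Pattern.compile("\\bP\\.?\\s*O\\.?\\s*BOX\\b", Pattern.CASE_INSENSITIVE);

    private String shipToLocation;

    public String getShipToLocation() {
        return shipToLocation;
    }

    // The single place where the "no P.O. box" rule lives. Every user
    // interface that sets the ship-to address goes through this method.
    public void setShipToLocation(String location) {
        if (location != null && PO_BOX.matcher(location).find()) {
            throw new IllegalArgumentException("Ship-to location must not be a P.O. box");
        }
        this.shipToLocation = location;
    }
}
```

Under this sketch, a later policy change (for example, allowing P.O. boxes in one state) would touch only the condition inside setShipToLocation(), and no user interface code.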
The User Interface
The data stored using the relational model of FIG. 1 and retrieved using the object classes of FIG. 2 may actually be presented to users using the interfaces shown in FIG. 3. For example, assume that a Company X has installed an application which includes a Purchase Order module that generates user interface 302. User interface 302 displays every possible data value for a PO, but the business process of Company X requires only a small subset of those values and defaults many of the rest. Company X has therefore built a custom Purchase Orders module that generates interface 300, which lets Company X display just a few relevant UI components in a spreadsheet-like format for speedy data entry and review.
The business rules governing the data validation are the same for both modules, and the underlying data model provided by the database server is the same. The only thing that is different is the "shape" and content of the data as presented to the user.
Not only does Company X want to display fewer data values to the user, they never create multiple shipments or distributions, so there is no reason to complicate the data entry process by displaying this information in a hierarchical user interface. In fact, it is not even necessary for the buyers in Company X to know that the underlying object and data models have the concept of shipments and distributions within their internal architecture.
The following scenario illustrates the problem associated with storing data using the relational model illustrated in FIG. 1, manipulating it using the object model illustrated in FIG. 2, and presenting it using the interfaces illustrated in FIG. 3.
When a software engineer in Company X's IT group is assigned the task of building the custom user interface 300, the software engineer may realize that this is the first of what is likely to be several requests for custom PO applications. For example, the software engineer may have already heard from someone who wants an HTML front end for key suppliers to create their own POs based on weekly planning data.
The software engineer also does not want to recreate the domain logic coded in the Purchase Order domain object, since that would not add any value to the custom application and would introduce problems when the company upgrades to the next release of the Purchase Order domain object.
The problem that presents itself is how to easily reuse the Domain objects (while preserving encapsulation) and the underlying data model when neither matches the "shape" of the data as perceived by the user through different user interfaces. When presented with this problem, the object-oriented approach would dictate that data should only be accessed through a class's public get() methods, and therefore user interface components that expect to be mapped to a datasource should be mapped directly to the domain objects that own the data.
Unfortunately, the pure object-oriented strategy has several drawbacks. For example, the "shape" of the object model does not necessarily mirror the underlying data model, and neither of these has to match the user interface. Direct dependence on the Domain object classes to serve up data makes it difficult to tailor performance for different user interfaces. The temptation is to build different domain objects for each user interface, which defeats the goal of code reuse.
Also, using the pure object-oriented strategy, there are performance penalties that are caused by fully populating a domain object (plus references) every time the user wants to see a data value. In the example given in FIG. 1, each of the objects has only a few values. However, in real world scenarios, objects routinely have hundreds of attributes, the attribute values are stored in tables that have millions of rows, and the objects have 25+ references to other objects that are equally complex. Under these conditions, it is difficult to efficiently handle queries when the user is going to be driving the query with referenced data values, as in the query "find all POs for the Supplier AMP".
Further, if the domain objects are instantiated on the client, data for all of the non-requested attributes will have to be sent from the server to the client. The percentage of communication bandwidth consumed by the transmission of data for attributes that are not even needed by the user interface may vastly exceed the bandwidth consumed by the transmission of the requested data. Thus, instantiating client-side domain objects is particularly inefficient in environments where the communication channel between the client and server represents a bottleneck.
Based on the foregoing, it is clearly desirable to provide a mechanism that allows the domain logic associated with data stored in a database to be encapsulated, but which does not incur the performance penalties associated with a pure object-oriented approach.
According to one aspect of the invention, a mechanism is provided that achieves the benefits of both the relational model and the object-oriented model. Specifically, data is supplied to users through "query objects", where the query objects create a bridge between the information the user wants to see (as expressed in the user interface presented to the user) and the relational model. Thus, data used to populate the user interface is queried directly from the data model and returned in the shape required by the user interface. As the query results are read into memory, objects that model the data using object-oriented techniques are instantiated. Subsequent manipulation of the data is redirected to those objects, thereby preserving data encapsulation and allowing reuse of existing logic.
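The bridging role of a query object can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the names ShipmentQueryObject and ShipmentDomainObject, the column names SHIPMENT_ID and SHIP_TO, and the deliberately naive P.O. box check are all hypothetical, and a list of maps stands in for the rows returned by a relational query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical domain object that owns the "no P.O. box" rule.
class ShipmentDomainObject {
    private String shipToLocation;

    ShipmentDomainObject(String storedShipTo) {
        this.shipToLocation = storedShipTo; // value already stored in the database
    }

    String getShipToLocation() {
        return shipToLocation;
    }

    void setShipToLocation(String location) {
        // Deliberately naive check, for illustration only.
        String normalized = location.toUpperCase().replace(".", "").replace(" ", "");
        if (normalized.contains("POBOX")) {
            throw new IllegalArgumentException("Ship-to location must not be a P.O. box");
        }
        this.shipToLocation = location;
    }
}

// Hypothetical query object: serves rows in the shape the user interface
// wants, and routes every modification through the domain object.
class ShipmentQueryObject {
    private final List<Map<String, Object>> rows = new ArrayList<>();
    private final Map<Long, ShipmentDomainObject> domainObjects = new HashMap<>();

    // Stand-in for reading relational query results into memory. A domain
    // object is instantiated for each row as the row is read.
    void load(List<Map<String, Object>> queryResult) {
        for (Map<String, Object> row : queryResult) {
            rows.add(new HashMap<>(row));
            Long id = (Long) row.get("SHIPMENT_ID");
            domainObjects.put(id, new ShipmentDomainObject((String) row.get("SHIP_TO")));
        }
    }

    // Reads come straight from the query result, in the user interface's shape.
    Object getValue(int rowIndex, String column) {
        return rows.get(rowIndex).get(column);
    }

    // Writes are redirected to the domain object, so its logic is reused.
    void setShipTo(int rowIndex, String newValue) {
        Map<String, Object> row = rows.get(rowIndex);
        ShipmentDomainObject shipment = domainObjects.get((Long) row.get("SHIPMENT_ID"));
        shipment.setShipToLocation(newValue); // throws if the rule is violated
        row.put("SHIP_TO", shipment.getShipToLocation());
    }
}
```

In this sketch the user interface never touches ShipmentDomainObject directly, yet an attempt to set a P.O. box address through the query object is still rejected by the domain object's setter.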
According to another aspect of the invention, the objects associated with the data that is retrieved through a query object are lazily populated. For example, when instantiating a particular object that has thousands of attributes, the values for all of the attributes are not automatically retrieved. Rather, those values that are already being retrieved in response to the query are used to populate their corresponding attributes, while other attributes of the object are only populated on an as-needed basis. In a preferred embodiment, all of the attributes for the object are retrieved as soon as a method needs a value that was not populated when the object was instantiated.
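Lazy population as described in this paragraph can be sketched in Java as follows. The sketch rests on assumptions: the class name LazyDomainObject is hypothetical, attributes are held in a map keyed by name, and a Function supplied by the caller stands in for whatever database access would fetch the full set of attribute values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical lazily populated domain object.
class LazyDomainObject {
    private final long id;
    private final Map<String, Object> attributes = new HashMap<>();
    // Assumed stand-in for a database fetch of all attributes by primary key.
    private final Function<Long, Map<String, Object>> fullLoader;
    private boolean fullyPopulated = false;

    // queriedValues holds only the values the query was already retrieving.
    LazyDomainObject(long id,
                     Map<String, Object> queriedValues,
                     Function<Long, Map<String, Object>> fullLoader) {
        this.id = id;
        this.attributes.putAll(queriedValues);
        this.fullLoader = fullLoader;
    }

    Object get(String attributeName) {
        // The first request for an unpopulated attribute triggers one fetch
        // of all attributes, as in the preferred embodiment described above.
        if (!fullyPopulated && !attributes.containsKey(attributeName)) {
            // putIfAbsent keeps values that are already held in memory.
            fullLoader.apply(id).forEach(attributes::putIfAbsent);
            fullyPopulated = true;
        }
        return attributes.get(attributeName);
    }

    boolean isFullyPopulated() {
        return fullyPopulated;
    }
}
```

Reading an attribute that the query already supplied costs no further database access; only a request for a missing attribute invokes the loader, and it does so at most once per object.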