1. Field of the Invention
The invention relates to the field of computer systems. More specifically, the invention relates to object technology.
2. Background Information
Most multi-user systems are now built on top of special programs, called database management systems, that are designed to manage simultaneous access to shared data. Database management programs do more than just control access to data stored in files; they also store relationships among the various data elements. The way these relationships are stored is critical to understanding the generations of databases that have come and gone in corporate environments. Early database management systems adopted the approach of storing direct references among data elements, allowing data to be retrieved through a process called navigational access. This approach supports fast retrieval of related information because each piece of stored information includes the effective locations of all related information.
One early database modeling technique, known as the hierarchical model, represents data items, called records, in tree structures. For example, a department can include records for the positions it contains and the equipment checked out to it. Each position, in turn, can be associated with a list of responsibilities and a list of employees in the department holding that position. A subsequent extension of the hierarchical model, the network model, allows data to be interconnected freely, with no requirement that it fit into a simple tree structure.
The hierarchical and network database models make it easy to represent simple relationships among data elements and provide fast access to the data. But there is a cost: accessing the data in a way other than the one supported by the predefined relationships is slow and inefficient. Worse yet, the data structures are hard to track, maintain, and modify.
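By way of illustration only, the following minimal Python sketch models the department example as a tree of records and contrasts navigational access with access against the predefined relationships. The class names, function names, and sample data are hypothetical and are not taken from any particular database management system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Position:
    title: str
    responsibilities: List[str] = field(default_factory=list)
    employees: List[str] = field(default_factory=list)


@dataclass
class Department:
    name: str
    positions: List[Position] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)


def employees_in(department: Department) -> List[str]:
    """Navigational access: follow the stored references down the tree."""
    return [e for p in department.positions for e in p.employees]


def department_of(employee: str, departments: List[Department]) -> Optional[str]:
    """Access against the predefined relationships: no stored reference
    leads from an employee back to a department, so every tree is walked."""
    for d in departments:
        for p in d.positions:
            if employee in p.employees:
                return d.name
    return None


# Hypothetical sample data.
sales = Department(
    "Sales",
    [Position("Manager", ["budget"], ["Ana"]),
     Position("Representative", ["calls"], ["Bo", "Cy"])],
    ["laptop"],
)
support = Department("Support", [Position("Agent", ["tickets"], ["Di"])], ["headset"])
```

In this sketch, employees_in follows references that are already stored, whereas department_of must scan every department and position, which reflects the cost described above.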
A newer database modeling technique, the relational model, addresses these problems by separating the data from the information about complex relationships. In particular, all data is stored in simple tables, with basic relationships among data items being expressed as references to values in other tables. For example, each entry in an equipment table can contain a value indicating which department it belongs to. This approach to linking data in a database management system is called associative access because it relies on content rather than locations to link data elements together. Along with a new model for accessing data, the relational database management system also makes possible a new syntax, the Structured Query Language (SQL), that allows information in a relational database management system to be accessed in all possible locations.
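By way of illustration only, the following Python sketch uses the standard sqlite3 module to show associative access for the equipment example. The table names, column names, and sample rows are hypothetical. Each equipment row holds a value identifying its department, and SQL links the tables by matching those values rather than by following stored locations.

```python
import sqlite3

# In-memory database with two simple tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE equipment (item TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO equipment VALUES ('laptop', 1), ('headset', 2), ('projector', 1);
""")


def equipment_for(dept_name):
    """List the equipment of a department by matching dept_id values."""
    rows = conn.execute(
        "SELECT e.item FROM equipment AS e "
        "JOIN department AS d ON e.dept_id = d.dept_id "
        "WHERE d.name = ? ORDER BY e.item",
        (dept_name,),
    ).fetchall()
    return [row[0] for row in rows]


def department_for(item):
    """Query in the opposite direction using the same tables and values."""
    row = conn.execute(
        "SELECT d.name FROM department AS d "
        "JOIN equipment AS e ON e.dept_id = d.dept_id "
        "WHERE e.item = ?",
        (item,),
    ).fetchone()
    return row[0] if row else None
```

Both directions of the relationship are answered from the same tables because the link is a shared value, not a stored reference.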
Although the relational model is much more flexible than its predecessors, it exacts a steep price for the flexibility. The information about complex relationships that was separated from the database must be expressed as procedures in every program that accesses the database. This separation is a clear violation of the independence required for software modularity. The shift from navigational to associative access also exacts a serious performance penalty for complex data because it takes much longer to locate related information by searching for content rather than jumping to calculated addresses.
Despite these drawbacks, relational databases have proliferated through the industry because of their flexibility. Although vast quantities of data continue to be managed in network databases, hierarchical databases, and unstructured files, these are the legacy of previous generations rather than reflections of current practice. Relational databases represent the dominant database management system technology today, and most new applications are constructed using relational systems.
Of course, having data scattered across several generations of database management system technology presents its own set of problems. It is very hard for individual applications to integrate data across such radically different technologies, which means that getting any kind of management overview is still difficult. One approach to solving this problem is a relational database that is used to pull together selected data from many different storage locations into a single repository with a common access mechanism (referred to as a relational data warehouse). The relational data warehouse does not, in itself, solve the problem of integrating data from divergent sources. But it does allow that problem to be solved only once, using custom procedures to pull the data together. Once this has been done, all the combined data can be accessed in a uniform manner, using SQL and other standard tools.
FIG. 1 is a block diagram illustrating the use of a relational data warehouse to integrate disparate sources, including data sources (a relational database and CORBA) and applications (a legacy application and a conversion application). In FIG. 1, the relational data warehouse uses the relational database tables and stored procedures to store relationships within and between the disparate integration sources.
The relational data warehouse is a reasonable step toward data integration, but it is a solution that introduces problems of its own. For example, the need to express relationships as separate procedures in every program that accesses the relational database has caused problems. In particular, different procedures have been added over time by different programmers; at different times; using different programming languages; for different applications and integration sources; to express different relationships; etc. In addition, as these procedures are being added to the system, there is no mechanism or infrastructure to provide and/or force commonality between these procedures. Since an adaptive integrated view requires the ability to collect and execute different combinations of these procedures, such a view cannot be provided.
For example, a first programmer wishing to express a first relationship between the relational database and the legacy application of FIG. 1 writes a first procedure. However, a second programmer wishing to express a second relationship between the legacy application and the conversion application of FIG. 1 writes a second procedure (perhaps in a second language) having a different format (e.g., requiring a different number and/or type of inputs, providing a different number and/or type of outputs, etc.). In addition, a third programmer wishing to express a relationship between various items in a given integration source writes yet a third procedure that does not necessarily have a format common to the first and second procedures. As this process continues, a morass of procedures expressing different relationships between different items and/or different integration sources and having different formats is created.
The lack of commonality between these procedures makes them difficult, if not impossible, to track, maintain, and/or reuse. Thus, there is no reasonable mechanism by which a user can locate, much less execute, all procedures across the enterprise that express a certain type of relationship. For example, a user interested in a particular type of relationship expressed by the first and third procedures above would need to know of both procedures, be able to locate both procedures, and know the format of both procedures. This inability to locate and execute different combinations of the procedures across the enterprise therefore prevents the provision of adaptive integrated views.
In addition, this morass of procedures 1) limits adaptability; 2) provides little support for reusability; 3) leads to duplication of functionality; and 4) results in an inability to predict the result of changing an integration source. Specifically, a programmer unfamiliar with the work of another programmer (see reasons given above) will end up writing a procedure very similar to one that already exists. Thus, a request for the expression of a given relationship results in new procedures being added, except in the rare cases when an existing procedure is identified and the programmer can verify the procedure can be changed without disrupting other parts of the system.
Additionally, since the procedures interface with the integration sources and/or other procedures, changing a given integration source or procedure can affect any number of other integration sources and/or procedures. However, it often cannot be determined what procedures and/or other integration sources will be affected by such changes. For example, a first stored procedure may generate a report from a first integration source. This first procedure may use a second procedure when generating that report (e.g., the second procedure may be used for converting date information from a given format to the Julian format). Thus, the first procedure depends on the first integration source and the second procedure. Although this dependency exists, there is no mechanism for readily exposing these dependencies. As a result, programmers are reluctant to make any changes, but instead attempt to extend integration sources and/or write new procedures.
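By way of illustration only, the following Python sketch makes the dependency example concrete. It records by hand the dependencies among the first procedure, the second procedure, and the first integration source, which is precisely the information that, as noted above, is not readily exposed in practice. The identifiers and the affected_by helper are hypothetical.

```python
from collections import defaultdict

# Hand-written dependency records for the example above (hypothetical names).
# Each key depends on every item in its list.
depends_on = {
    "report_procedure": ["integration_source_1", "date_conversion_procedure"],
    "date_conversion_procedure": [],
    "integration_source_1": [],
}


def affected_by(changed, depends_on):
    """Return every item that depends, directly or transitively, on `changed`."""
    # Invert the records: for each item, which items depend on it.
    dependents = defaultdict(set)
    for item, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(item)

    # Walk the inverted records outward from the changed item.
    affected = set()
    stack = [changed]
    while stack:
        current = stack.pop()
        for item in dependents[current]:
            if item not in affected:
                affected.add(item)
                stack.append(item)
    return affected
```

With such records available, a change to the date conversion procedure or to the first integration source can be traced to the report procedure. Without them, the effect of a change cannot readily be determined.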