1. Field of the Invention
The present invention relates to computer databases. More particularly, the present invention relates to techniques for creating a data abstraction model over a set of individual databases that includes constraints on how logically related data sets are joined together and presented to a user.
2. Description of the Related Art
Databases are well known systems for information storage and retrieval. The most prevalent type of database used today is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A relational database management system (DBMS) uses relational techniques for storing and retrieving data.
A database schema describes the structure of a database. For example, a relational schema describes a set of tables, columns, and primary and foreign keys that define relationships between different tables. Applications are developed that query data according to the database schema. For example, relational databases are commonly accessed using a front-end query application that is configured to perform data access routines, including searching, sorting, and query composition routines. At the back-end, software programs control data storage and respond to requests (queries) sent by users interacting with the front-end.
One issue faced by data mining and database query applications, however, is their close relationship with a given database schema. This relationship makes it difficult to support an application as changes are made to the corresponding underlying database schema. Further, this tightly bound relationship inhibits the migration of a query application to alternative data representations.
Commonly assigned U.S. patent application Ser. No. 10/083,075 (the '075 application), filed Feb. 26, 2002, entitled “Improved Application Flexibility Through Database Schema and Query Abstraction,” discloses a framework that provides an abstract view of a physical data storage mechanism. The framework of the '075 application provides a requesting entity (i.e., an end-user or front-end application) with an abstract representation of data stored in an underlying physical storage mechanism, such as a relational database. In this way, the requesting entity is decoupled from the underlying physical data when accessing the underlying DBMS. Abstract queries based on the framework can be constructed without regard for the makeup of the physical data. Further, changes to the physical data schema do not require a corresponding change in the front-end query application; rather, the abstraction provided by the framework can be modified to reflect the changes. Commonly assigned U.S. patent application Ser. No. 11/005,418, filed Dec. 6, 2004, entitled “Abstract Query Plan,” discloses techniques for processing an abstract query that include generating an intermediate representation of the abstract query, which is then used to generate a resolved query consistent with the underlying database.
Oftentimes, relationships exist between data elements that are not captured by the table structure of a relational database. For example, consider a set of tests that make up a test suite (e.g., a set of toxicity tests given to a patient brought to the emergency room). Although each test is independent of or distinct from the others, the multiple tests are related and collectively form a set. Another relationship not captured by a relational database may be independent events that together form a series. A series of events may be ordered based on the sequence of individual events included in the series. The events may be different, but may also be the same event type repeated multiple times. For example, many treatment regimens or research experiments may be conducted sequentially. In addition, researchers often wish to identify patterns present in data. For example, a researcher may wish to form a set (event “A,” event “B,” and event “C”) to seek a correlation to outcome “X.” Similarly, a series (e.g., event “A,” then event “B,” then event “C”) may be defined as a sequence of events used to identify a possible outcome.
Data from the tests may be stored in a single column of a test table with an additional column that indicates the test type. Table I, below, is an example of such a table. This tabular arrangement allows results from new tests to be added without requiring a structural change to the relational schema. Average users, however, are often surprised to find that test results are not stored together as a result set in the database. Table II illustrates a tabular arrangement that users might expect, in that Table II is consistent with the users' logical perspective of the physical data.
TABLE I
Example Table - Actual

ID   Result   Type     Date       Test Run
1    12       Test 1   11/3/04    1
1    45       Test 2   11/4/04    1
1    203      Test 3   11/5/04    1
1    9        Test 1   11/20/04   2
1    47       Test 2   11/21/04   2
1    198      Test 3   11/22/04   2
TABLE II
Example Table - Expected

ID   Name   Test 1   Test 2   Test 3
1    Dave   12       45       203
1    Dave   9        47       198
Arranging a relational table consistent with the users' logical view of these relationships (e.g., as in Table II) leads to an inefficient or unmaintainable database design. A new table would need to be added for each new test or test regimen. Presenting the tests as they are stored in Table I, however, makes it difficult for users to interpret the data.
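By way of illustration only, the gap between the stored arrangement of Table I and the expected arrangement of Table II may be sketched as a grouping of result rows by patient identifier and test run. The following Python sketch is not part of any referenced application; the names TABLE_I and pivot_by_run are hypothetical, and the Name column of Table II is omitted because Table I does not carry it.

```python
from collections import defaultdict

# Rows of Table I as stored: (id, result, test_type, date, test_run).
TABLE_I = [
    (1, 12, "Test 1", "11/3/04", 1),
    (1, 45, "Test 2", "11/4/04", 1),
    (1, 203, "Test 3", "11/5/04", 1),
    (1, 9, "Test 1", "11/20/04", 2),
    (1, 47, "Test 2", "11/21/04", 2),
    (1, 198, "Test 3", "11/22/04", 2),
]


def pivot_by_run(rows):
    """Group one-result-per-row data into one row per (id, test run).

    Hypothetical helper: returns tuples of the form
    (id, <result of each test type, in sorted test-type order>).
    """
    grouped = defaultdict(dict)
    for patient_id, result, test_type, _date, test_run in rows:
        grouped[(patient_id, test_run)][test_type] = result

    # Column order follows the sorted test-type names found in the data.
    test_types = sorted({row[2] for row in rows})
    return [
        (patient_id, *(results.get(t) for t in test_types))
        for (patient_id, _run), results in sorted(grouped.items())
    ]


if __name__ == "__main__":
    for row in pivot_by_run(TABLE_I):
        print(row)
```

In this sketch the set of tests belonging together is identified solely by the Test Run column. Nothing in the relational schema itself declares that the three test types form a set, which is the kind of logical relationship the underlying physical schema does not capture.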
Accordingly, there remains a need to extend the capabilities of an abstract database to account for the logical relationships between logical fields that may not be reflected by the underlying physical database schema, including set-type relationships and series-type relationships.