1. Field of the Invention
The present invention generally relates to data processing and more particularly to accessing data using correlation criterion or criteria.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
When constructing complex queries against information stored in a database or data warehouse, it is often desirable to correlate the data entities being queried based on a common attribute. For example, data entities may be correlated based on all entities or events that: (i) occurred at a given point in time or within a given time span; (ii) occurred at the same time or within a given time interval of each other; (iii) occurred at the same location which may be scoped a number of different ways (e.g., same street address, city, county, state, postal code, or country); (iv) occurred when an individual or item in question was a given age or within a given age range; and (v) occurred when the age associated with all data entities in question was the same.
The typical approach to constructing such complex queries involves augmenting the query with additional predicates to factor in the logic for the common correlation attribute(s) for each data entity being queried. Unfortunately, this approach requires a deeper understanding of the physical relationship of the data involved and of the particular data entities being queried (e.g., whether the data entity supports the concept of time, age or location). By way of example, consider a user interested in identifying males currently 60 years of age or older who were found to have a liver enzyme level above a certain threshold and who underwent surgery to correct the liver abnormality. In SQL, a corresponding query may take the form of Query 1.
SQL QUERY 1SELECT ASTValue, Diagnostics, SurgeryType FROM Demographics, TestResults, SurgeryRecord WHERE Gender =”M” AND YEAR (CURRENT_DATE − BirthDate) >= 60 ANDASTValue > 30 AND SurgeryType = “Liver”
Now assume the user desires to narrow the data returned to only include those cases where the person was tested and had the surgery within thirty days. This could be accomplished by extending the SQL Query 1 with additional predicates to select test and surgery events that occurred within the same time span, as illustrated by Query 2.
SQL QUERY 2SELECT ASTValue, Diagnostics, SurgeryType FROM Demographics, TestResults, SurgeryRecord WHERE Gender =”M” AND YEAR (CURRENT_DATE − BirthDate) >= 60 ANDASTValue > 30 AND SurgeryType = “Liver” AND DAYS (SurgeryDate − TestDate) <= 30
In another case, the user may desire to identify individuals with the same combination of criteria, but limiting the results to those people who were between the ages of 40 and 50 when they had the test and surgery. This would require a different set of predicates being added to the base query, as illustrated by Query 3.
SQL QUERY 3SELECT ASTValue, Diagnostics, SurgeryType FROM Demographics, TestResults, SurgeryRecord WHERE Gender =”M” AND YEAR (CURRENT_DATE − BirthDate) >= 60 ANDASTValue > 30 AND SurgeryType = “Liver” AND YEAR (SurgeryDate − BirthDate) >= 40 AND YEAR(SurgeryDate − BirthDate) <= 50 AND YEAR(TestDate − BirthDate) > = 40 AND YEAR(TestDate − BirthDate) <= 50
The foregoing examples illustrate that, while the base query remains the same in each case, additional predicates are added to accomplish the desired correlation between entities examined by the query. As a result, the burden on the end-user to access the desired data is substantial.
Therefore, there is a need for a manner of implementing correlation logic within queries.