1. Field of the Invention
This invention generally relates to database query processing techniques for correlating data values for multiple conditions specified by a database query. More specifically, the present invention relates to query processing techniques for managing the execution of queries that include a comparison between discrete value property measurements and continuous value property measurements.
2. Description of the Related Art
Databases are well known systems for storing, searching, and retrieving information stored in a computer. The most prevalent type of database used today is the relational database, which stores data using a set of tables that may be reorganized and accessed in a number of different ways. Users access information in relational databases using a relational database management system (DBMS).
Each table in a relational database includes a set of columns, typically specified by a name and a data type (e.g., integer, float, string, etc). The columns of a table are used to store individual data values. For example, in a table storing data about patients treated at a hospital, each patient might be referenced using a patient identification number stored in a “patient ID” column, other columns could include “first name,” “last name,” etc. Each row of such a table provides data about a particular patient. Tables that share at least one attribute in common are said to be related. Further, tables without a common attribute may be related through other tables that do share common attributes. A path between two tables is referred to as a join, and columns from tables related through a join may be combined to form a new (logical) table, returned as a set of query results.
Database queries specify which columns to retrieve data from, conditions (a.k.a. predicates) that must be satisfied for a particular data value to be included in a query result table, and how to correlate data values from different columns. Current relational databases require that queries are composed in complex query languages. One such query language is the Structured Query Language (SQL). However, other query languages are used. An SQL query is composed from one or more clauses set off by keywords. Well-known SQL keywords include, e.g., the SELECT, WHERE, FROM, HAVING, ORDER BY, and GROUP BY keywords. Composing a proper SQL query requires that a user understand both the structure (i.e., the tables and columns) defined for a particular relational database and the complex syntax of the SQL query language (or other query language). This complexity, however, generally makes it difficult for average users to compose a desired query.
Also, an often overlooked, yet fundamental, aspect of database data collection is that databases are often used to record a measured value obtained for a particular point in time, despite the fact that the measured value captures data for a continuous property (e.g., the ambient temperature is always some measurable quantity at a given moment in time). Typically, each entry in a database may have an associated timestamp indicating when the entry was obtained, or when the entry was added to the database. This is a useful feature in many cases (e.g., a user wants to determine exactly when a financial transaction posted to their bank account).
In other situations, this aspect makes it difficult to evaluate certain types of queries. For example, users often desire to correlate multiple query conditions with one another. Consider, for example, a database record used to store a weight value for a given patient. Even though the data value is associated with a particular point in time, a person's weight is, in most cases, relatively constant. That is, most individuals weigh about the same day-to-day regardless of whether a value is captured in a database.
To build on this example, assume the database is also used to record test data values obtained from medical tests. If a user desired to compose a query that identified patients with an elevated hemoglobin test result over 40 who also weighed over 220 pounds at the time of the test, then, when executed, a query needs to (i) identify patients with the elevated hemoglobin test, and (ii) determine whether a patient with a high hemoglobin value has a weight value over 220. Alternatively, the query may be processed by first identifying patients with a weight value over 220, and then determining whether there is a corresponding hemoglobin test over 40. Either way, if weight and hemoglobin data values associated with a specific patient are recorded with different timestamp values, then a database query engine may, incorrectly, fail to include such a patient in a query result table because it does not have data values for each condition with the timestamp. Or worse, a query engine might compare any two such values (e.g., a weight measurement from three years ago compared against a hemoglobin test from three days ago). Thus, without the ability to correlate conditions, executing a query with multiple conditions may both fail to include patients in query results that should be, or include patients that should not.
The inability of current systems to correlate this type of data can result from a number of different situations arising from how the data is captured into the database. For example, test results may take days to generate and may be recorded into a database based on time that the test is completed or when the test is first performed; a hospitalized patient might be weighed when admitted, but not on each day during a hospital stay; an individual may not undergo both tests at the same time (e.g., a patient may visit a clinic to have blood drawn for the hemoglobin test without contemporaneously being weighed). These examples illustrate a few of many similar situations where current databases are unable to correlate data for multiple conditions, often because of how data is represented using measurements that are linked to a specific point in time.
Accordingly, there remains a need for techniques to correlate data for different data items in a database, and for query processing techniques for managing the execution of queries that include a comparison between discrete value property measurements and continuous value property measurements.