1. Field of the Invention
The invention generally relates to computer database systems. More particularly, the invention relates to techniques for processing abstract rules with query results having rows with multiple data values per column.
2. Description of the Related Art
Databases are well known systems for storing, searching, and retrieving information stored in a computer. The most prevalent type of database used today is the relational database, which stores data using a set of tables that may be reorganized and accessed in a number of different ways. Users access information in relational databases using a relational database management system (DBMS).
Each table in a relational database includes a set of one or more columns. Each column typically specifies a name and a data type (e.g., integer, float, string, etc.), and may be used to store a common element of data. For example, in a table storing data about patients treated at a hospital, each patient might be referenced using a patient identification number stored in a “patient ID” column. Reading across the rows of such a table would provide data about a particular patient. Tables that share at least one attribute in common are said to be “related.” Further, tables without a common attribute may be related through other tables that do share common attributes. A path between two tables is often referred to as a “join,” and columns from tables related through a join may be combined to form a new table returned as a set of query results.
Queries of a relational database may specify which columns to retrieve data from, how to join the columns together, and conditions (predicates) that must be satisfied for a particular data item to be included in a query result table. Current relational databases require that queries be composed in complex query languages. Today, the most widely used query language is Structured Query Language (SQL). However, other query languages are also used. A SQL query is composed from one or more clauses set off by a keyword. Well-known SQL keywords include the SELECT, WHERE, FROM, HAVING, ORDER BY, and GROUP BY keywords.
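By way of illustration only, the clauses described above can be sketched with Python's built-in sqlite3 module. The table names, column names, and data values below are hypothetical and are not drawn from any particular database.

```python
import sqlite3

# Hypothetical in-memory schema, used only to illustrate the SQL clauses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tests (patient_id INTEGER, test_type TEXT, result REAL);
    INSERT INTO patients VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tests VALUES (1, 'hemoglobin', 13.5), (2, 'hemoglobin', 10.2);
""")

# SELECT names the columns, FROM/JOIN names the tables and how they relate,
# WHERE states the predicates, and ORDER BY sorts the result table.
query = """
    SELECT p.name, t.result
    FROM patients AS p
    JOIN tests AS t ON t.patient_id = p.patient_id
    WHERE t.test_type = 'hemoglobin' AND t.result < 12.0
    ORDER BY p.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Even for this small example, the author of the query must know that the two tables share the patient_id column and how the test type is encoded.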
Typically, composing a proper SQL query requires that a user understand both the structure and content of the relational database as well as the complex syntax of the SQL query language (or other query language). The complexity of constructing an SQL statement, however, generally makes it difficult for average users to compose queries of a relational database. Because of this complexity, users often turn to database query applications to assist them in composing queries of a database. One technique for managing the complexity of a relational database, and the SQL query language, is to use database abstraction techniques. Commonly assigned U.S. Pat. No. 6,996,558, entitled “Application Portability and Extensibility through Database Schema and Query Abstraction,” discloses techniques for constructing a database abstraction model over an underlying physical database.
U.S. Pat. No. 6,996,558 discloses embodiments of a database abstraction model constructed from logical fields that map to data stored in the underlying physical database. Each logical field defines an access method that specifies a location (i.e., a table and column) in the underlying database from which to retrieve data. Users compose an abstract query by selecting logical fields and specifying conditions. The operators available for composing conditions in an abstract query generally include the same operators available in SQL (e.g., comparison operators such as =, >, <, >=, and <=, and logical operators such as AND, OR, and NOT). Data is retrieved from the physical database by generating a resolved query (e.g., an SQL statement) from the abstract query. Because the database abstraction model is tied to neither the syntax nor the semantics of the physical database, additional capabilities may be provided by the database abstraction model without having to modify the underlying database. Thus, the database abstraction model provides a platform for additional enhancements that allow users to compose meaningful queries easily, without having to disturb existing database installations.
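The general idea of resolving an abstract query into an SQL statement may be sketched as follows. This is a simplified illustration and not the implementation disclosed in the referenced patent: the LOGICAL_FIELDS mapping, the resolve function, and the table and column names are all hypothetical, and the sketch assumes every logical field maps to the same table so that no join logic is needed.

```python
# Hypothetical logical fields, each mapped to a (table, column) location.
LOGICAL_FIELDS = {
    "Patient ID": ("tests", "patient_id"),
    "Hemoglobin": ("tests", "hgb_result"),
}

ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<="}


def resolve(selected, conditions):
    """Turn logical field names and conditions into (sql, params).

    Assumes all referenced logical fields live in a single table.
    """
    columns = [".".join(LOGICAL_FIELDS[name]) for name in selected]
    table = LOGICAL_FIELDS[selected[0]][0]
    sql = f"SELECT {', '.join(columns)} FROM {table}"

    clauses, params = [], []
    for name, operator, value in conditions:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        clauses.append(f"{'.'.join(LOGICAL_FIELDS[name])} {operator} ?")
        params.append(value)  # values are bound as parameters, not inlined
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = resolve(["Patient ID", "Hemoglobin"], [("Hemoglobin", "<", 12.0)])
print(sql, params)
```

The user of such a model refers only to the names "Patient ID" and "Hemoglobin"; the physical location of the data appears solely in the mapping.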
In some situations, the results of database queries can take the form of Cartesian products, meaning that the results include the various combinations of the query attribute values, and may have the same attribute values duplicated in multiple rows. Such query results may make analysis of the data difficult. For example, a hospital database may store results of medical tests administered to patients. A query of the hospital database may result in a Cartesian product, and thus may return multiple rows of query results for each patient. A medical researcher seeking to evaluate such query results may find it difficult to interpret patient data that is spread out over many rows.
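As a small illustration of this effect, assume a hypothetical patient with two hemoglobin results and three glucose results. Combining the two attributes yields one row per pairing, so each value is repeated across several rows. The values below are invented for the example.

```python
from itertools import product

# Hypothetical test results for a single patient (patient ID 1).
hemoglobin = [13.5, 12.9]
glucose = [90, 110, 95]

# Every hemoglobin value is paired with every glucose value.
rows = [(1, h, g) for h, g in product(hemoglobin, glucose)]

for row in rows:
    print(row)
print(len(rows), "rows for one patient")
```

Two values in one column and three in the other produce six rows, although only five distinct measurements exist.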
One approach to making such query results easier to use is to generate them with an entity-based format. That is, the query results are grouped for a given attribute field, or “model entity,” and are combined so that the grouped results are presented in a single row, with each column including all values for that attribute and for that model entity. For example, a hospital may store the results of medical tests performed on patients in a table of an abstract database, with each column of the table representing a different type of test. A query of the abstract database may be composed to use the patients as entities, and to produce query results with an entity-based format. If so, each row of the query results would represent a single patient, and each column within that row would include the results of all instances of a particular type of test that have been administered to that patient. Thus, since each patient may have taken the same type of test a different number of times, a given column may include a different number of values in each row. The use of an entity-based output format can produce query results that are easier to read, since they include a single row for each model entity. Commonly assigned, co-pending U.S. patent application Ser. No. 10/403,356, filed Mar. 31, 2003, titled “Dealing with Composite Data through Data Model Entities,” discloses techniques for using model entities in database queries.
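The grouping described above may be sketched as follows. This is an illustrative assumption about how flat rows could be collapsed into one row per model entity, not the technique of the referenced application; the to_entity_format function and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat rows: (patient ID, test type, result).
flat_rows = [
    (1, "hemoglobin", 13.5),
    (1, "hemoglobin", 12.9),
    (1, "glucose", 90),
    (2, "hemoglobin", 10.2),
]


def to_entity_format(rows):
    """Collapse flat rows into one row per entity, one value list per column."""
    entities = defaultdict(lambda: defaultdict(list))
    for entity_id, field, value in rows:
        entities[entity_id][field].append(value)
    # Convert nested defaultdicts to plain dicts for a predictable result.
    return {entity_id: dict(columns) for entity_id, columns in entities.items()}


entity_rows = to_entity_format(flat_rows)
print(entity_rows)
```

In this sketch, patient 1 has two values in the hemoglobin column while patient 2 has one, which shows how the number of values per column can differ from row to row.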
In some situations, data that is collected and stored in a database can be used as input to analysis routines for various purposes, including know-how management, decision making and statistical analysis. For instance, in a broad variety of applications, analysis routines are executed on query results obtained by executing corresponding queries against an underlying database.
Analysis routines can be defined by rule sets including one or more rules, each having predicates and actions. Commonly, the rules will have the structure “IF [predicate] THEN [action].” A rule predicate is a conditional statement evaluated in a rule engine. If the predicate is satisfied (i.e., the condition is met), then the associated rule action is executed. In other words, a set of rules can be used to implement an analysis routine, and a rule engine can evaluate predicates and fire or execute actions defined in the rules. Where actions of rules are defined to provide recommendations for users, such as treatment recommendations for doctors in medical institutions, the rules can be defined such that corresponding predicates reflect expert-based knowledge of possible diagnoses and evaluations of patient conditions. In other words, rules can be implemented to assist doctors by making diagnosis recommendations, drug recommendations, providing reminders of required verifications and checks, etc.
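The “IF [predicate] THEN [action]” structure may be sketched as a minimal rule engine. The Rule class, the run_rules function, the rule names, and the threshold values are hypothetical and chosen only for illustration; they do not represent clinical guidance or any particular rule engine product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    predicate: Callable[[Dict], bool]  # the IF part
    action: Callable[[Dict], str]      # the THEN part


def run_rules(rules: List[Rule], record: Dict) -> List[str]:
    """Evaluate each predicate and fire the action of every satisfied rule."""
    return [rule.action(record) for rule in rules if rule.predicate(record)]


# Illustrative thresholds only.
rules = [
    Rule(
        "low_hemoglobin",
        lambda r: r["hemoglobin"] < 12.0,
        lambda r: f"Review patient {r['patient_id']} for anemia",
    ),
    Rule(
        "high_glucose",
        lambda r: r["glucose"] > 125,
        lambda r: f"Schedule glucose follow-up for patient {r['patient_id']}",
    ),
]

record = {"patient_id": 2, "hemoglobin": 10.2, "glucose": 98}
recommendations = run_rules(rules, record)
print(recommendations)
```

Only the first rule fires for this record, since its predicate is satisfied and the second is not.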
However, the creation of rules is generally a complex and difficult process which requires detailed knowledge of the corresponding database or databases. More specifically, for each predicate and each action of the given rule that the user wants to create, the user requires an understanding of the database schema in order to look up a corresponding column name in the underlying database table(s). One technique for managing the creation of rules is to use abstract rule sets. Commonly assigned U.S. application Ser. No. 11/272,583 (hereafter “the '583 application”), entitled “Abstract Rule Sets,” discloses techniques for using abstract rule sets.
Abstract rules are composed by referencing logical fields of a database abstraction model, and thus do not require a user to understand the schema of the physical database. To be used, abstract rules must be translated into an executable form that can be processed by a rule engine. In addition, as with the abstract queries, the abstract rules must be resolved to the physical database. Typically, abstract rules are composed by “rule makers,” based on their expertise and on previously-collected data. The abstract rules may then be provided to “rule users,” who may execute the abstract rules by using query results as inputs.
However, in the situation where query results are produced with an entity-based format, they may not be suited for use as inputs to abstract rules. As described above, query results with an entity-based format may include rows that have multiple values stored in a single column (i.e., attribute). Additionally, the number of such values can vary from one row to the next. Since abstract rules are composed to reference a fixed set of logical fields, they typically cannot use query results having an entity-based format as input.
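The mismatch can be illustrated with a hypothetical rule predicate written for a single value per field. The low_hemoglobin function and the sample records are assumptions made for this example only.

```python
def low_hemoglobin(record):
    """Hypothetical predicate that expects exactly one value per field."""
    return record["hemoglobin"] < 12.0


# A conventional row: one value per column.
flat_record = {"patient_id": 2, "hemoglobin": 10.2}

# An entity-based row: several values in the same column.
entity_record = {"patient_id": 1, "hemoglobin": [13.5, 12.9, 11.4]}

flat_result = low_hemoglobin(flat_record)

try:
    low_hemoglobin(entity_record)
    entity_error = None
except TypeError as exc:
    # Comparing a list with a number is not defined, so the rule cannot run.
    entity_error = type(exc).__name__

print(flat_result, entity_error)
```

The predicate evaluates normally on the conventional row but cannot be evaluated on the entity-based row, because the field it references holds a list of values rather than a single value.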
Therefore, there is a need for techniques for processing abstract rules with query results having an entity-based format.