Field of the Invention
Embodiments of the present invention generally relate to data processing. More specifically, embodiments of the invention relate to a method for automatically determining Boolean logic and operator precedence of query conditions for users composing a database query.
Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Database queries are composed using a query language. Currently, the most commonly used database query language is SQL, short for Structured Query Language. The term “query” refers to a set of commands for retrieving data from a stored database. A database query provides a specific set of instructions for extracting particular data from a database. Typically, a query specifies conditions that data elements in the database should satisfy in order to be returned as part of a query result. Groups of conditions are evaluated using logical operators (e.g., OR, AND, etc.) according to a set of rules defining operator precedence. Precedence is a property of an operator that is used to specify its order of evaluation relative to other operators included within the database query. Operators with higher precedence are evaluated before those with lower precedence. Precedence rules may be implicitly used to evaluate an unstructured query. Alternatively, an order of evaluation may be expressly specified by a database query, typically by enclosing a condition within parentheses.
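The difference between implicit precedence and an expressly specified order of evaluation can be illustrated with a minimal sketch. The sketch assumes an in-memory SQLite database accessed through Python's standard sqlite3 module; the table name, column names, sample rows, and the helper run() are hypothetical and chosen only for illustration. In SQL, AND has higher precedence than OR, so the unparenthesized condition is evaluated as if the AND portion were grouped first.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("John", "Doe", "Austin"),
        ("John", "Smith", "Boston"),
        ("Jane", "Doe", "Austin"),
    ],
)

def run(where):
    """Return (first_name, last_name) rows matching a fixed WHERE clause."""
    sql = f"SELECT first_name, last_name FROM customers WHERE {where} ORDER BY rowid"
    return conn.execute(sql).fetchall()

# Implicit precedence: evaluated as first_name = 'John' OR (last_name = 'Doe' AND city = 'Austin').
implicit = run("first_name = 'John' OR last_name = 'Doe' AND city = 'Austin'")

# Explicit grouping: parentheses force the OR to be evaluated first.
explicit = run("(first_name = 'John' OR last_name = 'Doe') AND city = 'Austin'")

print(implicit)  # all three customers
print(explicit)  # only the two customers located in Austin
```

In this sketch the same three conditions return three records under implicit precedence and two records when the OR is enclosed in parentheses.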
Constructing a query in a query language such as SQL typically requires at least some level of technical expertise. As such, queries are usually composed by technically proficient persons, such as programmers. However, in certain situations, it may be useful to enable a non-technical user (i.e., not a programmer or database administrator) to compose and execute database queries. For example, a customer service representative may need to query a customer database to retrieve information about customers that live in a given city. In another example, a company web site may allow consumers to interactively query a product catalog. In such situations, the users have a need to query the database, but may lack the expertise required to compose a database query correctly.
To enable a non-technical user to compose a database query, application programs often provide users with a simplified query interface. Frequently, such an interface allows a non-technical user to compose a query by specifying one or more query conditions. A query condition includes a database field (also referred to as an attribute) and a value for that attribute used to include (or exclude) a data record from query results. The database attribute may specify a column in a table of a relational database. For example, records in a customer database could include the data attributes of “First name,” “Last name,” “Address,” “Phone number,” etc. Each record (i.e., row) in this table stores information related to a different customer and may specify a value for one or more attributes of the table. A non-technical user may use a query interface to specify a query condition having attribute “Phone number” and condition value “555-1234.” Executing this query would return only those database records having the value “555-1234” in the “Phone number” field. This query can be represented as:
Select all where:(Phone number = “555-1234”)
In some situations, a user may need to perform a query with more than one query condition. For example, a user may need to determine the customers that have first name “John” and last name “Doe.” In such situations, query interfaces have allowed users to specify multiple query conditions. Typically, such query interfaces force the user to select between the options of “All conditions” or “Any conditions.” If the user chooses “All conditions,” the query will be constructed so that each record of the resulting data set has to individually meet every specified condition. In this example, selecting “All conditions” would result in a data set comprising the customers having both a first name “John” and a last name “Doe.” Thus, customers in the resulting data set will be named “John Doe.” In contrast, if the user selects “Any conditions,” the query will return any records that meet one or more query conditions. Thus, if the user in the previous example had selected “Any conditions,” the resulting data set would comprise any customers having first name “John” along with any customers having last name “Doe.” In this case, the result could include customers named “John Doe,” “John Smith,” “Jane Doe,” “John Jones,” “Richard Doe,” etc.
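The “All conditions” and “Any conditions” options can be sketched as follows. This is a minimal illustration only, not the implementation of any particular query interface; the record layout, the sample customers, and the function names matches() and run_query() are assumptions introduced for the example.

```python
# Hypothetical customer records, used only for illustration.
customers = [
    {"First name": "John", "Last name": "Doe"},
    {"First name": "John", "Last name": "Smith"},
    {"First name": "Jane", "Last name": "Doe"},
    {"First name": "Richard", "Last name": "Roe"},
]

def matches(record, conditions, mode):
    """Check one record against (attribute, value) conditions.

    mode "all" requires every condition to hold (logical AND);
    mode "any" requires at least one condition to hold (logical OR).
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    results = [record.get(attribute) == value for attribute, value in conditions]
    return all(results) if mode == "all" else any(results)

def run_query(records, conditions, mode):
    """Return the full names of the records that satisfy the conditions."""
    return [
        f"{r['First name']} {r['Last name']}"
        for r in records
        if matches(r, conditions, mode)
    ]

conditions = [("First name", "John"), ("Last name", "Doe")]
all_result = run_query(customers, conditions, "all")
any_result = run_query(customers, conditions, "any")

print(all_result)  # only John Doe
print(any_result)  # John Doe, John Smith, Jane Doe
```

The sketch makes the limitation visible: a single mode applies to every condition, so combinations that mix AND and OR cannot be expressed.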
However, in some situations, a user may wish to compose a query with multiple query conditions in more complex combinations than are allowed using the “All conditions” or “Any conditions” options. As is known, a common technique for constructing complex queries is Boolean logic. In Boolean logic, the various query conditions are joined by logical operators (e.g., AND, OR, etc.). Query interfaces are known in the art that permit a user to construct a query by selecting a first query condition (e.g., First name=“John”), then selecting a logical operator (e.g., AND), and then selecting a second query condition (e.g., Last name=“Doe”). This process can then be repeated to achieve the desired query.
For example, Boolean logical operators can be illustrated in a case in which a user wishes to compose a query to identify all customers 40 years of age and all customers having an age of 50 years. To perform this action, a user could construct a query by selecting “Age” from an attribute drop-down menu, specifying a value of 40, selecting a Boolean logical operator “OR” from a drop-down menu, then selecting “Age” from an attribute drop-down menu, and then specifying a value of 50. The resulting query is:
Select all where:(Age = 40) OR (Age = 50)
However, as even this simple example illustrates, the use of Boolean expressions often results in confusion for non-technical users. In some cases, the language of a Boolean expression can appear to be similar to a “plain English” expression with a different meaning, and may cause a user to make erroneous assumptions. In the previous example, a user wishes to identify all customers 40 years of age as well as customers 50 years of age. In this situation, a non-technical user of a query interface will often mistakenly select the logical operator “AND” rather than “OR,” since he may state colloquially that he requires data for the customers aged 40 years and the customers aged 50 years. If this occurs, the resulting query is:
Select all where:(Age = 40) AND (Age = 50)
However, this query will return a data set comprising the records of all customers who individually have both the age of 40 years and the age of 50 years. Obviously, most customer databases will only accommodate a single value for the “Age” attribute of a customer record. Therefore, since no single customer can meet the query conditions specified, this query does not return any query results, which may lead the user to conclude that there are simply no records in the database for customers that are either 40 years of age or 50 years of age. Similarly, non-technical users will often mistakenly construct queries with logical operator “OR” in situations that properly require the use of logical operator “AND.”
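The effect of choosing “AND” where “OR” was intended can be reproduced with a short sketch. It assumes an in-memory SQLite database accessed through Python's standard sqlite3 module; the table, its columns, and the three sample customers are hypothetical.

```python
import sqlite3

# Hypothetical customer table with a single age value per record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", 40), ("Bob", 50), ("Cid", 45)],
)

# Mistaken query: no single record can have age 40 and age 50 at once.
with_and = conn.execute(
    "SELECT name FROM customers WHERE (age = 40) AND (age = 50)"
).fetchall()

# Intended query: records with age 40 together with records with age 50.
with_or = conn.execute(
    "SELECT name FROM customers WHERE (age = 40) OR (age = 50) ORDER BY name"
).fetchall()

print(with_and)  # empty result, even though matching customers exist
print(with_or)   # the 40-year-old and the 50-year-old customer
```

The empty result from the first query is indistinguishable, from the user's point of view, from a database that holds no customers of either age.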
Beyond problems in selecting the proper logical operators, non-technical users can also become confused in properly structuring multiple query conditions. When multiple query conditions are required, the logical operators and the query conditions must be ordered and grouped to establish the proper logical precedence to extract the desired set of data. Sometimes, implicit rules of operator precedence will coincide with the grouping desired by a user. However, frequently, they will not. Because of these complexities, it is common for non-technical users to incorrectly structure the logical operators and the query conditions in queries.
For the above reasons, query interfaces existing in the prior art can lead to error and confusion for non-technical users. Accordingly, there is a need for a method for automatically determining Boolean logic and operator precedence of query conditions.