1. Field of the Invention
The present invention generally relates to data processing and more particularly to accessing data in a data repository, such as a database.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
A query includes one or more criterion for data selection. For example, the criterion may specify a data range, test for an existence condition, test for equivalence, or perform a similar table comparison operation. The criterion may be logically related by Boolean operators (e.g., AND, OR). Consider the following query involving two criterions (i.e. the comparisons in the WHERE clause) which are ANDed together.
QUERY 10001SELECT T1.Col10002FROM T1,T20003WHERE T1.Col1 = T2.Col1 AND T1.Col2 = ’A’AND T2.Col2 = ’B’Query 1 will return a value from Col1 in T1 for every pair of records in T1 and T2 where Col1 values match, and the value of T1.Col2 is ‘A’, and the value of T2.Col2 is ‘B’.
One of the challenges encountered when querying information from a large database or data warehouse is to understand how changes in query logic affect the data returned from a given query. In particular, when the query is complex, involving a large number of criteria associated with logic operators (e.g., AND and OR), it can be difficult to understand from the query which criterion has the greatest impact on the number of items returned from a query. The impact each criterion has on an overall query may be either positive or negative. A criterion that has a negative impact returns undesired results while a criterion that has a positive impact returns desired results. For example, a user may desire a particular sample size, in which case a criterion that excludes a large number of items from overall query results has a negative impact. Alternatively, a user may be looking for particular records, in which case a criterion that excludes desired records from overall query results has a negative impact. Knowing the impact of each criterion can be important to the user, especially when the initial query returns either too little information or too much information. In such situations, the user running the query would like to easily understand how to modify the query to meet their data requirements. Unfortunately, conventional database environments provide the user with no choice but to iteratively re-craft and re-execute queries until arriving at an acceptable results set. This “trial and error process is both time consuming and inefficient.
Therefore, there is a need for improved query analysis.