1. Field of the Invention
The present invention relates to the field of data entry and retrieval and, more particularly, to a method and system for annotating query components, such as query conditions, in an effort to share domain knowledge and facilitate building queries that retrieve desired data.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Generally speaking, queries take the form of a command language that lets programmers and programs perform a variety of operations on data, such as select, insert, update, find out the location of data, and so forth.
One problem facing programmers (or more generally any user building a query) is that databases tend to grow relatively brittle (inflexible) over time, which may increase the difficulty of crafting queries that retrieve a complete set of desired results. In other words, as business enterprises insert their own data, change data structures or formats, add features, and attempt to retire applications that once used the data or support older “legacy” applications, data may exist in the system in more than one format. For example, names may be entered in all capital letters sometimes (but not always), local area codes may be specified in some cases (but NULL in others), and employee IDs may be displayed in one format and stored in another format.
As a result, conventional queries rigidly adhering to a single data format may not return all the data that was desired. As an example, the following query condition:
where last_name='Smith'
returns no matches if all last names are stored in all capital letters. Similarly, the following query condition:
where demographic.area_code='507'
returns only partial results if some records have area codes included in telephone numbers, while others do not. Unfortunately, it may not be as apparent to a user that the query has returned only partial results as it would be if no results were returned at all. In other words, it may be very difficult to even recognize this type of problem. Finally, the following query condition:
where employee_id='18-203-3243'
will return no results if employee IDs are commonly stored internally without hyphens (e.g., as 182033243). Thus, users who do not recognize this problem may be working with incorrect results.
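By way of illustration only, the three query conditions above may be exercised against a small in-memory table. The sketch below is not part of any described system: it assumes SQLite (whose default text comparison is case-sensitive) accessed through Python's standard sqlite3 module, and the table layout, sample rows, and the helper name count_matches are hypothetical.

```python
import sqlite3

# Hypothetical sample data: names stored in capitals, one NULL area code,
# and employee IDs stored without hyphens.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demographic (last_name TEXT, area_code TEXT, employee_id TEXT)"
)
conn.executemany(
    "INSERT INTO demographic VALUES (?, ?, ?)",
    [
        ("SMITH", "507", "182033243"),
        ("SMITH", None, "182033244"),
        ("JONES", "507", "182033245"),
    ],
)


def count_matches(condition, value):
    """Return how many rows satisfy a single-parameter WHERE condition."""
    sql = "SELECT COUNT(*) FROM demographic WHERE " + condition
    return conn.execute(sql, (value,)).fetchone()[0]


# Mixed-case literal against upper-case data: no matches.
no_name_match = count_matches("last_name = ?", "Smith")
# Rows with a NULL area code are silently skipped: 2 of 3 rows returned.
partial_area = count_matches("demographic.area_code = ?", "507")
# Hyphenated literal against unhyphenated storage: no matches.
no_id_match = count_matches("employee_id = ?", "18-203-3243")

print(no_name_match, partial_area, no_id_match)  # prints: 0 2 0
```

Note that the second count is neither zero nor an error, which is why a partial result of this kind may go unnoticed.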
On the other hand, users that do become aware of this problem may learn to craft queries designed to retrieve the data in whatever format (or combination of formats) it may exist. For example, an individual user may craft queries logically OR'ing query conditions targeting the same data, but in different formats, such as:
where last_name='Smith' OR last_name='SMITH'
in an attempt to retrieve all desired data, regardless of the format. However, while this may work for the individual user, if the underlying problem is not reported and/or the potential solution shared, other users may continue to build queries that return incorrect results.
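The effect of such an OR'ed condition may be sketched as follows. This is an illustrative example only, again assuming SQLite through Python's standard sqlite3 module; the employees table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table holding the same last name in two formats.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Smith",), ("SMITH",), ("Jones",)],
)

# Single-format condition: finds only the mixed-case record.
single_format = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE last_name = 'Smith'"
).fetchone()[0]

# OR'ed condition: finds the record in either format.
either_format = conn.execute(
    "SELECT COUNT(*) FROM employees "
    "WHERE last_name = 'Smith' OR last_name = 'SMITH'"
).fetchone()[0]

print(single_format, either_format)  # prints: 1 2
```

The workaround lives only in the query text of the user who wrote it, so another user issuing the single-format condition still retrieves the smaller result.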
Accordingly, there is a need for a method and system for sharing knowledge regarding query construction, with the possibility of error resolution, for example via automated query modifications.