Information retrieval is one of the primary uses of computer systems. To retrieve information from a collection of data, an information retrieval system receives a selection query and applies it to the collection so that the data satisfying the query can be retrieved. For example, the collection of data may be a database table that contains records with various fields. A selection query specifies which records of the table are to be selected based on the values of fields in the records. If the table contains a record for each employee in a company, the fields may include employee name, department, supervisor, and salary. A typical selection query may specify that all records for employees whose supervisor is Smith be selected. A user may specify such a selection query by entering:
Supervisor=Smith
An information retrieval system would select the records that satisfy this selection query and then retrieve the selected records.
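The selection step can be sketched in Python as a filter over a list of records. This is a minimal illustration under stated assumptions, not a description of any particular retrieval system: the `Employee` record type, the sample rows, and the `select` helper are hypothetical, and the query is modeled as a Python predicate rather than parsed from the text a user enters.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Employee:
    name: str
    department: str
    supervisor: str
    salary: int


# Hypothetical sample table; names and values are illustrative only.
EMPLOYEES: List[Employee] = [
    Employee("Adams", "Marketing", "Smith", 52000),
    Employee("Baker", "Marketing", "Jones", 48000),
    Employee("Chen", "Accounting", "Smith", 61000),
    Employee("Diaz", "Engineering", "Lee", 75000),
]


def select(table: List[Employee], predicate: Callable[[Employee], bool]) -> List[Employee]:
    """Return the records of `table` that satisfy `predicate`."""
    return [record for record in table if predicate(record)]


# Supervisor=Smith
smith_reports = select(EMPLOYEES, lambda e: e.supervisor == "Smith")
print([e.name for e in smith_reports])  # ['Adams', 'Chen']
```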
Although specifying such a selection query is straightforward, specifying a more complex selection query is much more difficult for a user. For example, suppose a user wants to retrieve the records for all employees who are in the marketing department but whose supervisor is not Smith, together with the records for all employees who are in the accounting department. The user would specify such a selection query by entering:
(Department=Marketing AND Supervisor≠Smith) OR (Department=Accounting)
Such selection queries are generally specified using Boolean logic. However, unless a user has had formal training in Boolean logic, the user may not fully understand the meaning of the various logical relations. In particular, users often confuse the logical-AND and logical-OR relations. In the example above, a confused user may incorrectly specify the selection query by entering:
(Department=Marketing AND Supervisor≠Smith) AND (Department=Accounting)
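The difference between the intended query and the mistaken one can be seen by writing both as predicates over the same small table. This is a sketch only; the `Employee` type, the sample rows, and the function names `intended` and `mistaken` are hypothetical. With this sample data, the mistaken query returns no records.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Employee:
    name: str
    department: str
    supervisor: str


# Hypothetical sample table; each employee belongs to exactly one department.
EMPLOYEES: List[Employee] = [
    Employee("Adams", "Marketing", "Smith"),
    Employee("Baker", "Marketing", "Jones"),
    Employee("Chen", "Accounting", "Smith"),
    Employee("Diaz", "Engineering", "Lee"),
]


def intended(e: Employee) -> bool:
    # (Department=Marketing AND Supervisor≠Smith) OR (Department=Accounting)
    return (e.department == "Marketing" and e.supervisor != "Smith") or e.department == "Accounting"


def mistaken(e: Employee) -> bool:
    # (Department=Marketing AND Supervisor≠Smith) AND (Department=Accounting)
    return (e.department == "Marketing" and e.supervisor != "Smith") and e.department == "Accounting"


print([e.name for e in EMPLOYEES if intended(e)])  # ['Baker', 'Chen']
print([e.name for e in EMPLOYEES if mistaken(e)])  # []
```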
Although the second logical-AND seems consistent with the phrase "and for employees who are in the accounting department," its use is incorrect: if each employee belongs to a single department, this query selects no records at all. Also, users often do not fully understand the use of parentheses and find them cumbersome. In addition, parentheses are a major source of errors in selection queries, even for users who understand Boolean logic. For example, a user may specify the selection query by entering:
Department=Accounting OR Supervisor≠Smith AND Department=Marketing
If the logical-AND and logical-OR relations have equal precedence and are evaluated from left to right, this selection query selects the employees who are in both the accounting and marketing departments, together with the employees whose supervisor is not Smith and who are in the marketing department. In other words, only employees who are in the marketing department and who are also either in the accounting department or supervised by someone other than Smith are selected. This does not reflect the user's intention to select all employees in the accounting department.
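The effect of precedence on the unparenthesized query can be sketched with two small evaluators over a flat list of conditions joined by AND and OR: one that combines them strictly from left to right with equal precedence (the interpretation discussed above), and one that gives AND higher precedence than OR, as SQL and most programming languages do. The evaluator names and sample records are hypothetical. Under the left-to-right rule the accounting employee is dropped; under AND-before-OR the same string happens to match the user's intent, which shows that the result depends on a precedence convention the user may not know.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Condition = Callable[[Record], bool]


def eval_left_to_right(conds: List[Condition], ops: List[str], record: Record) -> bool:
    """Equal precedence: combine conditions strictly from left to right."""
    result = conds[0](record)
    for op, cond in zip(ops, conds[1:]):
        if op == "AND":
            result = result and cond(record)
        else:  # "OR"
            result = result or cond(record)
    return result


def eval_and_first(conds: List[Condition], ops: List[str], record: Record) -> bool:
    """AND binds tighter than OR: split at each OR, AND the conditions within each group."""
    groups: List[List[Condition]] = [[conds[0]]]
    for op, cond in zip(ops, conds[1:]):
        if op == "AND":
            groups[-1].append(cond)
        else:  # "OR" starts a new group
            groups.append([cond])
    return any(all(c(record) for c in group) for group in groups)


# Department=Accounting OR Supervisor≠Smith AND Department=Marketing
conds: List[Condition] = [
    lambda r: r["Department"] == "Accounting",
    lambda r: r["Supervisor"] != "Smith",
    lambda r: r["Department"] == "Marketing",
]
ops = ["OR", "AND"]

employees: List[Record] = [
    {"Name": "Adams", "Department": "Marketing", "Supervisor": "Smith"},
    {"Name": "Baker", "Department": "Marketing", "Supervisor": "Jones"},
    {"Name": "Chen", "Department": "Accounting", "Supervisor": "Smith"},
    {"Name": "Diaz", "Department": "Engineering", "Supervisor": "Lee"},
]

left_to_right = [r["Name"] for r in employees if eval_left_to_right(conds, ops, r)]
and_first = [r["Name"] for r in employees if eval_and_first(conds, ops, r)]
print(left_to_right)  # ['Baker']
print(and_first)      # ['Baker', 'Chen']
```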
Several techniques have been developed to help users specify selection queries. Two such techniques are Query-By-Example (QBE) and Venn diagrams. FIGS. 1A and 1B illustrate the use of Query-By-Example. QBE presents columns for various fields of a table and allows a user to enter the selection query into the columns. For example, to specify a selection query for all employees whose supervisor is Smith, the user enters "Smith" into the appropriate column, as shown in FIG. 1A. To specify a selection query for all employees who are in the marketing department but whose supervisor is not Smith, and for all employees who are in the accounting department, the user enters "≠Smith" and "Marketing" into the appropriate columns of the same row and enters "Accounting" into the appropriate column of another row, as shown in FIG. 1B. The conditions (e.g., "Supervisor≠Smith") in a single row are logically-AND'd, and the conditions in different rows are logically-OR'd, to form the selection query.
FIGS. 2A and 2B illustrate the use of Venn diagrams. After a user has specified a selection query, a Venn diagram can be displayed to help the user understand how the information retrieval system interprets the selection query. FIG. 2A shows the Venn diagram corresponding to the selection query of FIG. 1A. The circle represents all employees, and the shaded region indicates the employees specified by the selection query. FIG. 2B shows the Venn diagram corresponding to the selection query of FIG. 1B. Each circle represents all employees. The shaded region in circle 201 indicates the employees who are in the marketing department and whose supervisor is not Smith. The shaded region in circle 202 indicates the employees in the accounting department. If the Venn diagram indicates that the selection query does not specify the records that the user intends to retrieve, the user can re-specify the selection query.
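The QBE convention (conditions in the same row are AND'd, rows are OR'd) amounts to evaluating a query in disjunctive normal form. A minimal sketch follows. The grid representation, the assumption that a cell beginning with "≠" means "not equal" and any other cell means "equal", the helper names, and the sample records are all hypothetical. The sketch also renders a grid in the style of FIG. 1B as the equivalent parenthesized Boolean expression.

```python
from typing import Dict, List

Record = Dict[str, str]
QbeRow = Dict[str, str]  # column name -> cell text entered by the user


def cell_matches(cell: str, value: str) -> bool:
    # Assumed cell syntax: "≠X" means "not equal to X"; any other text means "equal to".
    if cell.startswith("≠"):
        return value != cell[1:]
    return value == cell


def qbe_matches(grid: List[QbeRow], record: Record) -> bool:
    # Cells in one row are AND'd together; the rows are OR'd together.
    return any(
        all(cell_matches(cell, record[column]) for column, cell in row.items())
        for row in grid
    )


def to_expression(grid: List[QbeRow]) -> str:
    # Render the grid as the equivalent parenthesized Boolean expression.
    terms = []
    for row in grid:
        conds = [
            f"{column}{cell}" if cell.startswith("≠") else f"{column}={cell}"
            for column, cell in row.items()
        ]
        terms.append("(" + " AND ".join(conds) + ")")
    return " OR ".join(terms)


# Grid in the style of FIG. 1B: two conditions in row 1, one condition in row 2.
grid_1b: List[QbeRow] = [
    {"Department": "Marketing", "Supervisor": "≠Smith"},
    {"Department": "Accounting"},
]

employees: List[Record] = [
    {"Name": "Adams", "Department": "Marketing", "Supervisor": "Smith"},
    {"Name": "Baker", "Department": "Marketing", "Supervisor": "Jones"},
    {"Name": "Chen", "Department": "Accounting", "Supervisor": "Smith"},
    {"Name": "Diaz", "Department": "Engineering", "Supervisor": "Lee"},
]

print(to_expression(grid_1b))
# (Department=Marketing AND Supervisor≠Smith) OR (Department=Accounting)
print([r["Name"] for r in employees if qbe_matches(grid_1b, r)])  # ['Baker', 'Chen']
```

Because the grid fixes the AND-within-row, OR-across-rows structure, the user never writes parentheses or chooses between AND and OR directly.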
Selection queries can be used to retrieve data from a variety of collections, including tables in a database system, files in a file system, documents in a document management system, and Web pages on the World Wide Web. The relations in the selection queries are typically adapted to the type of data in the collection. For example, if a database table contains numerical data, then numerical relations (e.g., "≥") would be used. Likewise, selection queries for documents may specify proximity relations (e.g., a certain word near another word, or two words in the same sentence).
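A proximity relation for documents might be sketched as follows. The tokenization rule, the `near` window measured in word positions, and the sentence-splitting heuristic are all assumptions chosen for illustration; actual document management systems and search engines define these relations in their own ways.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Assumed tokenization: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text: str, word1: str, word2: str, k: int = 5) -> bool:
    # Hypothetical NEAR/k relation: word1 and word2 occur within k word positions.
    words = tokenize(text)
    pos1 = [i for i, w in enumerate(words) if w == word1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == word2.lower()]
    return any(abs(i - j) <= k for i in pos1 for j in pos2)


def same_sentence(text: str, word1: str, word2: str) -> bool:
    # Assumed sentence boundary: a run of '.', '!' or '?'.
    for sentence in re.split(r"[.!?]+", text):
        words = set(tokenize(sentence))
        if word1.lower() in words and word2.lower() in words:
            return True
    return False


doc = "The search engine ranks pages. Boolean logic confuses many users."
print(near(doc, "search", "pages", k=3))      # True
print(near(doc, "search", "users", k=3))      # False
print(same_sentence(doc, "search", "pages"))  # True
print(same_sentence(doc, "search", "logic"))  # False
```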
Information retrieval is especially fundamental to users of the World Wide Web (WWW). The WWW comprises thousands of computers whose data can be retrieved by users of the WWW. WWW information retrieval systems are commonly known as "search engines." These search engines typically require users to specify selection queries by entering conditions and Boolean relations. However, access to the WWW is increasingly available to everyone. Since the vast majority of people do not fully understand Boolean logic, specifying the intended selection query has been problematic. It would be desirable to have a technique for specifying selection queries that allows a typical user to correctly specify the intended selection query.