Query construction can be thought of as a two-stage process having an assembly stage followed by a refinement stage, as discussed by Willis et al., “Users' Models of the Information Space: The Case for Two Search Models,” Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) '95 (Seattle, Wash., 1995), ACM Press, pp. 205-210, and Plaisant et al., “Interface and Data Architecture for Query Preview in Networked Information Systems,” ACM Transactions on Information Systems, Vol. 17, No. 3 (July 1999), pp. 320-341. The assembly stage gathers together relevant information, and the refinement stage iteratively narrows and nests the user's focus until a desired set is achieved.
However, use of Boolean operators is difficult for novice users because, for example, common language usages of these terms can be the opposite of their usage in Boolean expressions, as discussed by Pane et al., “Tabular and Textual Methods for Selecting Objects from a Group,” Proceedings of VL 2000: IEEE Int. Symposium on Visual Languages (Seattle, Wash., September 2000), IEEE Press, pp. 157-164 and Young et al. As an example of the confusion with common language, assume that a user wants to extract all records associated with two cities, Boston and New York. Under these circumstances, if the user relies on common language parlance, the user may logically request all records from “Boston AND New York,” which produces a Boolean intersection of sets of records associated with these cities, and is likely to result in no records being returned, depending on the nature of the data in the database being searched. In contrast, the correct request would be a union operator using a Boolean search query of “Boston OR New York.” However, the naive user may incorrectly believe that the “OR” means that only one of these cities would be considered.
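The difference between the two requests can be illustrated with a minimal sketch in Python. The record list, the field names, and the helper `ids_for_city` are hypothetical and are introduced only for illustration; the sketch assumes that each record is associated with exactly one city, which is the circumstance under which the intersection is empty.

```python
# Hypothetical records; each record is associated with exactly one city.
records = [
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": "New York"},
    {"id": 3, "city": "Chicago"},
    {"id": 4, "city": "Boston"},
]


def ids_for_city(city):
    """Return the set of record ids associated with the given city."""
    return {r["id"] for r in records if r["city"] == city}


boston = ids_for_city("Boston")
new_york = ids_for_city("New York")

# "Boston AND New York": Boolean intersection of the two sets.
and_result = boston & new_york

# "Boston OR New York": Boolean union of the two sets.
or_result = boston | new_york

print(sorted(and_result))  # []
print(sorted(or_result))   # [1, 2, 4]
```

With this sample data, the intersection is empty because no single record carries both cities, while the union returns the records of both cities, which is what the user in the example intended.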
Another difficulty a user may have is that in text queries, parentheses are used to establish precedence among the operators, which may be poorly understood by novice users, as discussed by Michard, “A New Database Query Language for Non-Professional Users: Design Principles and Ergonomic Evaluation,” Behavioral and Information Technology, Vol. 13 (1982), pp. 279-288, and Young et al., cited above. Nested parentheses are a particular source of difficulty for many users, as also discussed by Michard. In particular, when there are many levels of nesting, the user may become confused as to which two parentheses form the pair that delineates the level the user is interested in.
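The effect of parentheses on precedence can be sketched as follows. Python's own Boolean operators are used here as a stand-in for a text query language; this is an assumption made for illustration, and the precedence rules of a particular query language may differ. The record and its field names are hypothetical.

```python
# Hypothetical record used only for illustration.
record = {"city": "Boston", "job": "clerk", "income": 30000}

is_boston = record["city"] == "Boston"
is_clerk = record["job"] == "clerk"
high_income = record["income"] >= 40000

# Without parentheses, Python evaluates `and` before `or`, so this reads as
# is_boston or (is_clerk and high_income).
ungrouped = is_boston or is_clerk and high_income

# Parentheses force the `or` to be evaluated first.
grouped = (is_boston or is_clerk) and high_income

print(ungrouped)  # True
print(grouped)    # False
```

The same three terms in the same order yield opposite results for this record, depending only on where the parentheses are placed.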
As discussed by Jones, “Dynamic Query Result Previews for a Digital Library,” Proceedings of the 3rd ACM Conference on Digital Libraries (Pittsburgh, Pa., May 1998), ACM Press, pp. 291-292, and “Graphical Query Specification and Dynamic Result Previews for a Digital Library,” Proceedings of the 11th Annual ACM Symposium on User Interface Software and Technology (San Francisco, Calif., November 1998), ACM Press, pp. 143-151, and Plaisant et al., “Interface and Data Architecture for Query Preview in Networked Information Systems,” ACM Transactions on Information Systems, Vol. 17, No. 3 (July 1999), pp. 320-341, a graphical representation of the query can result in substantial benefits during the refinement phase of query building. Potential advantages include fewer zero-hit queries, reduced network activity due to the prevention of retrieval of undesired records, improved query comprehension, and better support for browsing within an unfamiliar database.
Query preview is available in prior prototypes that support dynamic querying, using direct manipulation to construct queries, as disclosed by Ahlberg et al., “Dynamic Queries for Information Exploration: An Implementation and Evaluation,” Proceedings of ACM CHI '92, (1992), ACM Press, pp. 619-626, Malan et al., “Visual Query Tools for Uncertain Spatio-Temporal Data,” Proceedings of the 9th ACM International Conference on Multimedia (Ottawa, Canada, September 2001), ACM Press, pp. 522-524, and Plaisant et al., “Dynamaps: Dynamic Queries on a Health Statistics Atlas,” Proceedings of ACM CHI '94 (Boston, Mass., April 1994), ACM Press, pp. 439-440. However, query preview may add a significant amount of processing overhead.
Query formulations that include tabular layouts of cards are disclosed by Pane et al., “Improving User Performance on Boolean Queries,” Proceedings of CHI 2000 Extended Abstracts (The Hague, Netherlands, April 2000), ACM Press, pp. 269-270, and “Tabular and Textual Methods for Selecting Objects from a Group,” Proceedings of VL 2000: IEEE Int. Symposium on Visual Languages (Seattle, Wash., September 2000), IEEE Press, pp. 157-164. Similarly, boxes that may overlap in horizontal or vertical dimensions are used by Anick et al., “A Direct Manipulation Interface for Boolean Information Retrieval via Natural Language Query,” Proceedings of ACM Special Interest Group on Information Retrieval (SIGIR) '90, ACM Press, pp. 135-150.
Young et al., “A Graphical Filter/Flow Representation of Boolean Queries: A Prototype Implementation and Evaluation,” Journal of the American Society of Information Science, Vol. 44, No. 6 (1993), pp. 327-339, introduced a Filter/Flow model for query building to overcome issues related to the limited Boolean knowledge of end users. Young et al.'s prototype relied upon a graphical representation of parallel branching and constrictions in the flow of data to represent the ORs and ANDs in a Boolean query.
FIG. 1 shows prior filter flow interface 100 according to Young et al. Filter flow interface 100 has filters 102, 104, 106, 108, and 110, and has dataflow lines 122, 124, 126, 128, and 130. The width of each of dataflow lines 122, 124, 126, 128, and 130 represents the amount of data retrieved after application of those of filters 102, 104, 106, 108, and 110 that precede that dataflow line. Filters 102, 104, 106, 108, and 110 filter the collection of data by limiting the total amount of data in the results as depicted by the dataflow lines 122, 124, 126, 128, and 130. As the data is filtered, the dataflow lines get narrower. An analogy can be made to dams on a river: just as dams constrict the flow of the river, filters 102, 104, 106, 108, and 110 (analogous to the dams) constrict dataflow lines 122, 124, 126, 128, and 130 (analogous to the rivers) by limiting the amount of data remaining in the search results (analogous to the amount of water remaining in the river).
In FIG. 1, filters located on dataflow lines arranged parallel to one another are logically ORed with one another, forming a union of the two sets of data, while filters located on dataflow lines that are serially connected are logically ANDed with one another, forming the intersection of the two sets of data. In the example depicted in FIG. 1, the search query represented will find any document having a location of Georgia (filter 102, filtering dataflow line 122 to produce dataflow line 124) including either clerks (filter 104, filtering dataflow line 124 to produce dataflow line 126) with an income of $40,000 or more (filter 106, filtering dataflow line 126 to contribute to dataflow line 128), or all accountants and engineers (filter 108 also filtering dataflow line 124, but producing dataflow line 130) having Elizabeth as their manager (filter 110 filtering dataflow line 130 to also contribute to dataflow line 128). Dataflow line 128 thus includes the output of filter 110 ORed with the output of filter 106, as indicated by filters 106 and 110 being located on parallel paths that include parallel dataflow lines 126 and 130.
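The serial and parallel arrangement described for FIG. 1 can be approximated by the following minimal Python sketch. It is not Young et al.'s implementation; the helper names `keep`, `serial`, and `parallel`, the field names, and the sample records are all hypothetical. The variable names reuse the reference numerals of FIG. 1 only to make the correspondence easy to follow, and the "width" of a dataflow line is modeled simply as the number of records remaining on it.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Flow = List[Record]
Filter = Callable[[Flow], Flow]


def keep(predicate: Callable[[Record], bool]) -> Filter:
    """Build a filter that passes only the records satisfying predicate."""
    return lambda flow: [r for r in flow if predicate(r)]


def serial(*filters: Filter) -> Filter:
    """Filters placed one after another on the same path: logical AND."""
    def run(flow: Flow) -> Flow:
        for f in filters:
            flow = f(flow)
        return flow
    return run


def parallel(*branches: Filter) -> Filter:
    """Filters placed on parallel paths that later merge: logical OR."""
    def run(flow: Flow) -> Flow:
        passed = set()
        for branch in branches:
            passed.update(r["id"] for r in branch(flow))
        return [r for r in flow if r["id"] in passed]
    return run


# Hypothetical filters mirroring the example of FIG. 1.
filter_102 = keep(lambda r: r["location"] == "Georgia")
filter_104 = keep(lambda r: r["job"] == "clerk")
filter_106 = keep(lambda r: r["income"] >= 40000)
filter_108 = keep(lambda r: r["job"] in ("accountant", "engineer"))
filter_110 = keep(lambda r: r["manager"] == "Elizabeth")

# Hypothetical sample data.
people = [
    {"id": 1, "location": "Georgia", "job": "clerk", "income": 45000, "manager": "Sam"},
    {"id": 2, "location": "Georgia", "job": "clerk", "income": 30000, "manager": "Elizabeth"},
    {"id": 3, "location": "Georgia", "job": "engineer", "income": 60000, "manager": "Elizabeth"},
    {"id": 4, "location": "Georgia", "job": "accountant", "income": 50000, "manager": "Sam"},
    {"id": 5, "location": "Ohio", "job": "engineer", "income": 70000, "manager": "Elizabeth"},
]

line_122 = people                  # unfiltered input
line_124 = filter_102(line_122)    # after the Georgia filter
line_126 = filter_104(line_124)    # clerk branch, before the income filter
line_130 = filter_108(line_124)    # accountant/engineer branch, before the manager filter
line_128 = parallel(
    serial(filter_104, filter_106),
    serial(filter_108, filter_110),
)(line_124)                        # merged output of the two parallel branches

result_ids = [r["id"] for r in line_128]
widths = {
    122: len(line_122),
    124: len(line_124),
    126: len(line_126),
    130: len(line_130),
    128: len(line_128),
}

print(result_ids)  # [1, 3]
print(widths)      # {122: 5, 124: 4, 126: 2, 130: 2, 128: 2}
```

In this sketch the record counts shrink, or at most stay equal, as records pass through serially connected filters, which corresponds to the narrowing of the dataflow lines, while the merged line 128 carries the union of the records that survive either branch.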
Using Young et al.'s Filter/Flow model, users comprehended and constructed queries more accurately than users of entirely text-based queries. Users also strongly preferred the graphical interface over the text query interface. However, the Filter/Flow model by Young et al. is limited in its flexibility and was not connected to a database.