This invention generally relates to a process for database querying, and more particularly concerns an interactive interface for chart-based graphical data browsing, querying and manipulation.
A portion of the disclosure of this patent document pertains to material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records once the patent issues or the file is otherwise made available to the public according to law, but otherwise reserves all copyright rights whatsoever.
Visual data evaluation is an important part of information processing. For many objectives such as pattern recognition, outlier analysis and general data exploration, human visual data evaluations are far superior to other options such as automated statistical procedures.
Visual data evaluation embodies two related activities: browsing and querying. Browsing is an information-seeking process which utilizes chart displays of data including bar charts, scatter plots and the like to detect patterns, to develop an intuitive understanding of relationships exhibited by the data and to determine the next desired chart presentation of the data. "Querying," which is a mechanism for accessing information stored in a database, specifies and selects data subsets which can then be used for additional browsing, querying or other information processing. Browsing can be considered the primary visual information gathering step while querying allows the user to "navigate" through the data.
Visual data evaluation, however, is necessarily constrained by the limits of human cognitive abilities which permit only relatively small quantities of data to be evaluated at one time. For example, a data series with one hundred observations can easily be inspected visually while one with ten thousand observations cannot. Consequently, large data series must be divided into subsets to conduct visual data evaluations. Furthermore, graphical representations quickly become confusing when a third or fourth dimension is added. Therefore, most visual data evaluation is conducted on pairs of data items or variables. Finally, the limits of human memory make it difficult to remember and process important information from more than a dozen or so data subsets or pair-wise visual evaluations.
Commercially available software can provide data manipulation, querying and charting functions required to conduct visual data evaluation. While the general class of software referred to as spreadsheet software is most suited and most frequently used to conduct these evaluations, its application is cumbersome and tedious.
Spreadsheet software represents data in a column and row tabular format. Spreadsheet data is selected for use in a chart by specifying desired columns and rows either through a series of keyboard entries or with mouse operations on the tabular data display. Chart specifications are determined with a series of choices presented in menus, dialogue boxes, or other user-interactive devices. Finally, the chart is presented in a separate display area. Changing one of the variable series presented in the chart is initiated by returning to the spreadsheet display and then repeating the data selection process and updating the chart display.
Examining a noncontiguous subset of spreadsheet data requires a separate data query to extract the desired data and place it in another area of the worksheet. Queries are performed after users specify the data to be queried, query parameters and the area in the spreadsheet that will hold the query results. The columns and rows containing the new data subset are then referenced and incorporated in new chart displays. Spreadsheet data manipulation is also often applied to compute statistics for query results at intermediate points in the evaluation. Additional charting, querying and data manipulation continue through as many additional repetitions as desired.
While the results of spreadsheet-based visual data evaluations are recognized to have substantial value, only limited evaluations can be undertaken before the detailed, cumbersome, and error-prone nature of identifying and developing new charts and keeping track of the relationships between the new queries and charts and the query and chart histories overtakes the ability of the user to interpret the information.
Commercially available database management software (DBMS) also represents data in a tabular format; however, instead of the row and column terminology, database data is generally referenced as records and fields. DBMS typically provides greater flexibility in query operations including the computation of a limited number of statistics. Although queries of great complexity may be formulated using DBMS, this software is much more difficult to use than is spreadsheet software. The standard Structured Query Language (SQL) is entered in English-like commands; however, complicated and rigorously structured syntax makes SQL querying a difficult process for users to master. Table or form-based procedures such as Query-by-Example (QBE) have been developed to convert form and table input into query results, and while these developments have decreased the burden of SQL query formulation somewhat, they certainly have not eliminated it.
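By way of illustration only, the kind of SQL statement discussed above, one that selects a data subset and computes a small number of statistics, can be sketched as follows. This is a minimal sketch using Python's standard sqlite3 module; the table name "sales", its columns, the sample rows and the function name regional_stats are hypothetical and do not come from any particular DBMS product.

```python
import sqlite3


def regional_stats(min_amount):
    """Return (region, row count, mean amount) for sales at or above min_amount."""
    # Hypothetical in-memory table used only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("East", 120.0), ("East", 80.0), ("West", 200.0), ("West", 300.0)],
    )
    # The clauses must appear in a fixed order (SELECT, FROM, WHERE,
    # GROUP BY, ORDER BY); a misplaced clause is a syntax error.
    rows = conn.execute(
        "SELECT region, COUNT(*), AVG(amount) "
        "FROM sales "
        "WHERE amount >= ? "
        "GROUP BY region "
        "ORDER BY region",
        (min_amount,),
    ).fetchall()
    conn.close()
    return rows
```

Even in this small case the user must know the table name, the column names and the required clause order before any result can be obtained.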
More recent DBMS and spreadsheet software systems apply graphical user interface (GUI) techniques which allow users to select icons, symbols or other representations presented on the display to specify query details. While software in this latter category is often said to provide "graphical" queries (applying the word "graphical" as it is used in the term GUI), a more appropriate characterization is "symbolic query". That is, the user selects symbols from the display to specify a query. Symbols include icons, tables, text, pictures and other representations which connote data or query operations.
While symbolic systems provide a more intuitive query process than SQL thereby reducing the number of errors inexperienced SQL users might otherwise make, they still require the user to conceptually comprehend complicated database or spreadsheet processes and structures in order to master a series of detailed procedures to define query criteria and relationships.
One of the most important DBMS operations is the "join" operation which is applied to relational databases to combine data from two separate database tables into a single table for further processing. Users must specify commands in SQL text or use mouse selections in symbolic systems to select the tables to be joined and to identify the variables to be included in the resulting table. The problem with this process is that each time the user wants to combine a new set of variables from different tables or wants to include another variable in the current table, a new join operation must be completed. While the relational database approach of storing data in separate tables provides an efficient and flexible model for maintaining the physical database system, it also places additional burdens on the user, who must possess an understanding of the table structure of the database and must perform table joins each time variables from different tables are used.
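The join burden described above can be illustrated with a minimal sketch, again using Python's standard sqlite3 module. The tables "customers", "products" and "orders", their columns, the sample rows and the function names are all hypothetical. The first query joins two tables; including one further variable held in a third table requires the join to be restated in full.

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical three-table relational database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
    conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, category TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER, product_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO products VALUES (?, ?)", [(100, "Tools"), (101, "Toys")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(10, 1, 100, 250.0), (11, 2, 101, 75.5), (12, 1, 101, 40.0)],
    )
    return conn


def amount_by_region(conn):
    """Combine 'amount' (orders) with 'region' (customers): one join."""
    return conn.execute(
        "SELECT c.region, o.amount "
        "FROM orders AS o "
        "JOIN customers AS c ON o.customer_id = c.customer_id "
        "ORDER BY o.order_id"
    ).fetchall()


def amount_by_region_and_category(conn):
    """Adding 'category' (products) means writing a new, larger join."""
    return conn.execute(
        "SELECT c.region, p.category, o.amount "
        "FROM orders AS o "
        "JOIN customers AS c ON o.customer_id = c.customer_id "
        "JOIN products AS p ON o.product_id = p.product_id "
        "ORDER BY o.order_id"
    ).fetchall()
```

In both functions the user must already know which table holds each variable and which key columns relate the tables.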
DBMS also requires the same kind of detailed, sequential step-by-step data manipulation, query and charting process described for spreadsheet software. As a result, DBMS is even more difficult to use in its application to visual data evaluation. Consequently, the currently available spreadsheet and DBMS software, except in the simplest cases, is woefully inadequate in its application to visual data evaluation.
One problem that limits the value of the currently available spreadsheet and DBMS software systems in visual data evaluation is that the large number of detailed steps required in data manipulation, querying and charting severely limits the user's ability to remember and process database information.
Another problem with such software systems is that the charting process, which is the primary visual mode for presenting and discovering information, is provided only as a final visual documentation of data selection and manipulation which have already occurred rather than as an integrated part of the visual data evaluation process itself. Consequently, users do not receive interactive, visual feedback from manipulation of the database that occurs with queries and browsing activities.
Yet another problem with existing DBMS and, to a lesser degree, spreadsheet software is the complex, highly structured and non-intuitive nature of database and spreadsheet query processes and the resulting difficulty in mastering such systems.
Yet another problem with existing software systems is that separate "join" operations must be applied to data from different tables prior to accessing the data in a single combined table.
Still another problem is that the visual data evaluations quickly become confusing as the user loses an understanding of the evolution of previous graphical, statistical and query steps.
The problems with currently available software are becoming even more severe: the use of databases and the need for information are increasing at the same time that rapidly expanding communications networks and workstations are making these data available to users who are generally less proficient with the tools needed to access database information efficiently and accurately.
For the foregoing reasons, there is a need for a new, simpler software system for visually evaluating information in complex databases. This new system should support rather than tax human cognitive abilities throughout the data evaluation process. The system should also provide a new intuitive visual context for browsing and querying data by means other than lengthy, detailed procedures employing formal language statements or symbolic query representations. Rather, users should be able to directly view, evaluate and query data with simple operations on a single comprehensive dynamic representation of the database. Such a new system should also be able to accomplish complex queries while at the same time being easily understandable to novice database users. Further, the system should allow users to select all database variables without having to conduct separate join operations prior to accessing the data. Finally, the system must be directly applicable to a broad base of existing database systems and network environments.