The present invention concerns data visualization wherein structure has been imposed on data and a means of displaying the structure is needed. Traditional methods for displaying structures (such as hierarchies) are difficult for people to use when the structures get large.
Data reduction schemes such as those used in the mining of data from a large database impose a structure onto the data to better understand that data. Often a tree (or hierarchical) representation of the data is provided. A tree representation can often ease the viewing, accessing or understanding of the data represented by the tree. Tree structures are particularly convenient for separating large databases into segments or subsets of data. A set of files in a computer system is usually also represented as a hierarchy of directories with the leaves being individual files. This is also true for books in a library catalog system, and so forth. In general a tree has one top level or “root” node which can have two or more branches emanating from it. The branches represent some logical separation of the data. Each of these branches ends in another node, which can in turn have branches leaving from it, or the node can be a termination point or “leaf” of the tree (no more branches). Examples of data structured as a tree include the directory structure of a computer file system, a database table representing a “bill-of-materials” relationship, and the organization chart of a corporation. Examples from data mining include decision trees for classification and hierarchies of clusters (segments) generated from a hierarchical agglomerative clustering algorithm or a similar method.
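The root, branch and leaf terminology described above can be illustrated with a minimal sketch. The class name `TreeNode`, the helper `count_leaves`, and the sample directory names are hypothetical and chosen only for illustration; they are not taken from the described system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    """One node of a tree; a node with no children is a leaf."""
    label: str
    children: List["TreeNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf is a termination point: no branches leave it.
        return not self.children


def count_leaves(node: TreeNode) -> int:
    """Count the leaves in the subtree rooted at `node`."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


# Hypothetical file hierarchy: directories are interior nodes, files are leaves.
root = TreeNode("root", [
    TreeNode("docs", [TreeNode("a.txt"), TreeNode("b.txt")]),
    TreeNode("src", [TreeNode("main.c")]),
])
```

Here `root` is the single top-level node, `docs` and `src` are nodes reached by its two branches, and the three files are leaves.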
Other examples of structures that are used to impose order on a large data set are networks or graphs. These structures do not have a single root but do have nodes that are interconnected by edges. Local area and wide area networks are examples of structures containing data which can conveniently be depicted as a graph, with the nodes of the graph indicating, for example, nodes on the network. Such a graph could be used to indicate traffic on the network, wherein data passing through a transmission node would be represented as data within a node of the graph.
In data mining, especially in building decision trees for prediction over a database, it is frequently the case that a very large tree is produced. An example of a decision tree for use in data mining is disclosed in copending U.S. patent application Ser. No. 08/982,760 entitled “Method and Apparatus for Efficient Mining Classification Models from Databases” to Chaudhuri et al., which is assigned to the assignee of the present invention. Viewing an entire tree or browsing the data using the model extracted (the tree) is very challenging when the tree is large (has many nodes). Most prior art systems for displaying data structures such as trees display the tree and then zoom in and out to show either smaller or larger portions of the tree. These prior systems make it difficult to browse the tree structure in detail, while continuing to provide the user a context of what portion of the tree structure is being viewed.
Because of the hierarchical branching inherent in a typical tree structured data set, the “width” of the tree tends to increase exponentially with the “depth”. For example, a balanced tree that has on average 4 branches per node will have 4 nodes at the 2nd level of the hierarchy, 16 nodes at the 3rd, 256 nodes at the 5th, and 4^(n-1) nodes at the nth level.
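The growth in width can be checked with a short calculation. The sketch below assumes that the root is counted as level 1 and that every node has exactly the same number of branches, which reproduces the counts of 4, 16 and 256 nodes given above for levels 2, 3 and 5. The function names are hypothetical.

```python
def nodes_at_level(branching: int, level: int) -> int:
    """Nodes at a 1-indexed level of a balanced tree (the root is level 1)."""
    if level < 1:
        raise ValueError("level must be >= 1")
    return branching ** (level - 1)


def total_nodes(branching: int, depth: int) -> int:
    """Total nodes in a balanced tree with `depth` levels."""
    return sum(nodes_at_level(branching, lvl) for lvl in range(1, depth + 1))


# Width of each of the first five levels for 4 branches per node.
widths = [nodes_at_level(4, lvl) for lvl in range(1, 6)]
```

With 4 branches per node, `widths` is `[1, 4, 16, 64, 256]`, so a display that gives every node an equally sized object must be 256 objects wide at only the fifth level.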
Traditional methods for displaying a tree in a user interface use an equally sized object for each node in the tree. The tree can be laid out graphically as a network of connected objects in a window with scroll bars. Another example of a prior art tree representation is the hierarchy of files and directories displayed by the Microsoft Windows Explorer program. In both of these examples provision must be made to collapse or expand a node in order to make the navigation of a large unwieldy tree manageable.
For certain situations, such a tree may need to be seen in its entirety (fully expanded). If the tree is scaled down so that it can be viewed completely at one time, then not much useful information can be shown along with the nodes of the tree. If the entire tree is laid out so that usable information can be shown on each node, then certain problems arise. When the top of the tree is viewed, the distance between the high level nodes can become so great that they are of no use. When the bottom of the tree is put into view, the lower level nodes become a tangle of seemingly disconnected information (it is difficult to see the relationships between the nodes because connections to the parent nodes cannot be seen).
FIGS. 1 and 2 illustrate these problems. FIG. 1 depicts different visual views of a large amount of data in the form of a tree. One view depicts data nodes near a top (left side in FIG. 1) and a second view depicts a different set of data nodes near a base or bottom (right side of FIG. 1) of the data tree. Scroll bars are used to navigate the tree structure that is displayed in the scrollable window. FIG. 2 is an example of a fully expanded directory structure depicted by Windows Explorer where all connections to the higher level, owner directories have been lost.
The present invention concerns a method for enabling effective browsing and examination of large amounts of data that are organized or classified in a data structure. Many of the problems that have been experienced trying to explore and/or view large amounts of data are overcome by a novel navigation and rendering scheme constructed in accordance with the invention.
Two simultaneously viewable windows are displayed for a user. Using the example of a data tree, an overview of the entire tree is depicted in one window and only a portion of the tree is displayed in a second window. The second window shows individual nodes and interconnections and the first overview window depicts the entire tree in a way that makes traversal of the information in the tree intuitive to the user.
One use of the invention is for viewing a decision tree produced by a data mining system such as the data mining system disclosed in co-pending U.S. patent application Ser. No. 08/982,760 entitled “Method and Apparatus for Efficiently Mining Classification Models from Databases.” Another representative use of the invention displays and navigates a file structure maintained by a computer operating system. Generally, the invention has application for displaying any system of interconnected nodes such as a graph, a network, an organizational chart, a flowchart, etc., wherein data or information is associated with nodes of the system.
Use of color gradients helps the user identify trends or anomalies in the data by visualizing the tree as a whole. In general, a property is associated with a color, and the color intensity can visually represent the value of the property (e.g. red being high, yellow being low). An exemplary embodiment of the invention is implemented as an ActiveX control with a user interface suitable for viewing and exploring large trees.
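One way such a gradient could be computed is sketched below. The linear interpolation between yellow (low) and red (high), the RGB encoding, and the name `value_to_color` are assumptions made for illustration; the text above specifies only that red represents high values and yellow represents low values.

```python
from typing import Tuple

RGB = Tuple[int, int, int]

LOW_COLOR: RGB = (255, 255, 0)   # yellow, assumed to mark the lowest value
HIGH_COLOR: RGB = (255, 0, 0)    # red, assumed to mark the highest value


def value_to_color(value: float, lo: float, hi: float) -> RGB:
    """Map a property value in [lo, hi] to a color between yellow and red."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Normalize to [0, 1] and clamp values that fall outside the range.
    t = max(0.0, min(1.0, (value - lo) / (hi - lo)))
    r = round(LOW_COLOR[0] + (HIGH_COLOR[0] - LOW_COLOR[0]) * t)
    g = round(LOW_COLOR[1] + (HIGH_COLOR[1] - LOW_COLOR[1]) * t)
    b = round(LOW_COLOR[2] + (HIGH_COLOR[2] - LOW_COLOR[2]) * t)
    return (r, g, b)
```

Coloring every node of the overview with `value_to_color(node_value, lo, hi)` would make a subtree of unusually high values stand out as a red region even when individual nodes are too small to label.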
One exemplary embodiment of the invention includes a method for displaying data as a tree data structure. A user interface is painted by a tree rendering component that allows intuitive navigation and interpretation of the tree structure. The tree rendering component updates two related windows, a layout window and a thumbnail window. The tree rendering component maintains a structure of a tree depicted in the thumbnail window and depicts a portion of the entire tree in the layout window. The use of side-by-side windows, one of which shows the whole tree and another of which shows a portion of the tree, allows easier user visualization of the data characterized by the tree.
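The relationship between the two windows can be sketched as a coordinate mapping. The example below assumes, purely for illustration, that the full tree occupies a rectangle in layout coordinates, that the thumbnail window scales that rectangle to fit its own size, and that the region currently visible in the layout window is to be located within the thumbnail. The `Rect` type and `viewport_in_thumbnail` function are hypothetical and are not described in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def viewport_in_thumbnail(tree_bounds: Rect, visible: Rect,
                          thumb_width: float, thumb_height: float) -> Rect:
    """Scale the visible layout region into thumbnail coordinates.

    `tree_bounds` is the extent of the whole tree in layout coordinates,
    `visible` is the part currently shown in the layout window.
    """
    if tree_bounds.width <= 0 or tree_bounds.height <= 0:
        raise ValueError("tree bounds must have positive size")
    return Rect(
        x=(visible.x - tree_bounds.x) * thumb_width / tree_bounds.width,
        y=(visible.y - tree_bounds.y) * thumb_height / tree_bounds.height,
        width=visible.width * thumb_width / tree_bounds.width,
        height=visible.height * thumb_height / tree_bounds.height,
    )


# A 4000 x 1000 tree shown in a 400 x 100 thumbnail; the layout window
# currently shows an 800 x 200 region whose corner is at (1000, 500).
marker = viewport_in_thumbnail(
    Rect(0, 0, 4000, 1000), Rect(1000, 500, 800, 200), 400, 100
)
```

In this example `marker` is an 80 x 20 rectangle at (100, 50) in thumbnail coordinates, which is one way the user could be shown which portion of the whole tree the layout window is displaying.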
The exemplary embodiment of the invention also conveys additional information in other windows (or window panes of the main window) on a viewing monitor. In accordance with one embodiment, a path window displays as text a sequence of concatenated decision steps required to reach a given node in the data structure. It is a textual summary of the context. Additionally, the user can select a given node that is displayed in the layout window, and a detail window itemizes information about the contents of the selected node. The detail window can include a histogram of the values of a variable or score of interest. When the invention is used with a database classifier, such a window could further itemize different categories of data that satisfy the logic leading to a particular node of the data structure.
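The text shown in the path window can be sketched as a walk from the selected node back to the root. The `DecisionNode` class, the `path_text` function, the " AND " separator, and the sample conditions are assumptions made for illustration; the text above states only that the decision steps leading to a node are concatenated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecisionNode:
    # Test on the branch leading from the parent into this node;
    # empty for the root, which is reached without any decision.
    condition: str
    parent: Optional["DecisionNode"] = None


def path_text(node: DecisionNode, separator: str = " AND ") -> str:
    """Concatenate the decision steps from the root down to `node`."""
    steps: List[str] = []
    current: Optional[DecisionNode] = node
    while current is not None:
        if current.condition:
            steps.append(current.condition)
        current = current.parent
    # Steps were collected leaf-to-root, so reverse them for display.
    return separator.join(reversed(steps))


# Hypothetical two-step path in a decision tree.
root = DecisionNode("")
older = DecisionNode("age > 30", parent=root)
selected = DecisionNode("income <= 50000", parent=older)
```

For the hypothetical nodes above, `path_text(selected)` yields `age > 30 AND income <= 50000`, and `path_text(root)` yields an empty string.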
These and other objects, advantages and features of the invention are further understood from the detailed description of an exemplary embodiment of the invention which is described in conjunction with the accompanying drawings.