This invention relates generally to high-dimensional data, and more particularly to the visualization of such data.
With the advent of the Internet, and especially electronic commerce ("ecommerce") over the Internet, the use of data analysis tools has increased. In ecommerce and other Internet and non-Internet applications, databases are generated and maintained that have large amounts of information, so that they can be analyzed, or "mined," to learn additional information regarding customers, users, products, etc. That is, data analysis tools provide for leveraging the data already contained in databases to learn new insights regarding the data by uncovering patterns, relationships, or correlations.
It is usually desirable for a data analyst to visualize the relationships and patterns underlying the data. Existing exploratory data analysis techniques include plotting data for subsets of variables, and various clustering methods. However, inasmuch as the data analyst desires to have as many tools at his or her disposal as possible, new visualization techniques for displaying the relationships and patterns underlying data are always welcome. For this and other reasons, therefore, there is a motivation for the present invention.
The invention relates to the visualization of high-dimensional data sets. In one embodiment, a network is constructed for a data set having a number of variables, which can also be referred to as dimensions or columns. The network, such as a dependency or a Bayesian network, has a number of nodes that have dependencies thereamong. Each node corresponds to a variable in the data set, and has a local distribution. Each dependency has a first node and a second node, such that the first node depends on the second node.
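As a minimal sketch of the network described above, the following Python shows one way the nodes, local distributions, and dependencies might be held in memory. The class names, the method names, and the use of a plain dictionary for a local distribution are assumptions made for illustration only; the text does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node per variable (dimension, column) of the data set."""
    name: str
    # Hypothetical representation: the local distribution is kept as an
    # opaque mapping; its real form depends on the kind of network used.
    local_distribution: dict


@dataclass(frozen=True)
class Dependency:
    """A dependency in which the first node depends on the second node."""
    first: str
    second: str


@dataclass
class Network:
    nodes: dict = field(default_factory=dict)
    dependencies: list = field(default_factory=list)

    def add_node(self, name, local_distribution):
        self.nodes[name] = Node(name, local_distribution)

    def add_dependency(self, first, second):
        # Both ends of a dependency must already be nodes of the network.
        for name in (first, second):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.dependencies.append(Dependency(first, second))

    def depends_on(self, name):
        """Names of the nodes that the given node depends on."""
        return [d.second for d in self.dependencies if d.first == name]


# Illustrative data only; these variables and values are made up.
net = Network()
net.add_node("age", {"<30": 0.4, ">=30": 0.6})
net.add_node("buys_games", {"yes": 0.3, "no": 0.7})
net.add_dependency("buys_games", "age")  # buys_games depends on age
```

Storing dependencies as a flat list keeps the later display steps simple, since each connection to be drawn corresponds to exactly one list entry.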
In one embodiment, the network is displayed as a number of items and a number of connections. Each item represents a node of the network. Each connection, such as an arc, represents a dependency and connects a first item representing the first node of the dependency with a second item representing the second node of the dependency. In one embodiment, selection of a particular item displayed that represents a particular node results in the display of the local distribution associated with the particular node.
In another embodiment, only a predetermined number of the items are shown, such as only the items representing the most popular nodes of the data set. Furthermore, in one embodiment, in response to receiving a user input, such as in conjunction with a graphical slider, a sub-set of the connections is displayed, proportional to the user input, in accordance with a predetermined measure of the dependencies represented by the connections. Thus, the display can range from all of the connections down to only the connection representing the dependency having the largest value for the predetermined measure.
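The slider behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name, the tuple layout (first, second, strength), and the mapping of the slider to a value between 0.0 and 1.0 are all hypothetical, and "strength" merely stands in for whatever predetermined measure is used.

```python
import math


def visible_connections(connections, slider):
    """Return the connections to display for a slider position.

    connections: list of (first, second, strength) tuples, where strength
                 is the (assumed numeric) predetermined measure.
    slider:      user input, assumed normalized to 0.0 .. 1.0.
    """
    if not connections:
        return []
    # Clamp the input so out-of-range values behave like the end stops.
    slider = min(1.0, max(0.0, slider))
    # Strongest dependencies first.
    ranked = sorted(connections, key=lambda c: c[2], reverse=True)
    # At least one connection always remains: the strongest one.
    count = max(1, math.ceil(slider * len(ranked)))
    return ranked[:count]


# Illustrative data only.
conns = [("a", "b", 0.2), ("b", "c", 0.9), ("c", "d", 0.5), ("d", "a", 0.1)]
strongest_only = visible_connections(conns, 0.0)
half = visible_connections(conns, 0.5)
everything = visible_connections(conns, 1.0)
```

With the slider at its minimum only the strongest connection is returned, and at its maximum every connection is returned, matching the range described in the text.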
In another embodiment, a particular item is displayed in an emphasized manner, and the particular connections representing dependencies including the node represented by the particular item, as well as the items representing nodes also in these dependencies, are also displayed in the emphasized manner. The emphasized manner can be, for example, only displaying the particular item, the particular connections, and the items representing nodes also in the dependencies represented by the particular connections, and not showing any of the other items or connections. Furthermore, in one embodiment, only an indicated sub-set of the items is displayed, as well as the connections representing dependencies among the nodes represented by the indicated sub-set of items. For example, the user may be able to add items to the indicated sub-set by searching for desired items, or otherwise selecting items, in an item-by-item manner.
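The two selection behaviors just described, emphasizing one item together with its dependencies and showing only an indicated sub-set of items, can be sketched as below. The function names and the representation of a dependency as a (first, second) pair are assumptions made for illustration.

```python
def emphasized_neighborhood(dependencies, item):
    """Items and connections to emphasize when one item is singled out.

    dependencies: list of (first, second) pairs of node names.
    Returns (items, connections): every dependency that includes the
    item, plus every node taking part in those dependencies.
    """
    connections = [d for d in dependencies if item in d]
    items = {item}
    for first, second in connections:
        items.add(first)
        items.add(second)
    return items, connections


def connections_among(dependencies, chosen_items):
    """Connections whose two ends both lie in the indicated sub-set."""
    chosen = set(chosen_items)
    return [d for d in dependencies if d[0] in chosen and d[1] in chosen]


# Illustrative data only.
deps = [("a", "b"), ("b", "c"), ("c", "d")]
near_b = emphasized_neighborhood(deps, "b")
subset_view = connections_among(deps, ["a", "b", "d"])
```

In the first function, everything outside the returned items and connections could either be hidden or drawn in a de-emphasized style; the text allows for both readings of "emphasized manner."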
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.