1. Technical Field
One or more embodiments described herein relate generally to providing event sequence data analyses. More specifically, one or more embodiments relate to providing visualizations of event sequence data analyses.
2. Background and Relevant Art
Event sequence data analysis is common in many domains, including web and software development, transportation, and medical care. For example, websites log how users navigate their pages, airlines track events during airplane flights, and hospitals record when patients transfer from one part of the hospital to another. Event sequence data analysis allows for an understanding of trends, sources of problems, and other information about the event sequence data. To aid in comprehension of event sequence data analysis, visualization techniques are often used to convey information about the event sequence data. Unfortunately, conventional event sequence data analysis visualization techniques have various drawbacks. Several of these drawbacks are described below in reference to website event sequence data.
Modern websites typically include multiple webpages that a user transitions through via hyperlinks connecting one webpage to another. For example, a website generally has a home page including multiple hyperlinks that direct a user to other webpages within the website. Accordingly, a user can transition from the home page to another webpage within the website by clicking on the provided hyperlinks. In this way, a website user can search for a particular product, review product pages, purchase a product, and so forth.
Occasionally, a user will transition through a website and then leave the website without making a purchase. Website managers generally refer to this as “user fallout.” Typically, the goal of a commercial website manager is to minimize user fallout. To minimize user fallout, a website manager (i.e., a webmaster) analyzes website event sequence logs to identify where website users lose interest and leave the website. The event sequence logs can include web traffic information indicating how users of a website transition through and eventually leave the website. The process of analyzing web traffic information is generally referred to as “clickstream analysis.” A problem arises, however, in that the event sequence logs include a great deal of information that is generally not in a format that website managers can easily understand for purposes of clickstream analysis.
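As a rough illustration of the kind of question clickstream analysis asks, the following minimal sketch counts, for each webpage, how many sessions ended there without a purchase. The session format (an ordered list of page names per user), the page names, and the `PURCHASE_PAGE` marker are all assumptions made for this example; actual event sequence logs vary widely in format.

```python
from collections import Counter

# Hypothetical page name that marks a completed purchase (an assumption).
PURCHASE_PAGE = "checkout_complete"


def fallout_by_exit_page(sessions):
    """Count sessions that ended without a purchase, keyed by the last page visited."""
    exits = Counter()
    for pages in sessions:
        # A non-empty session that never reached the purchase page is a fallout.
        if pages and PURCHASE_PAGE not in pages:
            exits[pages[-1]] += 1
    return exits


# Illustrative sessions: each list is the ordered sequence of pages one user visited.
sessions = [
    ["home", "product", "cart", "checkout_complete"],
    ["home", "product"],
    ["home", "search", "product"],
    ["home"],
]

fallout = fallout_by_exit_page(sessions)
for page, count in fallout.most_common():
    print(page, count)
```

In this sketch, the pages with the highest counts are the ones where the most users left the website without purchasing.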
For this reason, website managers typically utilize various tools to assist them in analyzing and visualizing event sequence log information. These tools generally provide visualizations that illustrate how users of the website transition from one webpage to another webpage, and so on. One such tool is the Sankey diagram. The Sankey diagram visualizes event sequence data by providing two or more columns, wherein each column includes a listing of webpages. The Sankey diagram further includes edges connecting webpages between the columns. For example, to illustrate one level of web traffic related to a website consisting of three webpages (i.e., webpage A, webpage B, and webpage C), the Sankey diagram can include two columns, each column including webpages A, B, and C. Then, for users who transitioned from one webpage to another (e.g., users who started at webpage A, then clicked a hyperlink to transition to webpage B), the Sankey diagram would include an edge connecting webpage A in the first column to webpage B in the second column. The thickness of each edge connecting webpages between columns in a Sankey diagram corresponds to the volume of user traffic between those webpages (e.g., the edge connecting the home page to a popular product page may be very thick, while the edge connecting the home page to a less popular product page may be thinner).
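The data behind such a diagram can be sketched as a table of transition counts, where each count would determine the thickness of one edge. The following is a minimal illustration only, assuming each session is an ordered list of page names; the function name `sankey_edges` and the sample sessions are hypothetical and are not taken from any particular tool.

```python
from collections import Counter


def sankey_edges(sessions, level=0):
    """Count transitions from the page at position `level` to the page that follows it.

    Each (source, target) key corresponds to one edge between two adjacent
    columns; the count is the traffic volume that sets the edge thickness.
    """
    edges = Counter()
    for pages in sessions:
        # Skip sessions that ended before reaching the next column.
        if len(pages) > level + 1:
            edges[(pages[level], pages[level + 1])] += 1
    return edges


# Illustrative sessions over the three webpages A, B, and C.
sessions = [
    ["A", "B"],
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "C"],
]

edges = sankey_edges(sessions)
for (source, target), volume in sorted(edges.items()):
    print(f"{source} -> {target}: {volume}")
```

With these sample sessions, the edge from webpage A to webpage B carries two users, so it would be drawn thicker than the edges from A to C and from B to C, which carry one user each.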
As a website becomes more complex (e.g., includes more webpages and hyperlinks) and more heavily trafficked, existing analysis and visualization tools typically fail to provide accurate representations of sequence data. For example, a Sankey diagram of a heavily trafficked website is generally very difficult to read accurately because of the multitude of intersecting edges of various thicknesses between the columns. This problem is common to existing analysis and visualization tools. Thus, performing a clickstream analysis with a Sankey diagram or other similar tools becomes cumbersome and difficult.
Furthermore, existing analysis and visualization tools generally do not provide accurate representations of a comparison of two sets of event sequence data. For example, a website manager may overhaul the design and layout of a website. To determine the effect that the website overhaul has on user traffic, the website manager may wish to compare a data set including web traffic data from before the overhaul to a data set including web traffic data from after the overhaul. As mentioned above, tools such as the Sankey diagram provide visualizations that are generally crowded and difficult to understand, and this is particularly true when comparing multiple data sets.
Thus, current methods of providing visualizations for event sequence data have several disadvantages that lead to ineffective analyses.