Electronic data management products provide marketing entities with a centralized system for gathering and analyzing diverse types of Internet analytics data collected on observed users. The electronic data can aid a marketing entity in providing personalized marketing communications for individual users. For example, a marketing entity (which can include entities such as advertisers, marketers, or other agencies) frequently receives information on network actions taken by one or more observed users. The electronic data received by marketing entities encompasses diverse types of marketing data and other data. For example, marketing entities collect data on visits and interactions by observed users on a website, digital advertising data, website personalization data, and geographic and demographic data of observed users, among other electronic data. Marketing entities further receive electronic data from multiple sources, such as mobile and desktop computing devices, smart devices connected to the Internet, and other sources. Receiving diverse types of data across millions of observed users can greatly enhance the distribution and targeting of electronic communications by more precisely defining the interests of the observed users, and can thus provide a better experience for end users. However, the electronic nature of the data collection creates hurdles unique to the network environment in which the data is collected. Specifically, with the growth of big data, there is an increased need to determine relationships between sets of collected data and to present the relevancy of the tracked data. Because the data is large in size and complex in structure, determining and presenting the relationships and relevancy of the data presents unique challenges. Further compounding the problems inherent in large and complex data sets, the data is increasingly stored across multiple storage units in a distributed computing environment.
Consider the example of websites. Many websites are large and complex in nature and provide multiple functionalities, such as allowing users to find information, engage in commerce, or socialize. Such websites can have thousands or more unique web pages and can be visited every day by millions or more users of various demographics dispersed across geographic locations worldwide. A marketing entity needs to receive information on traffic patterns associated with the visits of the users to each website or group of websites. The analytics data collected from user interactions with groups of websites include numeric data points, which can take an unlimited number of different potential values, and categorical (e.g., non-numeric) data points, which take a discrete and limited set of values. Examples of numeric data points include the duration of a website visit, the revenue generated per visit, and the age of an observed user. Examples of categorical data points include the geographical location of the observed user, the occupation of the observed user, and other information that is not otherwise ordered. Prior efforts to determine the relationships between individual data fields and to analyze the global structure of data of various types have been limited. In addition, the various data variables on the observed users are stored across multiple storage units and processed by multiple distributed processing units. There is a need for determining and presenting the relationships and relevancy of large sets of data variables.
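The distinction between numeric and categorical data points can be illustrated with a minimal Python sketch. It is illustrative only and is not drawn from any particular product: the function name `classify_fields`, the field names, and the sample records are all hypothetical. It simply labels a field as numeric when every observed value is a real number and as categorical otherwise.

```python
from numbers import Real


def classify_fields(records):
    """Label each field 'numeric' or 'categorical' from the values observed.

    Hypothetical helper for illustration: a field is numeric only if every
    observed value is a real number (booleans are treated as categorical).
    """
    kinds = {}
    for record in records:
        for field, value in record.items():
            is_numeric = isinstance(value, Real) and not isinstance(value, bool)
            # Once a non-numeric value is seen, the field stays categorical.
            if kinds.get(field, "numeric") == "numeric" and is_numeric:
                kinds[field] = "numeric"
            else:
                kinds[field] = "categorical"
    return kinds


# Hypothetical analytics records for two observed users.
visits = [
    {"visit_duration_s": 312.5, "revenue": 19.99, "age": 34,
     "location": "Berlin", "occupation": "engineer"},
    {"visit_duration_s": 48.0, "revenue": 0.0, "age": 27,
     "location": "Austin", "occupation": "teacher"},
]

field_kinds = classify_fields(visits)
```

In this sketch, the visit duration, revenue, and age fields are labeled numeric, while location and occupation are labeled categorical. A practical system would likely rely on a declared schema rather than inferring types from observed values.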