This application relates to monitoring and mitigating information leaks that can occur through data mining by third party observers.
A major concern for many organizations is the leakage of information through employee use of the web. The leakage of information can occur through inadvertent actions of the employee as well as direct exfiltration of data. Through the use of search tools, web site monitoring, and other common commercial data analytical tools, a third party can derive substantial insights into the operation and planning of a large corporation.
Many Internet applications today use Data Analytical Services (DAS) to amass information about their users. Typically an application contracts with a DAS provider, so that the application provides raw data to the DAS provider and the DAS provider returns analytics to the application. DAS providers use a number of methods to collect information on users' visit behavior. The types of information tracked include such factors as geolocation, dwell time on a particular web page, incoming and outgoing clicks (e.g., launch points), the type of computer used, and the telecommunications provider used, as well as a number of other parameters, amounting to tens to hundreds of data elements overall. This information is used to track and identify users, to make inferences about their preferences and habits, and to create associations about their behaviors that are of value to commercial organizations. For example, a commercial website might use information about how long a user dwells on a set of product pages along with information on the user's geolocation to infer that a user has an interest in a particular class of product and belongs to a particular income class. The website can then use this information to improve its marketing to the user, not just to determine what class of products might interest a user, but also to present more exclusive product offerings within that class to more affluent users or more bargain-priced offerings to less affluent users. An example of this understanding is Amazon's ability to predict “what others like you” are interested in and Netflix's ability to correctly recommend movies. In both cases, the data mining systems develop complete models of needs, desires, and predictions of intent of the users. AMAZON is a registered trademark owned by Amazon, Inc. NETFLIX is a registered trademark owned by Netflix, Inc.
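By way of illustration only, the dwell-time and geolocation inference described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any actual DAS implementation: the names PageView, infer_interest, and choose_offer_tier, the 60-second threshold, and the region labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    category: str         # product class of the page viewed (hypothetical field)
    dwell_seconds: float  # time the user spent on the page


def infer_interest(views, min_total_dwell=60.0):
    """Return the product class with the largest total dwell time,
    or None if no class reaches the (assumed) threshold."""
    totals = {}
    for view in views:
        totals[view.category] = totals.get(view.category, 0.0) + view.dwell_seconds
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] >= min_total_dwell else None


def choose_offer_tier(region, affluent_regions):
    """Pick an offer tier from geolocation alone (a deliberately crude assumption)."""
    return "premium" if region in affluent_regions else "value"


# Hypothetical session data for a single user.
views = [PageView("cameras", 45.0), PageView("cameras", 40.0), PageView("shoes", 10.0)]
interest = infer_interest(views)
tier = choose_offer_tier("region-1", affluent_regions={"region-1"})
```

Even this crude pairing of two data elements yields a product class and an offer tier for the user; an actual provider would be expected to combine many more of the data elements listed above.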
While the collected information has commercial value to DAS providers and applications, it can present a threat to individuals and organizations by revealing information that these entities do not wish to reveal. For users, this can mean that they may be revealing sensitive information about themselves, such as their identity or whereabouts, even when providing apparently innocuous information. This risk is particularly acute given that DAS providers aggregate information across multiple applications or web sites.
For organizations, there is the additional risk that the collective behavior of members of the organization (e.g., a group of users all visiting sites related to the same topic) could reveal sensitive information about the organization, such as product plans or future large scale business transactions. For example, when an organization is involved in particular subject matter or is investigating the subject matter for possible involvement, the web usage (e.g., searching and web browsing history) of the organization tends to exhibit an increased concentration around that subject matter in comparison to an uninterested or neutral organization. For example, if company A is secretly investigating company B for a possible acquisition, company A's web usage will likely tend to involve company B more than would otherwise be expected. A third party observer who is tracking the users in company A, such as a DAS, will likely have enough information about company A's web usage to discern company A's increased interest in company B. If the observer knew that company A were, for example, an investment bank, the observer might be able to translate knowledge of company A's increased interest into its true intent regarding company B. The third party observer could then use the knowledge of company A's intent for nefarious actions, such as publicizing information about company A's secret investigations into company B to affect their stock prices for unjust profit, or placing certain investments in company A or company B that take advantage of the information.
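The comparison between an interested organization and a neutral one can be illustrated with a simple ratio of topic shares. The following is a minimal sketch under stated assumptions: the function names, the topic labels, and the visit counts are hypothetical, and an actual observer would likely apply more elaborate statistics.

```python
def topic_share(visits, topic):
    """Fraction of an organization's visits that relate to the given topic."""
    if not visits:
        return 0.0
    return sum(1 for visit in visits if visit == topic) / len(visits)


def concentration_lift(org_visits, baseline_visits, topic):
    """Ratio of the organization's topic share to a neutral baseline's share.

    A value well above 1 suggests unusual interest in the topic.
    """
    org_share = topic_share(org_visits, topic)
    base_share = topic_share(baseline_visits, topic)
    if base_share == 0.0:
        # The baseline never touches the topic: any interest is unbounded lift.
        return float("inf") if org_share > 0.0 else 1.0
    return org_share / base_share


# Hypothetical visit logs, each entry labeled by its topic.
company_a = ["company_b"] * 30 + ["news"] * 50 + ["weather"] * 20
neutral = ["company_b"] * 2 + ["news"] * 60 + ["weather"] * 38

lift = concentration_lift(company_a, neutral, "company_b")
```

With these illustrative counts, company A devotes 30% of its visits to company B against 2% for the neutral baseline, a lift of roughly 15, which is the kind of signal an observer could act on.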
In order to help users mitigate these risks, some tools have been developed to provide information about tracking and information being gathered about individual users, which function as tools on standalone computers. Other tools operate as a combination of a probe machine and user machine. A key element of these systems is that they are implemented locally on a user's device and do not make use of any network resources. While these systems can be very effective for sophisticated end users who run these applications on their PCs, they have several limitations. For example, (1) they require the installation of software on individual PCs, which requires either active configuration by the end users or the inclusion of the software in a corporate configuration management system with associated support resources; (2) they do not provide any form of information consolidation or analytics that are needed for assessing the risk to a given organization; (3) they do not provide any mechanism to assess what information is being gathered about the organization as a whole; and (4) they do not provide a means of discovering relationships and preferences that is language independent.
Other tools have also been developed to obscure the network layer connection path when accessing websites, such as Anonymizer. However, tools like Anonymizer, which only disassociate IP addresses from particular users, are unable to prevent a third party from receiving higher layer information (e.g., application or presentation layers). As such, the third party can still analyze users' behavior and then form user groups, realize intent, or infer the user's and/or organization's identity based on the analysis.
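The limitation noted above can be illustrated by grouping sessions on higher layer attributes while ignoring the IP address entirely. The following is a minimal sketch under stated assumptions: the session fields (user_agent, language, timezone, page), their values, and the function name are hypothetical, and the IP addresses are drawn from documentation-reserved ranges.

```python
from collections import defaultdict


def group_by_higher_layer(sessions, keys=("user_agent", "language", "timezone")):
    """Group visited pages by a fingerprint built only from higher layer fields.

    The "ip" field is never consulted, so masking it does not change the result.
    """
    groups = defaultdict(list)
    for session in sessions:
        fingerprint = tuple(session.get(key) for key in keys)
        groups[fingerprint].append(session["page"])
    return dict(groups)


# Hypothetical sessions: the first two arrive from different (masked) IP addresses
# but share the same higher layer attributes.
sessions = [
    {"ip": "203.0.113.5", "user_agent": "BrowserX/1.0", "language": "en-US",
     "timezone": "UTC-5", "page": "/company-b/financials"},
    {"ip": "198.51.100.7", "user_agent": "BrowserX/1.0", "language": "en-US",
     "timezone": "UTC-5", "page": "/company-b/board"},
    {"ip": "192.0.2.9", "user_agent": "BrowserY/2.0", "language": "de-DE",
     "timezone": "UTC+1", "page": "/weather"},
]

groups = group_by_higher_layer(sessions)
```

In this sketch the first two sessions fall into one group despite their differing IP addresses, so the observer can still associate both company B page visits with the same user or user group.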