1. Field of the Invention
The present invention relates to the protection of the privacy of individuals communicating through computing or telecommunication devices/systems, or interacting software applications. More specifically, the present invention relates to a system and method for protecting user privacy using social inference protection techniques.
2. Related Art
Social computing relates to any type of computing application in which software serves as an intermediary of social relations. Examples of social computing applications include email, instant messaging, social networking web sites, and photo sharing web sites. Mobile social computing relates to social applications that run on mobile devices. A wide variety of mobile social computing applications exist, many of which leverage location and mobility to provide innovative services. Examples of such applications include the Ulocate system, which allows real-time tracking of users and provides a list of people within a social network, as well as their locations; the LoveGety system, which provides proximity match alerts when a male LoveGety user and a female LoveGety user are within 15 feet of one another; the ActiveCampus system, which provides maps showing the location of users on campus; the Social Net system, which provides social match alerts inferred from collocation histories; various matching systems which recommend people to people based on similar interests, activities, personalities, etc.; and Twitter, which enables microblogging/citizen journalism.
Many social networking sites such as Facebook and Orkut use user profiles and existing friendships to enable social communications or recommend possible matches. However, using and sharing geotemporal and personal information raises many serious privacy concerns. Examples of categories of potential privacy invasions in mobile social computing systems are: inappropriate use by administrators (for example, a system administrator may sell personal data without permission); legal obligations (for example, a system administrator may be forced by an organization such as the police to reveal personal data); inadequate security; lack of control over direct revelations (for example, a cell phone application that reveals a person's location to that person's friends, but does so without properly informing the person or giving the person control of this feature); instantaneous social inference through lack of entropy (for example, when one cell phone shows that Bob is nearby, and only two people with a similar cell phone are visible, one of them must be Bob, thus increasing the chance of identifying him; the example of the student and professor mentioned in the introduction also illustrates this category); historical social inferences through persistent user observation (for example, if two nicknames are repeatedly shown on the first floor of the gym where the gym assistant normally sits, one of them must be the gym assistant); and social leveraging of privileged data (for example, David can't access a location, but Jane can, so David asks Jane for the location).
The problem of social inferences, which include instantaneous social inferences and historical social inferences, is of particular concern in social computing. Inference is the process of concluding unrevealed information as a consequence of being presented with authorized information. A well-known example of the inference problem relates to an organization's database of employees, where the relation <NAME, SALARY> is a secret, but user u issues the following two queries: “List the RANK and SALARY of all employees” and “List the NAME and RANK of all employees.” Neither query violates the security requirement, because neither contains the top-secret <NAME, SALARY> pair. But clearly, the user can infer the salaries of the employees using their ranks. Although the inference problem as a threat to database confidentiality is discussed in many studies, mobile social computing raises new classes of more complicated inferences, which we call social inferences. Social inferences are inferences about user information such as identity, location, activities, social relations, and profile information.
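The name-rank-salary example can be illustrated with a minimal sketch. The table contents and the function name `infer_salaries` are hypothetical and serve only to show how two individually authorized query results may be joined on RANK; they are not part of any described system.

```python
# Hypothetical toy data: neither result set contains a <NAME, SALARY> pair.
rank_salary = [("Manager", 90000), ("Engineer", 70000), ("Engineer", 75000)]
name_rank = [("Alice", "Manager"), ("Bob", "Engineer")]


def infer_salaries(rank_salary, name_rank):
    """Join the two authorized query results on RANK.

    Returns, for each name, the set of salaries consistent with that
    person's rank. A single-element set is an absolute inference; a
    larger set only narrows the possibilities.
    """
    salaries_by_rank = {}
    for rank, salary in rank_salary:
        salaries_by_rank.setdefault(rank, set()).add(salary)
    return {name: salaries_by_rank.get(rank, set()) for name, rank in name_rank}


inferred = infer_salaries(rank_salary, name_rank)
for name, salaries in inferred.items():
    print(name, sorted(salaries))
```

In this toy data the salary of the only manager is fully determined, while the engineer's salary is merely narrowed to two candidates.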
The social inference problem can include a wide range of issues. However, any inference that results from using social applications can be made in one of the following two ways:
1) the inferrer uses only the current state of the system, which is based only on the current observation of the system (referred to as “instantaneous inference”); or
2) the inferrer uses the history of her/his observations, or the history of the answers to previous queries (referred to as “historical inference”).
Based on the nature of mobile social applications, social inferences are either the result of accessing location-based information or the result of social communications, or both. The first type is referred to herein as “location-related inferences” and the second type is referred to herein as “inferences in online communications.” The following examples are illustrative:
1. Instantaneous social inferences in online communications: Cathy chooses a nickname for her profile and hides her real name, but her profile shows that she is a female football player. Since there are only a few female football players at a given school, there is a high chance she can be identified.
2. Instantaneous location-related social inferences: a cell phone shows a few nicknames in a room, and it is known that the room is Professor Smith's office. Therefore, Professor Smith is in his office and one of those few nicknames belongs to him.
3. Historical location-related inferences: Superman2 and Professor Johnson are repeatedly shown in a room, which is known as Professor Johnson's office. It is also known that David is his Ph.D. student. Therefore, Superman2 must be David and he is currently at Professor Johnson's office.
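The first example (Cathy's profile) can be sketched numerically. The population below is made up, and the sketch assumes that every person matching the revealed attributes is equally likely to be the profile owner, so the uncertainty about identity is log2 of the number of matching people. The function name `identity_entropy` is hypothetical.

```python
import math


def identity_entropy(population, revealed):
    """Return (number of matching people, bits of uncertainty about identity).

    Assumes a uniform distribution over everyone whose attributes match
    the revealed profile fields (an assumption of this sketch).
    """
    candidates = [
        person for person in population
        if all(person.get(key) == value for key, value in revealed.items())
    ]
    if not candidates:
        raise ValueError("no one in the population matches the revealed attributes")
    return len(candidates), math.log2(len(candidates))


# Hypothetical school population; names and attributes are illustrative only.
population = [
    {"name": "Cathy", "gender": "female", "sport": "football"},
    {"name": "Dana", "gender": "female", "sport": "football"},
    {"name": "Erin", "gender": "female", "sport": "tennis"},
    {"name": "Fay", "gender": "female", "sport": "none"},
    {"name": "Greg", "gender": "male", "sport": "football"},
    {"name": "Hal", "gender": "male", "sport": "football"},
    {"name": "Ian", "gender": "male", "sport": "tennis"},
    {"name": "Jon", "gender": "male", "sport": "none"},
]

gender_only = identity_entropy(population, {"gender": "female"})
full_profile = identity_entropy(population, {"gender": "female", "sport": "football"})
print(gender_only, full_profile)
```

Each additional revealed attribute shrinks the candidate set, which is why a profile with a hidden name can still identify its owner.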
Instantaneous and historical inferences must be predicted differently. However, previous inference prevention methods have not adequately addressed social inferences. This is due, in large part, to the facts that:
1. The sensitivity of user information is dynamic in nature based on the context, such as time and location;
2. Information available to users is not limited to answers obtained from their queries, but includes users' background knowledge (the information users learn outside the database), which often is a premise in many inferences;
3. Information such as life patterns, physical characteristics, and the quality of social relations that are not kept in the database can be inferred from information available to the user (therefore, inferences in such systems are not limited to database attribute disclosures); and
4. Most social inferences are partial inferences, not absolute inferences; i.e., they do not follow logically from the premises as in the name-rank-salary example, but they can be guessed as a result of low information entropy.
Extensive research and industry efforts have focused on helping computer users protect their privacy. Researchers have looked at various aspects of privacy enhancement such as ethics of information management, system features, access control systems, security, and database confidentiality protection. These efforts can be classified into four categories, as discussed below: (1) ethics, principles and rules; (2) direct access control systems; (3) security protection; and (4) inference control solutions.
(1) Ethics, Principles, and Rules
In order to properly respond to concerns of ethics, principles and rules, and to protect user privacy, researchers have made various suggestions. In particular, they have mentioned the following provisions for privacy sensitive systems:
- Provide users with simple and appropriate control and feedback, especially on the ways others can interact with them or access their information;
- Provide appropriate user confirmation feedback mechanisms;
- Maintain comfortable personal spaces to protect personal data from surreptitious capture;
- Provide a decentralized architecture;
- Provide the possibility of intentional ambiguity and plausible deniability;
- Assure limited retention of data or disclose the data retention policy;
- Provide users with sufficient knowledge of privacy policies; and
- Give users access to their own stored information.
However, the foregoing provisions do not ensure that data will not be used in any undesired way, or that unnecessary data will not be collected. Therefore, one effort defines the principles of fair information practices as openness and transparency, individual participation, collection limitation, data quality, use limitation, reasonable security, accountability, and explicit consent. Principles for privacy in mobile computing are then set, which consist of notice, choice, proximity, anonymity, security, and access. These concerns and suggested requirements all relate to the previously described categories of inappropriate use, legal obligations, inadequate security, and poor features.
(2) Direct Access Control Systems
Access control systems provide the user with an interface and directly control people's access to the user or his/her information based on his/her privacy settings. Access control systems with an interface to protect user privacy started with internetworking. Later, they were extended to context-aware and then ubiquitous computing. The earliest work within this area is P3P. P3P enables users to regulate their settings based on different factors including consequence, data-type, retention, purpose, and recipient. Another access control system, critic-based agents for online interactions, watches the user's actions and makes appropriate privacy suggestions. Access control mechanisms for mobile and location-aware computing were introduced later.
In mobile systems, the context is also used as a factor in decision making. Thus, in addition to the factors defined in P3P, such as the recipient, the following aspects of context have been considered:
- Location of the data owner;
- Location of the data recipient;
- Observational accuracy of data/granularity;
- Persistence of data; and
- Time.
One system, Confab for mobile computing environments, enables users to set what information is accessible by others on their contact list based on the time of information collection. Similar systems in mobile environments add the time of information collection to the factors of recipient and data-type. Also, a privacy awareness system targeted at mobile computing environments has been implemented, and is designed to create a sense of accountability for users. It allows data collectors to announce and implement data usage policies, and provides data subjects with technical means to keep track of their personal information. Another approach involves a peer-to-peer protocol for collaborative filtering in recommendation systems, which protects the privacy of individual data.
More recently, the use of location data has raised important privacy concerns. In context-aware computing, the Place Lab system has been proposed for a location-enhanced World Wide Web. It assumes a location infrastructure that gives users control over the degree of personal information they release. Another approach relates to the idea of hitchhiking for location-based applications that use location data collected from multiple people to infer such information as whether there is a traffic jam on a bridge. It treats the location as the primary entity of interest. Yet another solution extended P3P to handle context-aware applications and defined a specification for representing user privacy preferences for location and time. In a conceptually similar work, another approach examined a simple classification and clearance scheme for privacy protection. Each context element of any user is assigned a classification level indicating its sensitivity, and accessing users are each assigned clearance values representing levels of trust for the various elements that can be accessed. For better robustness, this approach made a list of access control schemes for specific elements, thus allowing a combination of permissions for read, write, and history accesses.
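The classification and clearance scheme described above can be sketched as follows. The numeric levels, the user names, the function name `may_read`, and the rule that access is granted when a requester's clearance is at least the element's classification are all assumptions made for illustration; the cited approach may define the comparison differently.

```python
# Hypothetical levels: a higher classification means a more sensitive element,
# and a higher clearance means a more trusted requester.
classification = {
    "alice": {"location": 3, "activity": 1},
}

# Per-owner, per-requester, per-element clearance values (illustrative only).
clearance = {
    "alice": {
        "bob": {"location": 3, "activity": 2},
        "carol": {"location": 1, "activity": 1},
    },
}


def may_read(owner, element, requester):
    """Grant access when the requester's clearance for this element is at
    least the element's classification level (assumed rule)."""
    required = classification[owner][element]
    granted = clearance.get(owner, {}).get(requester, {}).get(element, 0)
    return granted >= required


print(may_read("alice", "location", "bob"))    # bob is cleared for location
print(may_read("alice", "location", "carol"))  # carol is not
```

Unknown requesters default to a clearance of 0 in this sketch, so they are denied any element with a positive classification level.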
An identity management system for online interactions in a pervasive environment encompassing PDAs has also been proposed. It enables users to control what pieces of their personal information to reveal in various pre-defined situations such as interacting with a vending machine, banking, or getting a bus timetable. Not only does the sensitivity of information depend on the context in a mobile system, but the context itself can also be part of the information that requires protection. There have been few attempts to implement systems that do both. One solution suggested a system in which users can define different situations and different faces for themselves, and they can decide who sees what face in which situation. Another solution involves a simulation tool to evaluate architectures parameterized by users' privacy preferences for context-aware systems. Users can set their preferences to protect various types of personal information in various situations. Still other approaches focused on location privacy in pervasive environments, wherein the privacy-protecting framework is based on frequently changing pseudonyms to prevent user identification. Finally, one solution suggested the idea of Virtual Walls, which allow users to control the privacy of their digital information.
Access control systems mostly deal with a lack of control over direct revelations. Since they only help users control direct access to their information and don't prevent inferences, they don't fully protect user privacy.
(3) Security Protection
Security protection handles the following aspects:
- Availability (services are available to authorized users);
- Integrity (free from unauthorized manipulation);
- Confidentiality (only the intended user receives the information);
- Accountability (actions of an entity must be traced uniquely); and
- Assurance (assure that the security measures have been properly implemented).
Therefore, security research has explored detection and prevention of many attacks including Reconnaissance, Denial-of-Service, Privilege Escalation, Data Intercept/Alteration, System Use Attacks, and Hijacking. Confidentiality protection is the area that contains most of the previous research on the inference problem. The inference problem is mostly known as a security problem that targets system-based confidentiality. Therefore, suggested solutions often deal with secure database design. There are also methods that evaluate the queries to predict any inference risks.
(4) Inference Control Solutions
Inference is commonly known as a threat to database confidentiality. Two kinds of techniques have been proposed to identify and remove inference channels. One technique is to use semantic data modeling methods to locate inference channels in the database design, and then to redesign the database in order to remove these channels. Another technique is to evaluate database queries to understand whether they lead to illegal inferences. Each technique has its own drawbacks. The former has the problem of false positives and negatives, and vulnerability to denial-of-service attacks. The latter can cause too much computational overhead. Moreover, in a mobile social computing application, both can limit the usability of the system because they can overly restrict user access to information. Both techniques have been studied for statistical databases, multilevel secure databases, and general purpose databases. A few researchers have addressed this problem via data mining. Because user information and preferences in mobile social computing are dynamic, queries need to be evaluated dynamically, and the first technique (database redesign) cannot be used in such systems.
With the development of the World Wide Web, new privacy concerns have surfaced. Most of the current work in access control for web documents relates to developing languages and techniques for XML documents. While these works are useful, additional considerations addressing the problem of indirect accesses via inference channels are required.
Classical information theory has been employed to measure the inference chance. Given two data items x and y, let H(y) denote the entropy of y and H_x(y) denote the entropy of y given x, where entropy is as defined in information theory. Then, the reduction in uncertainty of y given x is defined as follows:
$$\mathrm{Infer}(x \rightarrow y) = \frac{H(y) - H_x(y)}{H(y)}$$
The value of Infer(x→y) is between 0 and 1, representing how likely it is to derive y given x. If the value is 1, then y can be definitely inferred given x. However, there are serious drawbacks to using this technique:
1. It is difficult, if not impossible, to determine the value of H_x(y); and
2. The computational complexity that is required to draw the inference is ignored.
Nevertheless, this formulation has the advantage of presenting the probabilistic nature of inference (i.e., inference is a relative, not an absolute, concept).
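For illustration, the ratio Infer(x→y) defined above can be computed when the joint distribution of x and y is fully known, which is precisely what is difficult to obtain in practice. The following is a minimal sketch under that assumption; the function names and the toy distributions are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probabilities):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def infer(joint):
    """Compute (H(y) - H_x(y)) / H(y) from a known joint distribution.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    p_x = defaultdict(float)
    p_y = defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p

    h_y = entropy(p_y.values())
    if h_y == 0:
        raise ValueError("H(y) is zero: y is already certain, so the ratio is undefined")

    # H_x(y): entropy of y given each x value, weighted by how likely that x is.
    h_y_given_x = 0.0
    for x, px in p_x.items():
        if px == 0:
            continue
        conditional = [p / px for (xi, _), p in joint.items() if xi == x]
        h_y_given_x += px * entropy(conditional)

    return (h_y - h_y_given_x) / h_y


# x fully determines y, so y can be definitely inferred from x.
determined = infer({("a", 1): 0.5, ("b", 2): 0.5})
# x and y are independent, so x reveals nothing about y.
independent = infer({("a", 1): 0.25, ("a", 2): 0.25, ("b", 1): 0.25, ("b", 2): 0.25})
print(determined, independent)
```

The two toy distributions sit at the extremes of the range; partial social inferences of the kind discussed earlier would fall between them.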
Additional research has focused on techniques for anonymization. Anonymity is defined as not having identifying characteristics, such as a name or description of physical appearance, disclosed, so that the participants remain unidentifiable to anyone other than the people permitted at the time of informed consent. Recently, new measures of privacy called k-anonymity and L-diversity have gained popularity. K-anonymity is suggested to manage identity inference, while L-diversity is suggested to protect against both identity inference and attribute inference in databases. In a k-anonymized dataset, each record is indistinguishable from at least k−1 other records with respect to certain “identifying” attributes. These techniques can be broadly classified into generalization techniques, generalization with tuple suppression techniques, and data swapping and randomization techniques. Nevertheless, k-anonymized datasets are vulnerable to many inference attacks and to the collection of knowledge outside of the database, and L-diversity is very limited in its assumptions about background knowledge.
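The k-anonymity property described above can be checked with a short sketch. The records, the attribute names, and the helper functions are hypothetical. The second helper counts distinct sensitive values per group, which is only the simplest reading of L-diversity and is included to show how a k-anonymized group can still leak an attribute.

```python
from collections import Counter, defaultdict

# Hypothetical, already generalized records (illustrative data only).
records = [
    {"zip": "100**", "age": "20-29", "condition": "flu"},
    {"zip": "100**", "age": "20-29", "condition": "asthma"},
    {"zip": "101**", "age": "30-39", "condition": "flu"},
    {"zip": "101**", "age": "30-39", "condition": "flu"},
]


def k_of(records, identifying):
    """Largest k for which the records are k-anonymous: the size of the
    smallest group sharing the same identifying attribute values."""
    groups = Counter(tuple(r[a] for a in identifying) for r in records)
    return min(groups.values()) if groups else 0


def distinct_sensitive_values(records, identifying, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[a] for a in identifying)].add(r[sensitive])
    return min(len(v) for v in values.values()) if values else 0


k = k_of(records, ["zip", "age"])
diversity = distinct_sensitive_values(records, ["zip", "age"], "condition")
print(k, diversity)
```

In this toy table every record shares its identifying attributes with one other record, yet both members of the second group have the same condition, so that attribute can be inferred without identifying either person.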
Identity inferences in mobile social computing cannot be addressed by the above techniques because:
- The sensitivity of user information is dynamic in nature based on the context, such as time and location;
- Information such as life patterns, physical characteristics, and the quality of social relations that are not kept in the database can be inferred from information available to the user (therefore, inferences in such systems are not limited to attribute disclosures); and
- Users' background knowledge (the information users learn outside the database) is a premise in many inferences.
The present invention addresses the foregoing shortcomings by providing a system and method for protecting user privacy using social inference protection techniques.