Technical Field of the Invention
The present invention relates to privacy and anonymization in computer networks, and particularly to pseudonymization methods capable of providing anonymity of sensitive data, such as user data profiles, that are stored in computer networks.
Overview of the Related Art
Pseudonymization techniques can be used to provide privacy of sensitive data in data profiling networks, wherein data is dynamically acquired from various data sources and then processed, stored, and retrieved, over a period of time. Typically, a data profiling network is implemented as a computer network. Each data source provides data originating from or relating to different real-world entities called “users”. For example, users may be individuals (persons) or groups of persons, companies, organizations, Internet websites, or devices such as personal computers and mobile phones. Privacy implies that the real-world identities of users should remain hidden from the network nodes processing and storing the sensitive data. In the context of the present description and for the purposes of the present invention, a “real-world identity” of a user is defined as a set of identifiers, wherein each identifier is a description of a verifiable physical or logical property of a user, which is assumed to be valid over a period of time. A real-world identity is a unique representation of a single user or a relatively small set of users within a global domain (e.g., the world or a state) or a local domain (e.g., a company or a town).
For example, a data profiling network may be set up for on-line profiling by internet service providers (ISPs) or particular websites providing various internet services. The data profile then relates to the Internet usage by individual users and is meant to be used for providing improved or new services over the Internet, e.g., for targeted marketing by authorized entities via customized banner advertisements. In this case, the real-world identity of a user to be protected may include as identifiers, for example, a Uniform Resource Locator (URL), an email address, an IP address, a phone number, a person's name, or a residential address.
In order to derive cumulative data profiles over time for any particular user, it is intrinsically required to link together different data relating to the same user at different times. In essence, this linkability is constrained in that it relates only to the data needed for computing the data profile at a given time and not necessarily to the data profiles at different times. In an attempt to provide this linkability, a conventional method uses a static pseudonym (hereinafter, a pseudonym will also be referred to as PID) in place of an identity (hereinafter also referred to as ID), where the associations between IDs and PIDs should remain hidden from the network nodes processing and storing the sensitive data.
The main problem with using static pseudonyms is that the provided linkability is unconstrained, as it is unlimited in time and also relates to the data profiles of a user at different times. The unconstrained linkability means that the data profiles at different times are linked together by the same static pseudonym and can hence be used to obtain the data profile curves in time for any targeted user, regardless of how the data profile changes in time. As a consequence, the unconstrained linkability deriving from the use of static pseudonyms results in a lack of forward/backward privacy and increases the risk that the user identity is recovered by analyzing the data profile curves. A lack of forward/backward privacy means that if the identity of a user is compromised at a given time, then the corresponding data profiles in the past and in the future are all compromised, which in turn results in the full traceability of the identified user.
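By way of illustration only, the unconstrained linkability described above can be sketched as follows. The sketch assumes that the static pseudonym is derived from the ID with a keyed hash (HMAC-SHA256); the source does not specify how static pseudonyms are generated, and the key, the record format, and the names static_pid and profile_curves are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held by the pseudonymizing party (an assumption,
# not taken from the description above).
SECRET_KEY = b"example-secret-key"


def static_pid(user_id: str) -> str:
    """Derive a static pseudonym (PID) from an identity (ID) with a keyed hash."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative records: (time period, ID, data item).
records = [
    (1, "alice@example.com", "visited sports site"),
    (2, "alice@example.com", "visited travel site"),
    (2, "bob@example.com", "visited news site"),
    (3, "alice@example.com", "visited finance site"),
]

# The storing node sees only PIDs, never IDs. Because the PID never changes,
# every record of a user, from every period, ends up under one key.
profile_curves = defaultdict(list)
for period, user_id, item in records:
    profile_curves[static_pid(user_id)].append((period, item))

# Anyone who learns one ID-to-PID association at any single time obtains
# the whole curve, past and future.
alice_curve = profile_curves[static_pid("alice@example.com")]
```

In this sketch the storing node never sees an ID, yet the data of each user over all periods are collected under a single PID, which is precisely the data profile curve whose exposure is discussed above.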
Since a data profile curve contains much more information than a single data profile at a given time, the risk of having someone able to recover the identity of the corresponding user increases significantly, depending on the data profile, especially if it is possible to correlate the data profile curve with real-life data. In general, this risk is thus much higher than in the case of using data profiles at single times only.
U.S. Pat. No. 7,213,032 B2 describes a computer-implemented method and system for anonymous profiling of, and targeted marketing to, anonymous users in a data network, such as the Internet. The data network is divided into three parts: the anonymous trusted part (ATP), the non-anonymous part (NAP), and the non-profiling part (NPP). The anonymous user profiles are computed, maintained, and used in ATP, the non-anonymous transactions requiring real-world user identities are executed within NAP, and the anonymous user profiles taken from ATP are also used within NPP. The anonymity of user profiles is ensured by assigning a unique identifier (UID) to each user in ATP and a possibly different UID in NPP. The user profiles labeled by UID are stored in a database of ATP. Users are anonymously authenticated in ATP or NPP by using self-chosen virtual user names or pseudonyms together with passwords when logging into the system. The central point of U.S. Pat. No. 7,213,032 B2 is that the user's real-world identity is only used in NAP and is never revealed to any part of ATP or NPP, while the user profiles are never explicitly used in NAP. However, so-called “representational or tokenized transactional values” are allowed to traverse the boundary between NAP on the one hand and ATP and NPP on the other. Such values are defined as “any coded information that can be generated or redeemed by a user and contains neither user profile nor user real-world identity”. Such values play an important role in connecting the anonymous and non-anonymous parts of the network and thus enable the non-anonymous transactions within NAP.
U.S. Pat. No. 7,844,717 B2 discloses a method for pseudonymous exchange of private personal data associated with users between two or more data storage servers or within a single data storage server, where the privacy of users and data storage servers is protected by using pseudonyms instead of real-world identities. In the system, the users and servers are authenticated by standard methods using validated secure pseudonyms and credentials (in particular, the method from D. Chaum and J.-H. Evertse, “A secure and privacy-protecting protocol for transmitting personal information between organizations,” in Proceedings of Crypto '86, Lecture Notes in Computer Science, vol. 263, pp. 118-167, 1987).
The central point of the method is the use of a trusted proxy server, called the pseudonym server, for controlling the access to private data via access control rules. The users and servers are registered in the pseudonym server and are represented by the associated unique identifiers (UIDs) along with user and server types, respectively. The users' real-world identities can also be stored.
U.S. Pat. No. 7,610,390 B2 describes a method for linking user accounts stored at different nodes in a data network such as the Internet. Each user account contains locally unique user account identity information (ID), auxiliary information composed of the so-called handles, and, possibly, other private data (e.g., user profiles, preferences, policies, services authorized to have access to, access control rights, etc.). The ID is composed of locally chosen, possibly partial, real-world identifiers (which should be regarded as private if they uniquely specify the user) or of arbitrarily chosen local user account names. There are two basic types of nodes, called identity providers and service providers. The main role of the former is to authenticate the users and, hence, the stored local IDs necessarily include real-world identifiers. The main role of the latter is to provide various services and, hence, the stored local IDs may or may not include real-world identifiers.
The service and identity nodes interact with each other and thus provide different services to network users. This interaction requires that the user accounts stored at different nodes be linked together. The role of the handles is to enable this linking without exchanging the local user account IDs. This is achieved by having the same handle shared (as a common secret) by the two nodes communicating with each other. The same shared handle thus determines that the two user accounts correspond to the same user. Each handle corresponding to a user consists of two parts, which are respectively generated by the two nodes and sent to each other, possibly in encrypted form. If the same node communicates with several other nodes, then the part of the handle generated by that node is the same for all the connections, i.e., it depends on the local user account rather than on the connection. In this sense, it can be called a pseudonym of the local user account at a given node. A pair of pseudonyms associated with two nodes thus determines, as a handle, the connection between the user accounts, of the same user, at the two nodes. It is further suggested that by choosing dynamic pseudonyms, i.e., pseudonyms that change in time, “the visibility of the account name can be reduced”.
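The handle construction summarized above may be illustrated by the following minimal sketch, which is not the implementation of U.S. Pat. No. 7,610,390 B2. It assumes that each node draws a random value per local account as its part of the handle, and it omits the optional encryption of the exchanged parts. The names Node, pseudonym_for, and link_accounts, as well as the account names, are hypothetical.

```python
import secrets


class Node:
    """An identity provider or service provider holding local user accounts."""

    def __init__(self, name):
        self.name = name
        self._pseudonyms = {}  # local account name -> this node's handle part
        self.handles = {}      # shared handle -> local account name

    def pseudonym_for(self, account):
        # The part generated by a node depends on the local account only,
        # so it is reused for every connection involving that account.
        if account not in self._pseudonyms:
            self._pseudonyms[account] = secrets.token_hex(16)
        return self._pseudonyms[account]


def link_accounts(node_a, account_a, node_b, account_b):
    """Link two accounts of the same user without exchanging local IDs."""
    # Each node generates its part and sends it to the other node.
    part_a = node_a.pseudonym_for(account_a)
    part_b = node_b.pseudonym_for(account_b)
    handle = (part_a, part_b)  # the pair is the shared handle
    node_a.handles[handle] = account_a
    node_b.handles[handle] = account_b
    return handle


idp = Node("identity-provider")
shop = Node("service-provider-1")
mail = Node("service-provider-2")

h1 = link_accounts(idp, "jane.doe", shop, "user-17")
h2 = link_accounts(idp, "jane.doe", mail, "jd_2024")
```

In this sketch the identity provider's part is identical in h1 and h2, so it behaves as a static pseudonym of the local account, whereas the complete handles differ from one connection to the other.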
The paper by S. Fouladgar and H. Afifi, “A simple privacy protecting scheme enabling delegation and ownership transfer for RFID tags,” Journal of Communications, vol. 2, no. 6, pp. 6-13, 2007, deals with a communication protocol for mutual authentication in a system composed of RFID (radio frequency identification) tags and tag readers via a trusted on-line database. The protocol is of a challenge-response type using dynamic pseudonyms for tag authentication, where the pseudonyms are generated from pre-shared secret keys and counter-generated local nonces by using cryptographic hash or encryption functions. The tag IDs and secret keys are stored in the trusted on-line database and are only revealed by the protocol to authorized readers, while the dynamic pseudonyms ensure that the tag authentication remains untraceable by unauthorized readers.
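The general idea of a dynamic pseudonym derived from a pre-shared secret key and a counter-generated nonce may be sketched as follows. This is a simplified illustration under stated assumptions and not the protocol of the cited paper: HMAC-SHA256 is assumed as the keyed function, the database is assumed to search a small window of counter values per tag, and mutual authentication, delegation, and ownership transfer are omitted. The names Tag, TrustedDatabase, dynamic_pseudonym, and WINDOW are hypothetical.

```python
import hashlib
import hmac
import secrets


def dynamic_pseudonym(key: bytes, challenge: bytes, counter: int) -> bytes:
    """Keyed hash of the reader challenge and a counter-generated local nonce."""
    nonce = counter.to_bytes(8, "big")
    return hmac.new(key, challenge + nonce, hashlib.sha256).digest()


class Tag:
    def __init__(self, tag_id: str, key: bytes):
        self.tag_id = tag_id
        self._key = key
        self._counter = 0

    def respond(self, challenge: bytes) -> bytes:
        # The counter advances on every query, so each response differs.
        self._counter += 1
        return dynamic_pseudonym(self._key, challenge, self._counter)


class TrustedDatabase:
    WINDOW = 16  # assumed number of future counter values tried per tag

    def __init__(self):
        self._tags = {}  # tag ID -> [secret key, last accepted counter]

    def register(self, tag_id: str, key: bytes) -> None:
        self._tags[tag_id] = [key, 0]

    def identify(self, challenge: bytes, pseudonym: bytes):
        """Return the tag ID matching the pseudonym, or None if there is none."""
        for tag_id, state in self._tags.items():
            key, last = state
            for counter in range(last + 1, last + 1 + self.WINDOW):
                expected = dynamic_pseudonym(key, challenge, counter)
                if hmac.compare_digest(expected, pseudonym):
                    state[1] = counter  # older counter values are no longer accepted
                    return tag_id
        return None


db = TrustedDatabase()
key_a, key_b = secrets.token_bytes(32), secrets.token_bytes(32)
db.register("tag-A", key_a)
db.register("tag-B", key_b)
tag_a = Tag("tag-A", key_a)

challenge = secrets.token_bytes(16)
p1 = tag_a.respond(challenge)
p2 = tag_a.respond(challenge)
```

Without the secret key, the two responses p1 and p2 cannot readily be attributed to the same tag, while the database holding the key resolves either of them to "tag-A".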