The present invention relates generally to data tokenization.
Data tokenization is a technique used to desensitize data when the data is to be moved to a less-trusted environment. When data sets are outsourced, for example, or data, such as transaction data, is collected or aggregated for some purpose, legal constraints or security concerns often dictate the use of tokenization techniques before moving the data across borders or into untrusted environments. In particular, data to be transmitted over a network may include identifying information, such as social security numbers, bank account numbers, vehicle identification numbers or other unique identifiers, which should not be revealed by the data provider. Such identifying data (id data) is therefore replaced by other, typically random-looking, data (the token). To preserve the utility of the data as a whole, referential integrity must be maintained by the tokenization process. That is, the tokenization operation must be deterministic, so that all occurrences of the same id data are consistently replaced by the same token.
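By way of illustration only, the referential-integrity requirement described above can be sketched as follows. The sketch assumes a keyed hash function (HMAC-SHA256) as the deterministic tokenization operation; the key, the record layout and the identifier values are hypothetical and are not taken from any particular system.

```python
import hashlib
import hmac

# Hypothetical secret key held inside the trusted domain (assumption for illustration).
SECRET_KEY = b"example-tokenization-key"


def tokenize(id_data: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic, random-looking token for the given id data."""
    return hmac.new(key, id_data.encode("utf-8"), hashlib.sha256).hexdigest()


# Made-up transaction records; the "id" values are fictitious identifiers.
records = [
    {"id": "ID-0001", "amount": 120},
    {"id": "ID-0002", "amount": 45},
    {"id": "ID-0001", "amount": 300},
]

# Replace the id data in every record by its token before the data leaves
# the trusted environment. The non-identifying fields are left unchanged.
tokenized = [{"token": tokenize(r["id"]), "amount": r["amount"]} for r in records]

# Records 0 and 2 refer to the same id data and therefore carry the same token,
# so they can still be linked after tokenization.
same_entity = tokenized[0]["token"] == tokenized[2]["token"]
```

Because the operation is deterministic, records relating to the same id data remain linkable after tokenization, while the id data itself no longer appears in the outgoing data.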
A number of tokenization techniques have been proposed and are in commercial operation today. Typical approaches either rely on non-cryptographic methods such as substitution, perturbation or conversion tables, or use cryptographic mechanisms such as keyed hash functions or deterministic encryption. What all approaches have in common is that they require the tokenization operation to be performed in a trusted environment, i.e., by the trusted data source itself or by a dedicated entity within the trust domain of the data source. This imposes constraints on the implementation of tokenization systems. Moreover, this requirement is difficult to satisfy in a secure and efficient manner when data is collected from different, possibly widely distributed, data sources. Referential integrity requires tokenization operations to be consistent across all data sources, so all sources must share the same secret tokenization key or, even worse, must maintain a shared and consistent version of a conversion table. A more practical approach is to concentrate the tokenization task at a central trusted entity, or TTP (trusted third party), which handles all tokenization requests. The TTP then provides a service that transforms the sensitive id data into a secure token. Current solutions require disclosure of the id data to the TTP, which makes the TTP a security and privacy bottleneck. For example, when tokenization is performed dynamically in response to multiple requests and/or for multiple sources, having a single entity that can recognize and track the activities of users or other entities corresponding to the id data is clearly not desirable.
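The privacy bottleneck of such a conventional TTP can be illustrated by the following minimal sketch. The class name NaiveTTP, the key and the account identifiers are hypothetical, and HMAC-SHA256 is again assumed merely as one example of a keyed tokenization operation.

```python
import hashlib
import hmac
from collections import Counter


class NaiveTTP:
    """Hypothetical central tokenization service that receives id data in the clear."""

    def __init__(self, key: bytes):
        self._key = key
        # Everything the TTP is in a position to observe: which id data
        # was submitted, and how often.
        self.seen = Counter()

    def tokenize(self, id_data: str) -> str:
        # The id data is disclosed to the TTP with every request.
        self.seen[id_data] += 1
        return hmac.new(self._key, id_data.encode("utf-8"), hashlib.sha256).hexdigest()


ttp = NaiveTTP(b"central-tokenization-key")

# Two different data sources submit the same (fictitious) account identifier.
token_from_source_a = ttp.tokenize("ACC-1001")
token_from_source_b = ttp.tokenize("ACC-1001")
ttp.tokenize("ACC-2002")

# Referential integrity holds across sources because only the TTP holds the key.
consistent = token_from_source_a == token_from_source_b

# However, the TTP has learned the id data and can track activity per identifier.
requests_for_acc_1001 = ttp.seen["ACC-1001"]
```

In this sketch, centralizing the key avoids distributing it to every data source, but the TTP sees every identifier in plaintext and can count or correlate the requests relating to it.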