Handling of sensitive strings of characters, such as credit card numbers, is often problematic. In a typical retail situation, the card is swiped at a register, and the card data is then transferred to a local server, where information about the transaction and the credit card number is stored. The information may also be stored at the registers. The information is also forwarded to servers at higher levels, such as a central server for the region, the nation etc. At all levels, it is important that enough information about the transaction is stored to render follow-up checks, audits, analysis etc. possible. However, at the same time the information stored on the servers is a security risk, and the risk is inevitably higher on the lower levels of the hierarchy. Even though the examples discussed in this application are mostly concerned with credit card numbers, similar types of problems are encountered in handling other strings of sensitive characters, such as social security numbers, driving license numbers, bank account numbers, etc. For example, social security numbers are in many systems less protected than credit card data.
The problem is often not that cryptography is not used, or used badly, but that the data itself is too weak to protect adequately: there are simply not enough possible credit card numbers, so an attacker can routinely test them all using simple brute-force techniques. While it may appear that a credit card number is 16 digits, and 10^16 would be an insurmountably large number of tests, more than half of a card number is easily learned or is “guessable”. First, the last four digits of a card number are normally not required to be secured, and are in fact helpfully printed on receipts, and are permitted to be present in the stored data. Thus, these digits may reside in register printer logs, sales data, etc. If I knew four digits and were to guess all the remaining digits, I would have to make only 10^12 guesses instead of 10^16. Further, the card association is identified by the first digit on a credit card: “4” for Visa, “5” for Mastercard, “3” for American Express, etc. This can be used in reverse. If a credit transaction identifies the association (by printing the word VISA, for example) I know the first digit of the credit card is a 4. Combined with the last four digits, I now have to make only 10^11 guesses. In most markets within a country, there are also often only a handful of card-issuing banks that dominate any given area. There are perhaps a dozen truly cosmopolitan large cities that have a great diversity of credit cards, but in the vast majority of e.g. American heartland cities only a few banks issue a large fraction of the cards a retailer will see; perhaps as many as 50% of cards are issued by just 10 banks or so in a given region. A retailer with a private label Visa or Mastercard will have an even easier avenue of attack. The first 6 digits of a card number are devoted to the Bank Identification Number (BIN).
If 10 banks issue 50% of the Visa cards used in a geographic region, that means I have one chance in ten of correctly identifying perhaps 50% of the BINs, if I know the region the card was used in. And an ordinary merchant identification, such as a store number, will give me the region. Six digits is a lot to reduce the search space by: it gets me from 10^12 to 10^6 guesses; but with 10 possible BINs to try I have to make 10^7 guesses. In addition, the final digit of a credit card number is conventionally a check-sum digit, calculated by the so-called Luhn algorithm. Just because the check digit is computed and placed as the last digit does not mean I cannot use it to verify a test of an account number. I can generate a test case with the nine digits I know apart from the check digit, generate the six unknown digits sequentially, and compute the check digit; only candidates whose computed check digit matches the known final digit need to be considered, which removes a further factor of ten. Thus I start out knowing “ten” digits' worth of a sixteen-digit card number. I now have to test only 10^6 candidate numbers. On a modern desktop that calculation would take 4 seconds or less. Further, there is the risk that protective algorithms present in cash register software can be obtained by thieves by the simple act of stealing a register's hard drive.
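The effect of the check digit on the search space can be illustrated with a minimal Python sketch. The function names `luhn_valid` and `count_luhn_valid` are hypothetical and chosen for illustration only; the prefix and suffix in the usage note are taken from a widely published test number, not from any real account. The sketch only counts how many candidates survive the Luhn check when some middle digits are unknown.

```python
from itertools import product


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    # Walk from the rightmost digit (the check digit) to the left.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # every second digit, starting left of the check digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def count_luhn_valid(prefix: str, unknown_len: int, suffix: str) -> int:
    """Count Luhn-valid numbers of the form prefix + unknown digits + suffix."""
    count = 0
    for middle in product("0123456789", repeat=unknown_len):
        if luhn_valid(prefix + "".join(middle) + suffix):
            count += 1
    return count
```

For example, `count_luhn_valid("411111", 3, "1111")` enumerates 1,000 candidates and keeps exactly one in ten of them, because for any choice of the other digits only one value of a single unknown digit makes the checksum come out right. With six unknown digits the same ratio leaves 10^5 candidates per BIN, which for ten candidate BINs gives the 10^6 figure discussed above.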
In this context, it is also to be remembered that the goal of an attacker is very different from the goals of the retailer. The attacker is not trying to do the same job as a sales auditor, and does not have to identify every specific account number from any given protective scheme with 100% accuracy. The goal of an attacker is to acquire as many account numbers as easily as possible. With a stolen database of sales information, cracking even 25% of the valid credit card numbers would yield great profits.
There are in principle three different ways to render data unreadable: 1) Two-way cryptography with associated key management processes, 2) One-way transformations including truncation and one-way cryptographic hash functions, and 3) Index tokens and pads. Two-way encryption of sensitive data is one of the most effective means of preventing information disclosure and the resultant potential for fraud. Cryptographic technology is mature and well proven. The choice of encryption scheme and topology of the encryption solution is critical in deploying a secure, effective and reasonable control. Hash algorithms are one-way functions that turn a message into a fingerprint, usually not much more than a dozen bytes long. Truncation discards part of the input field. These approaches can be used to reduce the cost of securing data fields in situations where you do not need the data to do business and you never need the original data back again. Tokenization is the act of substituting the original data field with a reference or pointer to the actual data field. This enables you to store a reference pointer anywhere within your network or database systems. Together with proper network segmentation, this approach can be used to reduce the cost of securing data fields in situations where you do not need the data itself to do business, but only a reference to that data.
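As a rough illustration of the one-way and token-based approaches, the following Python sketch shows truncation, a one-way hash, and a simple token lookup. All names (`truncate`, `one_way_hash`, `TokenVault`) are hypothetical, SHA-256 is an assumed choice of hash function, and the in-memory dictionary merely stands in for whatever protected storage a real system would use. It is not the method of this application.

```python
import hashlib
import secrets


def truncate(pan: str) -> str:
    """Truncation: discard all but the last four digits (masked here with '*')."""
    return "*" * (len(pan) - 4) + pan[-4:]


def one_way_hash(pan: str) -> str:
    """One-way hash: a fixed-length fingerprint (SHA-256 is an assumed choice)."""
    return hashlib.sha256(pan.encode("ascii")).hexdigest()


class TokenVault:
    """Tokenization: replace the number with a random reference to stored data."""

    def __init__(self) -> None:
        self._pan_by_token = {}  # token -> original number (the protected store)
        self._token_by_pan = {}  # original number -> token (keeps tokens stable)

    def tokenize(self, pan: str) -> str:
        if pan in self._token_by_pan:
            return self._token_by_pan[pan]
        token = secrets.token_hex(8)  # random value, unrelated to the number
        while token in self._pan_by_token:  # extremely unlikely collision
            token = secrets.token_hex(8)
        self._pan_by_token[token] = pan
        self._token_by_pan[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._pan_by_token[token]
```

Only the token approach allows the original number to be recovered, and only by a party with access to the lookup table. Note also that a plain, unkeyed hash of a card number is a deterministic function of an input with few possible values, so it is exposed to the kind of exhaustive testing described earlier.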
Thus, problems that need to be addressed in secure handling of sensitive strings of characters are e.g. that you typically do not want to outsource your data, since you cannot at the same time outsource your risk and liability. Accordingly, an organization will normally not be willing to move the risk from its environment into a potentially less secure hosted environment. Further, you normally need to maintain certain information about transactions at the point of sale (POS), as well as on higher levels. In most retail systems, there are a plurality of applications that use or store card data, from the POS to the data warehouse, as well as sales audit, loss prevention, and finance. At the same time, the system needs to be adequately protected from attacks by data thieves. Still further, protective measures cannot be allowed to be complicated, cumbersome and expensive.
The US application US 2009/249082 by the same applicant and same inventor addresses some of these questions.
However, there is still a need for a tokenization method that can be performed at a local server, that requires relatively low data processing and data storage capacity, and that still provides an adequate security level. There is also a need for a tokenization method that can be installed and run on a local server at a relatively low cost.