Passwords are strings of characters that computer users enter into computer systems to authenticate themselves and gain access to resources. The computer system then checks the password entered against validation data created and stored when the user created an account or, later, whenever the user changes their password. Passwords are perhaps the most common method to authenticate users of computer systems. While “the end of passwords” has been predicted by industry pundits, the complexity and expense of other approaches have kept passwords in wide use. But password-based authentication systems have been a factor in numerous security breaches, and there is an ongoing password arms race between hackers and implementers of secure computer systems. To understand why, one must understand the various methods that have been developed for implementing password systems.
The easiest way to implement password access control on an individual user basis is to create a database with a record for each user including, among other data, the user's password as plaintext. The stored password is then compared to the password supplied whenever someone attempts to log in claiming to be that user. Attempts to guess the password can be limited by disconnecting or timing out the login session after a few failed attempts. While simple to implement, this scheme presents a serious security risk: if an attacker somehow obtains a copy of the user database including the password records, they can impersonate any user. Worse, since users often use the same or similar passwords on multiple systems, compromising a user's passwords on one system can lead to compromises on several others.
A more secure approach is to apply a cryptographic hash function to the password and store the resulting hash value in the user's database record instead of the plain text password. The file /etc/passwd in early Unix systems is an example of this approach. This approach provides greater security, but still has vulnerabilities. Cryptographic hash functions, such as MD5, SHA1, SHA2, and SHA3, are designed to be very difficult to invert, but fast to compute. So while an attacker cannot directly compute the password from the stored hash (the inverse operation), they can build a table of commonly used passwords and their corresponding hash values. After gaining access to the hashed password the attacker can simply look up the hash value in the table and recover the password. Very efficient methods, such as Rainbow Tables, have been developed to store large sets of pre-computed hash values. Such tables are available on the Internet for common password hashes, including LM hash, NTLM, MD5, and SHA1.
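The pre-computed table attack described above can be illustrated with a minimal Python sketch. The choice of SHA-256, the short password list, and the names unsalted_hash, COMMON_PASSWORDS, LOOKUP_TABLE, and recover are assumptions made for illustration only; real tables, such as Rainbow Tables, are far larger and use more compact storage.

```python
import hashlib

def unsalted_hash(password: str) -> str:
    """Hash a password with no salt, as in the scheme described above."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical list of commonly used passwords (illustrative only).
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

# Built once, ahead of time; reusable against any database that uses
# the same unsalted hash function.
LOOKUP_TABLE = {unsalted_hash(p): p for p in COMMON_PASSWORDS}

def recover(stored_hash: str):
    """Return the password for a stolen hash if it is in the table, else None."""
    return LOOKUP_TABLE.get(stored_hash)
```

In this sketch a single dictionary lookup recovers any password that appears in the table, regardless of how many user records were stolen.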
The pre-computed table attack can be defeated using a technique known as “salt.” Instead of hashing just the password, a salted system hashes the password along with another bit string, called the salt. This bit string is typically a random number generated at account creation and stored in the user's record along with the password hash. NIST recommends salt be at least 64 bits in length. Salt does not add secrecy to the password hash, since the salt value is stored in the same database record with the hash, but it breaks up the pre-computed attack because a separate table of pre-computed hashes would be needed for each salt value. The concept of salt was introduced with the crypt function in early Unix systems. (Password Security: A Case History, Robert Morris, Ken Thompson, Bell Laboratories, 1978.)
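A salted scheme of the kind described above might be sketched as follows. The 128-bit salt length, the use of SHA-256, the concatenation order, and the function names create_record and verify are illustrative assumptions, not details taken from any particular system.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

SALT_BYTES = 16  # 128 bits; an assumed value above the 64-bit minimum

def create_record(password: str) -> Tuple[bytes, bytes]:
    """Generate a random salt and return (salt, hash) for storage."""
    salt = secrets.token_bytes(SALT_BYTES)
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

def verify(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the salted hash and compare it in constant time."""
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, stored)
```

Because each record carries its own random salt, two users with the same password receive different stored hashes, so one pre-computed table cannot serve against both.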
Salted hashes are still vulnerable, however. An attacker who gains access to the password record can try many password guesses using the included salt value and test if they produce the stored hash value. Since common cryptographic hashes are designed to be fast, millions of guesses can be tested each second on modern CPUs. Worse, these hash algorithms are simple enough to be executed in parallel on the graphics processors included in many modern computers, allowing billions of guesses per second to be tested.
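The guessing attack described above can be sketched in a few lines of Python. The hash construction (SHA-256 over salt followed by password) and the names salted_hash and guess_password are assumptions for illustration; an actual attacker would use optimized CPU or graphics processor code rather than a Python loop.

```python
import hashlib

def salted_hash(salt: bytes, password: str) -> bytes:
    """Assumed salted-hash construction: SHA-256 over salt || password."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()

def guess_password(salt: bytes, stored: bytes, candidates):
    """Try each candidate with the stolen salt; return the match or None."""
    for guess in candidates:
        if salted_hash(salt, guess) == stored:
            return guess
    return None
```

The salt forces the attacker to repeat the work for each record, but each individual guess still costs only one fast hash computation.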
One defense is to use operating system access permissions to restrict access to the password database. Later Unix systems, for example, moved the password hash from /etc/passwd, which was globally readable, to /etc/shadow, which was access protected. However, the track record of such access protections is poor. Vulnerabilities in common operating systems allow privilege escalation, which can defeat access controls. Another defense is to encrypt the password file. But the corresponding decryption key must be present in the computer memory whenever the password file is in use, which is nearly always, and the key is therefore subject to theft, perhaps by malware or by a user who manages to gain higher access privileges.
Employing hash functions designed to consume significant computer resources can slow the rate of guessing. The simplest approach is to invoke a standard cryptographic hash function repeatedly, many times in succession. The Unix crypt function introduced this concept as well. A more modern example is PBKDF2 (Password-Based Key Derivation Function 2), developed by RSA Laboratories as part of PKCS #5 v2.0 and also published by the Internet Engineering Task Force as RFC 2898.
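As a minimal sketch of iterated hashing, PBKDF2 is available in the Python standard library as hashlib.pbkdf2_hmac. The iteration count, the choice of HMAC-SHA-256, and the function name derive below are assumptions for illustration; a suitable iteration count depends on the hardware and the latency a system can tolerate.

```python
import hashlib

ITERATIONS = 600_000  # assumed work factor; tune for the deployment

def derive(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte value by iterating HMAC-SHA-256 via PBKDF2."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
```

Each guess by an attacker now costs roughly as many hash invocations as the iteration count, which slows guessing in proportion but also slows every legitimate login by the same factor.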
Algorithms like PBKDF2 can be implemented in custom integrated circuits (ASICs) using relatively few transistor gates, allowing very fast and relatively inexpensive password guessing. This approach has come to dominate mining of Bitcoin, which is based on principles very similar to PBKDF2. One way to mitigate this attack is to develop key derivation functions that, in addition to requiring large computing resources, also require large amounts of computer memory, thereby making custom ICs more expensive per guessed password. One such function is scrypt, which has been used in several applications, including the digital currencies Litecoin and Dogecoin. However, ASICs for scrypt have since become available. (Zeusminer Delivers Lightning, Thunder, and Cyclone Scrypt ASICs For Litecoin And Dogecoin Mining, Caleb Chen, 2014 May 21.)
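A memory-hard derivation of this kind can be sketched with hashlib.scrypt from the Python standard library, which is available when Python is built against a sufficiently recent OpenSSL. The cost parameters n, r, and p and the function name derive_scrypt are illustrative assumptions; with n = 2**14 and r = 8 each call uses roughly 16 MiB of memory.

```python
import hashlib

def derive_scrypt(password: str, salt: bytes,
                  n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte value with scrypt; memory use is about 128 * n * r bytes."""
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=n, r=r, p=p, dklen=32
    )
```

Raising n increases both the time and the memory required per guess, which is the property intended to make custom hardware more expensive.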
In 2013, a Password Hashing Competition (PHC) was held to develop an even more resistant approach. On Jul. 20, 2015, the algorithm named Argon2 was selected as the final PHC winner, with special recognition given to four other password hashing schemes: Catena, Lyra2, yescrypt and Makwa. All these algorithms require substantial computing resources and memory by design. This approach is suitable for some applications, such as disk encryption, where a computer can be tied up for a half-second or more while the key to unlock the disk is derived.
However, using password hashes that require large amounts of computer resources can impose a significant burden on computing systems with large numbers of users. Meanwhile, underworld attackers have gained access to vast computing resources through so-called botnets composed of thousands of computers that they have compromised. State actors, such as government signals intelligence organizations, can afford large server farms and huge arrays of custom integrated circuits. The best defense for users in the face of these threats is to choose long, complicated passwords or passphrases. Instructions for creating more secure passwords can be found on the Internet (e.g. the Diceware.com web site). However, few users take such precautions, and once attackers gain access to a password validation database, they are usually able to recover, in plaintext form, a large fraction of the passwords.
Other conventional techniques for improving password management, and some of their limitations, include:
Password policy;
Password policy operates by encouraging or forcing users to employ stronger passwords. It is often self-defeating, because users develop strategies to “get around” policy or keep passwords written down in places where they can be discovered.
Using a traditional keyed hash, such as an HMAC;
These add a small secret value to the password hashing. This secret has high value to an attacker and because it is short, it can be leaked via a software exploit, side channel attack, or stealthily purloined by an insider, any of which can destroy security.
Employing a segregated password verification server;
One alternative would be to store the password database in a segregated server, with physical protections similar to what we suggest for the restricted secret server. There are several problems with this approach. First, it would require a backup procedure for the password database separate from the normal enterprise backup system, with the backup media segregated for full security. Also, multiple password servers would typically be required for availability and reliability and they would need to have their databases synchronized, adding complexity and transmitting password data over communication channels, both of which increase security risks. The synchronization problem is more acute when organizations maintain multiple data centers and/or off site disaster recovery facilities, as the synchronization data must be transmitted over long distances while maintaining security.
Using a long list of secret salt values, assigned one at a time;
Another approach is using a long list of secret salt values, assigned one at a time as passwords are created or changed. These might be stored using security principles similar to what is suggested for the present invention. But a partial leak of the list would compromise at least some of the passwords protected by it.
Two factor authentication;
The most commonly suggested augmentation for password security is two-factor authentication, often using security fobs or specialized two-factor cell phone apps. These solutions are expensive to deploy and awkward for users. While their use can be mandated for employees, it is much harder to get consumers and other casual users to adopt these measures, particularly when they have many password protected accounts in their lives. Also, without a secure way of handling password validation information, the first factor is insecure, and two-factor authentication becomes, in effect, one-factor authentication.
Certificate based authentication;
This approach uses public key cryptographic certificates and algorithms to authenticate users, but it has many of the same issues as two factor authentication.
Password manager applications;
These applications can automatically generate and store long passwords, which improves security. However, the applications can be clumsy for users and present a potential target for attack, and they are particularly vulnerable when users employ a master password that is too short. Most people don't bother to use long passwords, even for a master password.
Blind Hashing;
Blind Hashing (BH), developed by TapLink, Inc., is described in U.S. Pat. No. 9,021,269. As disclosed in that patent, Blind Hashing in operation uses a large pool of random or pseudorandom data stored in remote data centers. An enterprise using BH first creates a salted hash of a password and then transmits the salted hash to a remote site where a second salt is computed and sent back to the enterprise, which then uses the second salt to compute a second password hash. By contrast, the present invention, in one embodiment, only uses the salt to compute a password authenticator token and need not use the password at all at that stage of the process. It is essential for BH to incorporate the password in creating its index to the large random data stored in its data centers, since otherwise an attacker who compromises both the password database and the communication link between the enterprise and the data center, perhaps by a man-in-the-middle attack, could just send the first salt, which is stored in the user record, to TapLink and recover the second salt. The second salt and the hash stored in the user record are sufficient to allow password guessing attacks with the same methods currently used successfully against conventional salted passwords. It is further worth observing that the security of remote communications over the Internet is difficult to guarantee. Such communications typically pass through many nodes, often in different countries. Compromises can be due to bugs in software at the enterprise or data center ends of the communication, defects in communication protocol standards, advances in cryptanalysis, disloyal or blackmailed employees, or demands by state actors for the insertion of back doors to allow access, which in turn may be compromised by criminals.
Note that by forming what is essentially a conventional salted password hash as the first step in its process, BH creates and transmits a data element, its first hash, that, if intercepted, is itself subject to the same attacks as conventional salted password hashes.
The concept of using a large collection of random bits to cryptographically protect data goes back to the one-time tape system developed by Gilbert Vernam and his colleagues. His U.S. Pat. No. 1,310,719, issued Jul. 22, 1919, disclosed the exclusive-or (XOR) method for combining data and cipherstream bits, used in particular in connection with teleprinter operation, where the key stream is supplied by punched paper tape. The U.S. used large quantities of one-time tape “from World War II until about 1960.” Over a million and a half 8-inch reels of one-time paper tape were produced in 1952, for example, each with 100,000 random Baudot (5-bit) characters (ibid, p. 42), comprising about a terabit of random data.
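The XOR combination and the tape-volume estimate above can be illustrated with a short Python sketch. The function name xor_bytes and the use of bytes in place of 5-bit Baudot characters are assumptions for illustration; the reel and character counts are the figures quoted above, taking 1,500,000 reels as a lower bound.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Combine data with key-stream bytes using exclusive-or (XOR)."""
    if len(key) < len(data):
        raise ValueError("key stream must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

# Volume estimate from the figures quoted above (lower bound on reels).
REELS = 1_500_000
CHARS_PER_REEL = 100_000
BITS_PER_CHAR = 5
TOTAL_BITS = REELS * CHARS_PER_REEL * BITS_PER_CHAR  # 7.5e11 bits
```

Applying xor_bytes a second time with the same key stream restores the original data, which is why the same tape could serve for both enciphering and deciphering. The product works out to 7.5 x 10^11 bits, consistent with the figure of about a terabit.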
The use of a large mass storage device filled with random data was explained in the “Rubber Hose” file system, originally released in 1997 by J. Assange, S. Dreyfus, and R. Weinmann. According to “How does Rubberhose work?”, archived by archive.org on 24 Nov. 2004, https://web.archive.org/web/20041124173754/http://iq.org:80/~proff/rubberhose.org/current/src/doc/maruguide/x32.html, “when you run Rubberhose over a disk for the first time, the program writes random characters to the entire drive.” Rubberhose is also mentioned in Schneier on Security, Deniable File System, April 2006, https://www.schneier.com/blog/archives/2006/04/deniable_file_s.html
The use of common mass storage devices for storing one-time pad was described Apr. 1, 2005 in an edit to the Wikipedia article on “One-time pad”, said edit being recorded as https://en.wikipedia.org/w/index.php?title=One-time_pad&type=revision&diff=11771171&oldid=11763901. This use was illustrated with a photograph, https://commons.wikimedia.org/wiki/File:PersonalStorageDevices.agr.jpg, that photograph being still in use in the article as of Apr. 20, 2017.