Information and data security is an active field of academic and industrial pursuit. With news of hackers exploiting software vulnerabilities and of data breaches now commonplace, it is unsurprising that many academic and professional institutions are focusing their efforts on developing tools, practices and frameworks that aim to make Information Technology (IT) eco-systems more secure against attacks from domestic and global hackers and adversaries. Cryptography provides indispensable tools for data security in an IT environment. The discipline of cryptography is old and well established, with many different techniques and processes developed over the years.
A common problem when encrypting data and fields in databases is the resulting format of the encrypted data or ciphertext. The input data or plaintext is expected to be in a certain range of values, for example alphanumeric characters or American Standard Code for Information Interchange (ASCII) digits for databases. However, ciphertext usually consists of bytes that can have any value from 0 to the maximum possible value (i.e. 255 for an 8-bit byte). These out-of-range bytes can break existing routines that process the encrypted data on the assumption that it is plaintext. A related problem is the size of the plaintext. Database columns are sized for the expected plaintext. For example, credit card numbers have at most 16 characters, each consisting of one of the ASCII values “0” to “9”. Therefore, it is desirable to develop ciphering algorithms that preserve the format of the data that they encrypt, and thereby retain the forward integrity of the IT eco-system in which they are deployed.
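The out-of-range problem described above can be illustrated with a minimal Python sketch. It is not taken from any of the references: the key-stream function (SHA-256 over a key and a block index), the key and the 16-digit field value are all assumptions chosen only for illustration, and the toy key-stream is not meant as a secure cipher.

```python
import hashlib

def toy_keystream(key: bytes, length: int) -> bytes:
    """Illustrative key-stream only (an assumption, not a vetted cipher):
    concatenated SHA-256 digests of the key and a 4-byte block index."""
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(key + block.to_bytes(4, "big")).digest()
        block += 1
    return out[:length]

def xor_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR each data byte with the matching key-stream byte.
    Applying the function twice with the same key restores the input."""
    ks = toy_keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

# Hypothetical 16-digit field stored as ASCII "0".."9".
plaintext = b"4111111111111111"
ciphertext = xor_encrypt(b"demo key", plaintext)

# Count how many ciphertext bytes are still ASCII digits.
still_digits = sum(1 for b in ciphertext if ord("0") <= b <= ord("9"))
print(f"length preserved: {len(ciphertext) == len(plaintext)}")
print(f"bytes still in '0'..'9': {still_digits} of {len(ciphertext)}")
```

The ciphertext has the same length as the plaintext, but each of its bytes can take any of 256 values, so a routine that expects only the ten digit characters would generally reject it.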
As far as producing a cipher algorithm that preserves the format of the original data is concerned, there are many teachings available in the prior art. U.S. Publication No. 2006/0227965 A1 to Zhu et al. teaches a scheme for producing a compliant ciphertext for general syntax specification using a secure syntax compliant encryption schema and “locally iterative encryption”. In one implementation, an engine partitions a data stream into blocks, and encrypts each block iteratively until syntax compliance conditions are met. A system using the schema can utilize either stream ciphers or block ciphers in different modes. Locally iterative encryption methods are fast and remain at approximately the same speed even as the length of the data stream to be encrypted increases. Besides providing superior processing speed, the locally iterative encryption schema is said to be more robust to errors in the resulting ciphertext and in the resulting decrypted plaintext than conventional syntax compliant encryption techniques. Locally iterative encryption is secure as long as an underlying encryption cipher selected for use in the schema is secure.
U.S. Pat. No. 7,864,952 to Pauker et al. teaches a data processing system that includes format-preserving encryption and decryption engines. A string that contains characters has a specified format. The format defines a legal set of character values for each character position in the string. During encryption operations with the encryption engine, a string is processed to remove extraneous characters and to encode the string using an index. The processed string is encrypted using a format-preserving block cipher. The output of the block cipher is post-processed to produce an encrypted string having the same specified format as the original unencrypted string. During decryption operations, the decryption engine uses the format-preserving block cipher in reverse to transform the encrypted string into a decrypted string having the same format.
U.S. Publication No. 2008/0310624 A1 to Celikkan et al. teaches an encryption apparatus and method for providing an encrypted file system. The encryption apparatus and method of the illustrative embodiments use a combination of encryption methodologies so as to reduce the amount of decryption and re-encryption of a file in the encrypted file system that is necessary in the event that the file needs to be modified. The encryption methodologies are interleaved, or alternated, with regard to each block of plaintext. In one illustrative embodiment, Plaintext Block Chaining (PBC) and Cipher Block Chaining (CBC) encryption methodologies are alternated for encrypting a sequence of blocks of data. The encryption of a block of plaintext is dependent upon the plaintext or a cipher generated for the plaintext of a previous block of data in the sequence of blocks of data so that the encryption is more secure than known Electronic Code Book encryption methodologies.
U.S. Pat. No. 8,307,206 to Ahuja et al. teaches a scheme of cryptographic policy enforcement where objects can be extracted from data flows captured by a capture device. In one embodiment, the invention includes assigning to each captured object a cryptographic status based on whether the captured object is encrypted. In one embodiment, the invention further includes determining whether the object violated a cryptographic policy using the assigned cryptographic status of the object.
U.S. Pat. No. 8,605,897 to Golic teaches a symmetric-key encryption method for transforming a sequence of plaintext symbols into a sequence of ciphertext symbols. The method includes an iterative encryption process including: computing an altered current internal state by combining a current internal state with a current memory symbol; computing a next internal state from the altered current internal state; generating a key-stream symbol from the next internal state; verifying whether the generated key-stream symbol satisfies a condition related to data-format/syntax rules; iteratively computing next internal states and iteratively generating key-stream symbols; and iteratively encrypting plaintext symbols by employing next key-stream symbols to obtain the sequence of ciphertext symbols.
As will be known to persons skilled in the art, there are many existing cipher algorithms that can operate in block or stream mode to encrypt and decrypt data. One such popular scheme is a block cipher running in Counter (CTR) mode, as depicted in the encryption mechanism 10 and decryption mechanism 20 of prior art FIG. 1 and FIG. 2 respectively. Encryption mechanism 10 initially combines a nonce 12 with a counter 14, and uses encryption 16 to encrypt this combination with a cryptographic key as shown to produce a key-stream block, which is then Exclusively OR′ed (XOR′ed) with successive bytes of plaintext data stream 18 to produce ciphertext 20. Conversely, decryption mechanism 20 combines nonce 12 with counter 14, and uses cryptographic encryption 16 to produce a key-stream that is XOR′ed with ciphertext 20 to retrieve original plaintext data 18.
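The counter-mode flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the mechanism of FIG. 1 or FIG. 2 itself: HMAC-SHA256 from the standard library stands in for the block-cipher encryption step, and the function names, key, nonce and 8-byte counter encoding are hypothetical choices made for the example.

```python
import hashlib
import hmac

BLOCK_SIZE = 32  # bytes of key-stream per counter value (SHA-256 digest size)

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for the block-cipher step (an assumption): a keyed function
    applied to the nonce combined with the counter."""
    return hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()

def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the key-stream, one counter value per block.
    The same call encrypts plaintext and decrypts ciphertext."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_SIZE):
        ks = keystream_block(key, nonce, offset // BLOCK_SIZE)
        chunk = data[offset:offset + BLOCK_SIZE]
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)

# Hypothetical key, nonce and message for illustration only.
key = b"hypothetical-key"
nonce = b"nonce-01"
message = b"plaintext data stream processed in counter mode"

ct = ctr_xor(key, nonce, message)    # encryption
pt = ctr_xor(key, nonce, ct)         # decryption regenerates the same key-stream
print(pt == message)
```

Because decryption only regenerates the key-stream and XORs it with the ciphertext, the encryption direction of the underlying function is the only one required, which matches the description of both mechanisms using encryption 16.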
A shortcoming of the prior art teachings is that they do not allow encrypting and decrypting data in a random-access or non-linear fashion. The prior art teachings do not allow for a predetermined selection of data from amongst an entire dataset prior to the ciphering process. Such a scheme would have the benefit that a ciphering engine would not need to encrypt and decrypt the entire dataset, thereby resulting in performance improvement and streamlining of IT processes. Furthermore, teachings of the prior art fail to show a mechanism that can take multi-byte values of input plaintext data, where those multi-byte values may or may not be contiguous, and encrypt them into ciphertext, or conversely take ciphertext data and decrypt it into corresponding, potentially non-contiguous, multi-byte values of plaintext data. Such a scheme would have the benefit of encoding strings of characters or numbers that have special meanings in the context of specific industrial applications, where validation checks downstream from the cipher would preclude the existence of ‘invalid’ combinations of such strings of characters or numbers.