It is often desirable to protect application software code that is loaded from external memory and executed by processors embedded within integrated circuits. To that end, certain embedded processor systems use a decryption engine and a secret key to decrypt software images that are stored in encrypted form in external memory systems. For these security applications, a cryptographic algorithm according to the Advanced Encryption Standard (AES) is often used to encrypt the software image, and an AES decryption engine within the integrated circuit is then used to decrypt it. AES encryption/decryption is well known and is commonly applied to protect code and data in various environments. AES algorithms operate on 128-bit (16-byte) data blocks with 128-bit, 192-bit, or 256-bit secret keys. The number of cryptographic calculation rounds depends upon the size of the secret key: ten (10) rounds for a 128-bit key, twelve (12) rounds for a 192-bit key, and fourteen (14) rounds for a 256-bit key. Each round can perform different data transformations including: (1) byte substitution using a substitution table, (2) shifting rows of a state array by different offsets, (3) mixing data within columns of a state array, and/or (4) adding a round key to the state. Because AES is a symmetric cipher, the AES decryption function uses the same secret key to reverse the encryption provided by the AES encryption function.
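The relationship between key size and round count, and two of the simpler round transformations described above, can be sketched as follows. This is an illustrative fragment only, not a complete or hardened AES implementation: the function names are hypothetical, the state is assumed to be held as four rows of four bytes, and the byte-substitution and column-mixing steps are omitted.

```python
# Illustrative sketch only; function names are hypothetical and this is
# not a complete AES implementation (SubBytes and MixColumns are omitted).

def aes_round_count(key_bits):
    """Return the number of AES rounds for a key size: 32-bit key words + 6."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits")
    return key_bits // 32 + 6


def shift_rows(state):
    """Rotate row r of the 4x4 state left by r byte positions.

    Assumes state[r][c] holds the byte at row r, column c.
    """
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def add_round_key(state, round_key):
    """XOR each state byte with the corresponding round-key byte."""
    return [
        [s ^ k for s, k in zip(state_row, key_row)]
        for state_row, key_row in zip(state, round_key)
    ]
```

Because adding a round key is a byte-wise XOR, applying `add_round_key` twice with the same round key returns the original state, which is one reason the same secret key can be used to reverse the encryption.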
For secure applications with certain external memories, such as Quad-SPI (quad serial peripheral interface) non-volatile flash memories, execute-in-place operational modes can cause difficulties for existing integrated circuit processing systems. For example, a decryption engine for such an execute-in-place operational mode may be required to decrypt encrypted code in real time so that code accessed from the external memory system can be executed directly. A significant challenge for such real-time execution is the speed at which decryption is performed within the integrated circuit. Any latency that the internal cryptographic system introduces adds to the time needed to fetch code and thereby adversely affects system performance. As such, the decryption processing selected for such a system has a direct bearing on overall system latency.
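The latency concern can be made concrete with a simple back-of-the-envelope model. The sketch below is an assumption-laden illustration rather than a description of any particular device: the function names are hypothetical, decryption is assumed to start only after a full 16-byte block has been read, both latencies are assumed to be expressed in the same clock units, and the example figures (32 cycles to transfer 128 bits at 4 bits per clock, ignoring command, address, and dummy cycles, plus a hypothetical 10-cycle decryption) are chosen only for illustration.

```python
# Illustrative latency model; all names and numbers are hypothetical.

def block_fetch_cycles(flash_read_cycles, decrypt_cycles):
    """Cycles to deliver one decrypted 16-byte block to the processor,
    assuming decryption begins only after the whole block is read."""
    return flash_read_cycles + decrypt_cycles


def fetch_slowdown(flash_read_cycles, decrypt_cycles):
    """Ratio of encrypted-fetch latency to unencrypted-fetch latency."""
    return block_fetch_cycles(flash_read_cycles, decrypt_cycles) / flash_read_cycles


# Example: 128 bits over a 4-bit-wide interface takes 128 / 4 = 32 data cycles;
# a hypothetical engine taking one cycle per AES-128 round adds 10 more.
example_slowdown = fetch_slowdown(flash_read_cycles=32, decrypt_cycles=10)
```

Under these assumptions each block fetch takes 42 cycles instead of 32, so every cycle of decryption latency appears directly in the time the processor waits for code.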