This invention is in the field of data security. Embodiments are more specifically directed to computational security within a processor architecture.
Security of data communications is a significant issue for virtually every type of electronic system, ranging from large-scale systems such as supercomputers to the smallest-scale systems, such as embedded processors. Indeed, security is becoming the paramount issue for small-scale systems such as the sensors and actuators envisioned for deployment in the “Internet of Things” (IoT). These highly distributed IoT objects, which will be deployed in large numbers across a wide range of services and applications, can be particularly vulnerable to attack and compromise, given their relatively small computational capacity and remote deployment. At the same time, the importance of the functions carried out by a network of these sensors and actuators raises the security stakes.
Various approaches are known in the field of digital data cryptography, such as may be used for data communications, data storage and retrieval, and other applications, including those that may be carried out by embedded processors. In general, the field of cryptography encompasses data encryption and decryption, digital authentication (e.g., sign/verify schemes), and the like. Public key cryptography, also referred to as asymmetric cryptography, is a commonly used type of cryptography. According to this approach, a public-private pair of “keys”, each key being a block of data or information, is generated according to a particular algorithm. The public and private keys have an inverse relationship with one another defined by that algorithm, such that the transmitting node secures the communication using one of the keys in the pair, and the receiving node decrypts or verifies the communication using the other key. More specifically, in the data encryption context, a block of data that is encrypted using the public key can be decrypted using the private key; in the authentication context, a digital signature generated using the private key can be verified using the public key. The public and private keys are related to one another via a difficult mathematical problem (commonly referred to as a “trap-door function”), so that it is computationally difficult to determine a private key from knowledge of its corresponding public key. The public key can thus be published, for example sent by an unsecured communication or listed in a public registry, to enable data communication between the holder of the private key and those obtaining the public key, without realistic risk that the private key can be calculated by an attacker. The public/private key approach is generally favored because the holder of the private key need not share that key with any other party; in contrast, symmetric key approaches require both parties to know the same encryption key.
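As one concrete illustration of the inverse key relationship, the following Python sketch uses textbook RSA (RSA is among the algorithms named in this background) with tiny, insecure parameters. The helper names `make_toy_rsa_keypair` and `apply_key` are hypothetical; practical deployments use far longer keys, padding schemes such as OAEP (encryption) and PSS (signatures), and sign a hash of the message rather than the message itself.

```python
from math import gcd


def make_toy_rsa_keypair(p: int, q: int, e: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Build a (public, private) textbook-RSA key pair from primes p and q.

    Toy parameters only: no padding, no primality checks, no side-channel care.
    """
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # private exponent: modular inverse of e (Python 3.8+)
    return (n, e), (n, d)


def apply_key(key: tuple[int, int], m: int) -> int:
    """Raise m to the key's exponent modulo n (used to encrypt, decrypt, sign, verify)."""
    n, exponent = key
    if not 0 <= m < n:
        raise ValueError("message representative must lie in [0, n)")
    return pow(m, exponent, n)


public_key, private_key = make_toy_rsa_keypair(61, 53, 17)

# Encryption context: encrypt with the public key, decrypt with the private key.
ciphertext = apply_key(public_key, 65)
assert apply_key(private_key, ciphertext) == 65

# Authentication context: sign with the private key, verify with the public key.
signature = apply_key(private_key, 123)
assert apply_key(public_key, signature) == 123
```

Here the trap door is integer factorization: computing the private exponent requires `phi`, which is easy to obtain from `p` and `q` but believed infeasible to obtain from a sufficiently large `n` alone.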
The level of security provided by a particular public key scheme corresponds generally to the length of the keys; longer keys increase the difficulty of deriving the private key from the public key. Conventional bit lengths for both public and private keys under cryptography algorithms such as “DH”, “DSA”, and “RSA” range from about 1024 bits to 15360 bits. Of course, key lengths can vary widely, depending on the desired security level and the available computational capacity of the encrypting and decrypting nodes. In general, increasing the level of security implemented in an electronic system comes at a cost of reduced system performance, because of the increased number of machine cycles and the increased memory space required for the longer keys associated with the higher security level.
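To make the performance tradeoff concrete, the following back-of-the-envelope Python model estimates the word-by-word multiplications in one RSA-style modular exponentiation with a full-length exponent. It assumes 32-bit words, binary square-and-multiply with about half of the exponent bits set, and schoolbook multiplication, and it ignores the cost of modular reduction. The helper `estimate_word_mults` is hypothetical, and its figures are rough orders of magnitude, not measurements of any processor.

```python
import math


def estimate_word_mults(key_bits: int, word_bits: int = 32) -> int:
    """Rough count of word-by-word multiplications for one modular exponentiation."""
    words = math.ceil(key_bits / word_bits)  # operand length in machine words
    squarings = key_bits - 1                 # one squaring per exponent bit after the first
    multiplies = key_bits // 2               # assumes about half of the exponent bits are 1
    per_modmul = words * words               # schoolbook product; reduction cost ignored
    return (squarings + multiplies) * per_modmul


for bits in (1024, 2048, 15360):
    print(f"{bits:>6}-bit key: ~{estimate_word_mults(bits):,} word multiplications, "
          f"{bits // 8} bytes of key storage")
```

Under these assumptions the cost grows roughly with the cube of the key length: a 15360-bit key, 15 times as long as a 1024-bit key, costs about 3,400 times as many word multiplications and needs 15 times the key storage.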
This tradeoff can be relieved to a significant extent by implementing, in the system processor, a hardware-based accelerator dedicated to cryptographic processing. The SITARA AM335x processors available from Texas Instruments Incorporated are examples of embedded processors, of a type that may be used in IoT applications, in which such dedicated hardware-based security accelerators are implemented. In this type of architecture, the central processing unit offloads the computationally intensive operations involved in encryption, decryption, and other cryptography functions to the security accelerator. This allows the central processing unit to devote its processing bandwidth to the end-user application, so that system throughput is not significantly affected by the cryptography operations.
FIG. 1 illustrates the generalized functional architecture and operation of an example of a conventional embedded processor with a hardware-based security accelerator. In this example, central processing unit (CPU) 2 is a conventional processor, such as the ARM CORTEX-A8 CPU used in the SITARA AM335x processors noted above. CPU 2 executes its applications in a multithreading environment under a conventional operating system, such as LINUX. In the example of FIG. 1, CPU 2 is running three threads T1, T2, T3 under operating system 4. As is conventional in the art, threads T1, T2, T3 are executed by CPU 2 according to a scheduler function in operating system 4, with operating system 4 writing an identifier of the currently executing thread into thread ID register 9 of CPU 2, as shown in FIG. 1. This thread ID is managed by the hardware of CPU 2 under control of operating system 4, to prevent impersonation of one thread by another.
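The following Python sketch models the thread ID mechanism of FIG. 1: a scheduler, standing in for the scheduler function of operating system 4, writes the current thread's identifier into a register that rejects writes from unprivileged code, so a running thread cannot claim another thread's identity. The class names, the `privileged` flag, and the round-robin policy are illustrative assumptions; real schedulers and hardware privilege checks are considerably more involved.

```python
from itertools import cycle
from typing import Optional


class ThreadIdRegister:
    """Models thread ID register 9: readable by anyone, writable only in privileged mode."""

    def __init__(self) -> None:
        self._value: Optional[str] = None

    def write(self, thread_id: str, privileged: bool) -> None:
        if not privileged:
            raise PermissionError("only the operating system may set the thread ID")
        self._value = thread_id

    @property
    def value(self) -> Optional[str]:
        return self._value


class RoundRobinScheduler:
    """Hypothetical stand-in for the OS scheduler: runs threads in a fixed rotation."""

    def __init__(self, thread_ids: list[str], register: ThreadIdRegister) -> None:
        self._rotation = cycle(thread_ids)
        self._register = register

    def schedule_next(self) -> str:
        thread_id = next(self._rotation)
        self._register.write(thread_id, privileged=True)  # OS-level write
        return thread_id


reg = ThreadIdRegister()
scheduler = RoundRobinScheduler(["T1", "T2", "T3"], reg)
order = [scheduler.schedule_next() for _ in range(4)]  # T1, T2, T3, then T1 again

try:
    reg.write("T2", privileged=False)  # a running thread trying to impersonate T2
except PermissionError as err:
    print(f"rejected: {err}")
```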
The conventional processor of FIG. 1 includes cryptographic co-processor 5, a hardware-based security accelerator that communicates with CPU 2 over bus SYS_BUS and is dedicated to the execution of cryptography operations such as the encryption and decryption of data blocks. In this architecture, cryptographic co-processor 5 carries out these operations in conjunction with secure memory block 6, which is part or all of a memory resource accessible to co-processor 5 via dedicated bus DED_BUS; typically, access to secure memory block 6 is restricted to co-processor 5, to the exclusion of CPU 2 and of the other general-purpose logic and I/O functions in the embedded processor.
In this conventional architecture, cryptographic co-processor 5 and secure memory block 6 serve as a shared resource common to the various threads T1 through T3 being executed by CPU 2. As known in the art, switching execution from one thread to another, as is often performed in a multithreading environment, requires storing the execution “context” of the thread being paused, for later retrieval when execution of that thread resumes. This context includes register contents, execution states, settings, and other information that allow the logic circuitry to resume execution of the thread at the point where it left off, as though the intervening thread or threads had not been executed in the meantime. For cryptography operations, this context includes the cryptography key being used in a particular thread. Secure memory block 6 in this conventional architecture enables fast context switching by co-processor 5, when it receives a command to execute a cryptography operation under a different execution thread from that of the previous operation, by providing local storage of the cryptography context, specifically the keys used in cryptography operations by co-processor 5.
As shown in FIG. 1, multiple key entries 8 are available in secure memory block 6 for storing a number of cryptography keys (i.e., multiple thread contexts), thus allowing co-processor 5 to execute a number of different operations and communication channels, each having its own distinct private, public, or symmetric (shared) key. In this typical conventional arrangement, each key entry 8x can be called by a key “tag” or other identifier KEY ID, by way of which a particular key can be identified and referred to. Each of entries 8 includes a KEY field for storing the key itself, or for storing an address pointer referring to another location in secure memory block 6 at which the key is stored. In some conventional architectures, entries 8 also include a KEY ATTR field indicating the types of cryptography operations (e.g., encryption, decryption, authentication, signing) for which that key is permitted to be used, as well as the cryptography type (AES, PKA, etc.) of that key. Four key entries 81 through 84 are shown in FIG. 1, although secure memory block 6 may of course contain more or fewer entries as appropriate.
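A minimal Python model of the key entries described above may help fix the data layout: each entry is looked up by KEY ID, its KEY field holds either the key itself or an address pointer into secure memory, and a KEY ATTR check restricts which operations the key may serve (this models the variant that includes a KEY ATTR field). All class and method names are hypothetical, and the bit-level layout of a real entry is not modeled.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Union


class KeyUse(Flag):
    """Cryptography operations a key may be used for (models part of KEY ATTR)."""
    ENCRYPT = auto()
    DECRYPT = auto()
    SIGN = auto()
    VERIFY = auto()


@dataclass(frozen=True)
class KeyEntry:
    """One key entry: KEY holds the key bytes, or an int pointer into secure memory."""
    key: Union[bytes, int]
    allowed_uses: KeyUse  # KEY ATTR: permitted operations
    algorithm: str        # KEY ATTR: cryptography type, e.g. "AES" or "PKA"


class SecureKeyStore:
    """Models secure memory block 6, reachable only through the co-processor."""

    def __init__(self) -> None:
        self._entries: dict[int, KeyEntry] = {}  # KEY ID -> key entry
        self._memory: dict[int, bytes] = {}      # address -> key material

    def write_memory(self, address: int, key: bytes) -> None:
        self._memory[address] = key

    def add_entry(self, key_id: int, entry: KeyEntry) -> None:
        self._entries[key_id] = entry

    def fetch(self, key_id: int, use: KeyUse) -> tuple[str, bytes]:
        """Resolve KEY ID to (algorithm, key), enforcing the KEY ATTR permissions."""
        entry = self._entries[key_id]
        if use not in entry.allowed_uses:
            raise PermissionError(f"key {key_id} is not permitted for {use}")
        key = entry.key if isinstance(entry.key, bytes) else self._memory[entry.key]
        return entry.algorithm, key


store = SecureKeyStore()
store.add_entry(1, KeyEntry(bytes(16), KeyUse.ENCRYPT | KeyUse.DECRYPT, "AES"))
store.write_memory(0x40, b"\x01" * 32)
store.add_entry(2, KeyEntry(0x40, KeyUse.SIGN, "PKA"))  # KEY field holds a pointer
```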
In this architecture, CPU 2 communicates with cryptographic co-processor 5 by way of bus SYS_BUS. To invoke a cryptography operation in the current execution thread, CPU 2 issues a call to co-processor 5 over bus SYS_BUS that includes a command indicating the type of operation to be performed, the data on which the operation is to be performed, and a value of the key identifier KEY ID indicating the key entry 8x for the cryptography key to be used in that operation. If that key differs from the key currently being used by co-processor 5, as typically occurs when operating system 4 has switched execution at CPU 2 from one thread to another, co-processor 5 carries out a context switch: it stores the current key and other context at the key entry 8x in secure memory block 6 for that prior key, and retrieves the key from the key entry 8x corresponding to the newly received KEY ID value from CPU 2. Upon completion of the desired cryptography operation, cryptographic co-processor 5 communicates the results back to CPU 2 over bus SYS_BUS.
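The command-and-context-switch sequence above can be sketched in Python as follows. The co-processor keeps a working copy of the active context and, whenever a command arrives with a different KEY ID, saves that copy back to its key entry and loads the requested one. HMAC-SHA256 from the Python standard library stands in for "a cryptography operation"; the `"MAC"` command name, the `ops_done` field (a placeholder for whatever non-key state a real device saves), and all class names are assumptions for illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class KeyContext:
    """Context saved in a key entry: the key plus other per-thread state."""
    key: bytes
    ops_done: int = 0  # hypothetical stand-in for other saved state


class CryptoCoprocessor:
    """Toy model of a co-processor that switches context when the KEY ID changes."""

    def __init__(self, key_entries: dict[int, KeyContext]) -> None:
        self._entries = key_entries  # secure memory block: KEY ID -> saved context
        self._active_id: Optional[int] = None
        self._active: Optional[KeyContext] = None  # working registers
        self.context_switches = 0  # counts every context load, including the first

    def _switch_to(self, key_id: int) -> None:
        if self._active is not None:
            self._entries[self._active_id] = replace(self._active)  # save outgoing context
        self._active = replace(self._entries[key_id])                # load incoming context
        self._active_id = key_id
        self.context_switches += 1

    def execute(self, command: str, data: bytes, key_id: int) -> bytes:
        """Run one command issued by the CPU and return the result to it."""
        if key_id != self._active_id:
            self._switch_to(key_id)
        if command != "MAC":
            raise ValueError(f"unsupported command: {command}")
        self._active.ops_done += 1
        return hmac.new(self._active.key, data, hashlib.sha256).digest()


entries = {1: KeyContext(b"thread-1 key"), 2: KeyContext(b"thread-2 key")}
cop = CryptoCoprocessor(entries)
tag_a = cop.execute("MAC", b"sensor reading", 1)  # loads key entry 1
tag_b = cop.execute("MAC", b"sensor reading", 1)  # same KEY ID: no switch
tag_c = cop.execute("MAC", b"sensor reading", 2)  # thread change: save entry 1, load entry 2
```

Copying the context on load and on save mirrors the distinction between the co-processor's working registers and the stored key entries; a design that operated directly on the stored entries would need no save step but would also lose the separation between them.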
By way of further background, conventional processor architectures are known in the art in which the cryptographic co-processor can operate in either a “public” or a “private” mode. In these architectures, the public mode may be associated with one cryptography key that can be shared among multiple threads in this mode, while the private mode is associated with a different cryptography key that can also be shared among threads in that mode.
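In the public/private mode arrangement just described, the key is selected by mode rather than by thread. The short Python sketch below illustrates that selection; the class name, the method names, and the `"public"`/`"private"` strings are hypothetical, and how a thread's mode is established in real hardware is not modeled.

```python
from typing import Dict


class ModalCoprocessor:
    """Toy co-processor holding one key per mode; threads in the same mode share it."""

    def __init__(self, public_mode_key: bytes, private_mode_key: bytes) -> None:
        self._mode_keys = {"public": public_mode_key, "private": private_mode_key}
        self._thread_modes: Dict[str, str] = {}

    def assign_mode(self, thread_id: str, mode: str) -> None:
        if mode not in self._mode_keys:
            raise ValueError(f"unknown mode: {mode}")
        self._thread_modes[thread_id] = mode

    def key_for(self, thread_id: str) -> bytes:
        # The key is chosen by the thread's mode, not by the thread itself.
        return self._mode_keys[self._thread_modes[thread_id]]


cop = ModalCoprocessor(b"public-mode key", b"private-mode key")
cop.assign_mode("T1", "public")
cop.assign_mode("T2", "public")
cop.assign_mode("T3", "private")
```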
By way of further background, some conventional processor architectures carry out context switching outside of the cryptographic co-processor. In these architectures, the operating system carries out a context switch at the CPU or other processor calling the co-processor, “cleaning up” the context of the previous thread to the extent that it is present in the co-processor. In this approach, the co-processor is not involved in the context switch other than by having its key rewritten, if desired, by the next thread.
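A minimal Python sketch of this OS-driven approach follows: the co-processor has a single key register and no context management of its own, and the operating system clears that register whenever it switches to a different thread, so the next thread must load its own key. The class names and the `clear` method are hypothetical; a real cleanup would also scrub intermediate results and other co-processor state.

```python
from typing import Optional


class PlainCoprocessor:
    """Co-processor with a single key register and no context management of its own."""

    def __init__(self) -> None:
        self.key_register: Optional[bytes] = None

    def load_key(self, key: bytes) -> None:
        self.key_register = key

    def clear(self) -> None:
        self.key_register = None  # a real device would also scrub intermediate state


class OperatingSystem:
    """Hypothetical OS that cleans up the co-processor on every thread switch."""

    def __init__(self, coprocessor: PlainCoprocessor) -> None:
        self._coprocessor = coprocessor
        self.current_thread: Optional[str] = None

    def switch_to(self, thread_id: str) -> None:
        if self.current_thread is not None and thread_id != self.current_thread:
            self._coprocessor.clear()  # remove the previous thread's context
        self.current_thread = thread_id


cop = PlainCoprocessor()
kernel = OperatingSystem(cop)
kernel.switch_to("T1")
cop.load_key(b"T1 key")
kernel.switch_to("T2")  # T2 starts with an empty key register
```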
By way of further background, in some conventional processor architectures, separate registers for public and private modes are provided in the cryptographic co-processor. These logically separated registers isolate the cryptography keys and other settings and information used by the co-processor in the private domain from those used in the public domain, and vice versa. In this arrangement, no cleanup is required in a context switch from one domain to the other.
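The banked-register arrangement can be illustrated with a small Python model in which each domain addresses its own register bank, so switching domains needs neither cleanup nor save/restore: registers written in one domain are simply not visible from the other. The `Domain` enum, the class name, and the register name `"KEY"` are illustrative assumptions, and who is permitted to switch domains is not modeled.

```python
from enum import Enum
from typing import Dict, Optional


class Domain(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


class BankedCoprocessor:
    """Co-processor with one logically separate register bank per domain."""

    def __init__(self) -> None:
        self._banks: Dict[Domain, Dict[str, bytes]] = {d: {} for d in Domain}
        self.domain = Domain.PUBLIC

    def switch_domain(self, domain: Domain) -> None:
        # No cleanup: the other bank simply stops being addressable.
        self.domain = domain

    def write_reg(self, name: str, value: bytes) -> None:
        self._banks[self.domain][name] = value

    def read_reg(self, name: str) -> Optional[bytes]:
        return self._banks[self.domain].get(name)


cop = BankedCoprocessor()
cop.switch_domain(Domain.PRIVATE)
cop.write_reg("KEY", b"private-domain key")
cop.switch_domain(Domain.PUBLIC)
public_view = cop.read_reg("KEY")   # None: the private bank is not visible here
cop.switch_domain(Domain.PRIVATE)
private_view = cop.read_reg("KEY")  # the key survives without any save/restore
```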