Modern cryptography, the practice and study of securing information, operates using algorithms which often require a large number of arithmetic computations of varying complexity. These cryptographic computations are essential to many security services such as authentication, confidentiality, and integrity.
A variety of algorithms are used to implement cryptographic functionality. Some of these algorithms contain complex arithmetic steps requiring comparatively long processing times. Conventional cryptographic algorithm acceleration methods typically attempt to accelerate one particular cryptographic algorithm at a time through specially designed hardware interfaced with software through a custom set of instructions programmed into a processor. Because conventional methods focus on a particular algorithm, conventional systems are not designed with the general capability to accelerate multiple different algorithms using a single method or system.
Some recent research has attempted to identify generic acceleration instructions which are independent of any algorithm. However, these instructions lack tight integration within the processing environment, and they lack the computational power needed to significantly improve the computational efficiency of a given algorithm.
Encryption algorithms utilize the ability to mix, or “permute,” incoming data by remapping the data, or portions thereof, to the output. Such mappings often separate the distinctive properties of diffusion and confusion to varying degrees. Conventional bit permuters which support diffusion-related computations are located outside of the Arithmetic Logic Unit (“ALU”), the digital circuit that performs arithmetic and logic operations, rather than integrated within the ALU. Moreover, conventional acceleration circuits do not tightly couple and integrate both diffusion and confusion principles.
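By way of illustration only, the diffusion and confusion principles named above can be sketched in software. The sketch below assumes an 8-bit word, a bit-reversal permutation, and an arbitrary 4-bit substitution table; the names permute_bits, substitute_nibbles, REVERSE_8, and SBOX are hypothetical and are not drawn from any particular algorithm or circuit described herein.

```python
def permute_bits(value, mapping):
    """Diffusion: output bit i is copied from input bit mapping[i]."""
    out = 0
    for out_pos, in_pos in enumerate(mapping):
        bit = (value >> in_pos) & 1
        out |= bit << out_pos
    return out


def substitute_nibbles(value, sbox, width=8):
    """Confusion: replace each 4-bit group of value using the table sbox."""
    out = 0
    for shift in range(0, width, 4):
        nibble = (value >> shift) & 0xF
        out |= sbox[nibble] << shift
    return out


# Assumed for illustration: reverse the order of the 8 bits.
REVERSE_8 = [7, 6, 5, 4, 3, 2, 1, 0]

# Assumed for illustration: an arbitrary one-to-one mapping on 0..15.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def round_function(value):
    """One mixing round: a permutation followed by a substitution."""
    return substitute_nibbles(permute_bits(value, REVERSE_8), SBOX)
```

In this sketch the permutation and the substitution are separate function calls executed one after the other, which loosely mirrors the separate instructions that a programmer must issue when diffusion and confusion are not integrated.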
Many conventional systems maintain only one copy of each individual bit in a single location. Many acceleration strategies do this so that hardwired circuits can route multiple instances of those values to combinational logic efficiently. However, this hardwiring by its nature limits the general applicability of such circuits to a wide variety of algorithms.
Similarly, because these systems maintain only one copy of each bit, where a bit is required for multiple calculations, the single copy of that bit may be available to only one calculation at a time. The calculations must then be performed serially rather than in parallel, lengthening the processing time for the algorithm.
Further, maintaining a single copy of each individual bit in a conventional system, without tightly integrating its location within the computational circuitry, forces additional software preparation of the data input, slowing the input/output (“I/O”) processing associated with the desired computation.
By treating the required instruction set as a generic bit permuter, conventional systems offer limited capabilities and throughput speed. Finally, the lack of integration of diffusion-related permutations with confusion-related calculations requires the programmer to provide separate instructions to perform the actual computation, increasing the input/output processing requirements, limiting the parallelization and pipelining potential, and wasting further time and man-hours.
Accordingly, there is a desire for a computation and data manipulation accelerator which overcomes these and other related problems.