This section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
The integrity of digital data can be verified using different cryptographic hash techniques such as SHA-1 and MD5. These hash techniques process an input string of arbitrary length by dividing it into blocks that are processed iteratively until a final hash value is obtained.
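Python's standard `hashlib` module illustrates both points. It exposes SHA-1 and MD5, and its incremental `update()` method lets the input be fed in arbitrary pieces while the hash object buffers and compresses it block by block internally. The 10-byte chunk size below is arbitrary.

```python
import hashlib

message = b"The quick brown fox jumps over the lazy dog"

# One-shot hashing: SHA-1 yields a 20-byte digest, MD5 a 16-byte digest.
sha1_digest = hashlib.sha1(message).hexdigest()
md5_digest = hashlib.md5(message).hexdigest()

# Same SHA-1 value when the input is fed in chunks: the hash object
# buffers the data and processes it iteratively, 64 bytes at a time.
h = hashlib.sha1()
for i in range(0, len(message), 10):
    h.update(message[i:i + 10])
chunked_digest = h.hexdigest()

print(sha1_digest)                    # 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
print(md5_digest)                     # 9e107d9d372bb6826bd81d3542a419d6
print(chunked_digest == sha1_digest)  # True
```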
For example, SHA-1 outputs a 20-byte hash value (also called a checksum or digest, depending on the technical area) for an input string of any length. The input string or message is padded so that its length is a multiple of 64 bytes. SHA-1 then applies a compression function as follows:
1—Five hash variables (U, V, X, Y, Z) are initialized to the specific constants defined in the standard FIPS 180-4.
2—For each 64-byte block $M_i$ of the N blocks of an input message M, the algorithm applies the compression function F to the input block $M_i$, determining five 4-byte intermediate variables $(A = a_i, B = b_i, C = c_i, D = d_i, E = e_i)$ for that block.
3—The five hash variables (U, V, X, Y, Z), which hold the intermediate hash value computed for message block $M_i$, are updated as follows:
$$(U, V, X, Y, Z) \leftarrow F(M_i, (U, V, X, Y, Z)) + (U, V, X, Y, Z) = (A, B, C, D, E) + (U, V, X, Y, Z)$$
Hence:
$$(U = u_i,\ V = v_i,\ X = x_i,\ Y = y_i,\ Z = z_i) \leftarrow (u_{i-1} + a_i,\ v_{i-1} + b_i,\ x_{i-1} + c_i,\ y_{i-1} + d_i,\ z_{i-1} + e_i)$$
where + denotes addition modulo $2^{32}$.
4—After step 3 has been repeated N times (once per block), the concatenation U|V|X|Y|Z is output as the resulting 20-byte message digest.
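The padding and chaining steps above can be sketched as follows. This is a minimal illustration, not a complete SHA-1 (the compression function F is left out). The helper names `sha1_pad`, `split_blocks`, `feed_forward` and `digest` are hypothetical, and the exact padding rule (a 0x80 byte, zero bytes, then the 64-bit big-endian message length in bits) and the initial constants come from FIPS 180-4 rather than from the text above.

```python
import struct

MASK32 = 0xFFFFFFFF

# Step 1: initial values of (U, V, X, Y, Z) as given in FIPS 180-4.
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

def sha1_pad(message):
    """Append 0x80, then zero bytes, then the 64-bit big-endian bit length,
    so that the padded length is a multiple of 64 bytes."""
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", bit_len)

def split_blocks(padded):
    """Cut the padded message into its N blocks M_i of 64 bytes."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

def feed_forward(prev, abcde):
    """Step 3: (U,V,X,Y,Z) <- (A,B,C,D,E) + (U,V,X,Y,Z), word-wise mod 2^32."""
    return tuple((p + a) & MASK32 for p, a in zip(prev, abcde))

def digest(state):
    """Step 4: output the concatenation U|V|X|Y|Z as 20 bytes."""
    return struct.pack(">5I", *state)

print(len(split_blocks(sha1_pad(b"x" * 100))))  # 2
```

Because each addition in `feed_forward` wraps modulo $2^{32}$, adding 1 to a word equal to 0xFFFFFFFF yields 0.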
The compression function F works as follows:
1—Expand the 64-byte input block into 80 words $M_j$ of 4 bytes each.
2—Copy the values $(U = u_{i-1}, V = v_{i-1}, X = x_{i-1}, Y = y_{i-1}, Z = z_{i-1})$ into the intermediate variables (A, B, C, D, E):
$$(A, B, C, D, E) \leftarrow (U, V, X, Y, Z)$$
3—Perform 80 rounds (for j = 0 to 79) of:
$$(A, B, C, D, E) \leftarrow (M_j + \mathrm{rot}_5(A) + g(B, C, D) + E + K_j,\ A,\ \mathrm{rot}_{30}(B),\ C,\ D)$$
where $\mathrm{rot}_n$ denotes left rotation by n bits, + denotes addition modulo $2^{32}$, the $K_j$ are constants, and the function g depends on the round number:
$$g(R, S, T) = \begin{cases} (R \land S) \lor (\lnot R \land T) & j \in \{0, \dots, 19\} \\ R \oplus S \oplus T & j \in \{20, \dots, 39\} \cup \{60, \dots, 79\} \\ (R \land S) \lor (R \land T) \lor (S \land T) & j \in \{40, \dots, 59\} \end{cases}$$
where $\lnot$, $\land$, $\lor$ and $\oplus$ denote the Boolean operators NOT, AND, OR and Exclusive OR.

The digital data to be verified can be the computer code of a software application. Verifying it can ensure that the code executed by a processor has not been tampered with.
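The compression function F and the overall iteration described above can be sketched in Python and checked against `hashlib`. The text does not specify how a block is expanded into 80 words or the values of $K_j$. The message schedule used here ($M_j = \mathrm{rot}_1(M_{j-3} \oplus M_{j-8} \oplus M_{j-14} \oplus M_{j-16})$ for $j \ge 16$) and the constants are those of FIPS 180-4. All function names are illustrative. Note that in Python `|` is OR, `~` is NOT, and `^` is XOR, unlike the notation used in some renderings of the text.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)  # K_j for each 20-round group

def rotl(x, n):
    """rot_n: left rotation of a 32-bit word by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def g(j, r, s, t):
    """Round-dependent Boolean function g(R, S, T)."""
    if j < 20:
        return (r & s) | (~r & t)           # choose
    if j < 40 or j >= 60:
        return r ^ s ^ t                    # parity
    return (r & s) | (r & t) | (s & t)      # majority

def expand(block):
    """Step 1: expand a 64-byte block into 80 words M_j (FIPS 180-4 schedule)."""
    m = list(struct.unpack(">16I", block))
    for j in range(16, 80):
        m.append(rotl(m[j - 3] ^ m[j - 8] ^ m[j - 14] ^ m[j - 16], 1))
    return m

def compress(block, state):
    """F(M_i, (U,V,X,Y,Z)): returns (A, B, C, D, E) after 80 rounds."""
    m = expand(block)
    a, b, c, d, e = state                   # step 2: copy (U,V,X,Y,Z)
    for j in range(80):                     # step 3: 80 rounds
        temp = (m[j] + rotl(a, 5) + g(j, b, c, d) + e + K[j // 20]) & MASK32
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    return a, b, c, d, e

def sha1(message):
    """Pad, then chain the compression function over every 64-byte block."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = H0
    for i in range(0, len(padded), 64):
        out = compress(padded[i:i + 64], state)
        state = tuple((u + a) & MASK32 for u, a in zip(state, out))
    return struct.pack(">5I", *state)       # U|V|X|Y|Z

print(sha1(b"abc").hex())                             # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1(b"abc") == hashlib.sha1(b"abc").digest())  # True
```

A pure-Python version like this is useful only for illustration. Production code should call `hashlib` directly.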
The integrity of a software application is commonly protected by processing the binary code once it has been compiled. A post-build tool computes the checksum of a selected region within the binary and inserts the checksum reference value into the binary. The checksum reference value cannot be included in the selected region, as its presence would affect the checksum of that region. Therefore, selected regions and checksum reference values are located apart; for example, selected regions are in the CODE sections and checksum reference values are inserted into the DATA section.
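The post-build flow described above might look like the following sketch, which treats the binary as a flat byte array. The region boundaries, the DATA offset, the function names and the choice of SHA-1 are all hypothetical. A real tool would parse the executable format to locate the CODE and DATA sections.

```python
import hashlib

# Hypothetical layout of a 512-byte binary: CODE in [0x000, 0x100),
# DATA in [0x100, 0x200), with a 20-byte checksum slot at 0x180.
CODE_START, CODE_END = 0x000, 0x100
DATA_CHECKSUM_OFFSET = 0x180
DIGEST_LEN = 20

def post_build_protect(binary):
    """Post-build step: hash the selected CODE region and store the
    reference value in the DATA section, outside the hashed region."""
    ref = hashlib.sha1(binary[CODE_START:CODE_END]).digest()
    binary[DATA_CHECKSUM_OFFSET:DATA_CHECKSUM_OFFSET + DIGEST_LEN] = ref

def runtime_check(binary):
    """Run-time check: recompute the region checksum and compare it."""
    ref = bytes(binary[DATA_CHECKSUM_OFFSET:DATA_CHECKSUM_OFFSET + DIGEST_LEN])
    return hashlib.sha1(binary[CODE_START:CODE_END]).digest() == ref

binary = bytearray(range(256)) + bytearray(256)  # CODE bytes, then zeroed DATA
post_build_protect(binary)
ok_before = runtime_check(binary)   # True
binary[0x10] ^= 0xFF                # tamper with one byte of code
ok_after = runtime_check(binary)    # False
print(ok_before, ok_after)
```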
In this example, the code region is protected by the integrity check, whereas regions containing checksum reference values (e.g., the DATA section) remain vulnerable to tampering. To improve the protection, it is common to deploy multiple integrity verifications and to spread the checksum reference values throughout the binary. The assumption of a continuous region of data over which the integrity can be computed then no longer holds, because the regions, which may overlap, contain checksums of other blocks inserted at post-build time.
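The following sketch illustrates why a continuous-region check breaks once checksum reference values are spread through protected regions. Two hypothetical regions, A and B, each host the other's reference value. Whichever value is inserted last modifies a region whose checksum has already been fixed. The layout and names are illustrative assumptions.

```python
import hashlib

def h(data):
    return hashlib.sha1(bytes(data)).digest()

# Hypothetical layout: each region contains the slot for the other's checksum.
REGION_A = (0, 128)      # hosts checksum B at offset 64
REGION_B = (128, 256)    # hosts checksum A at offset 192
SLOT_B, SLOT_A = 64, 192

binary = bytearray(i % 251 for i in range(256))

# Post-build: compute checksum A and insert it inside region B ...
ref_a = h(binary[REGION_A[0]:REGION_A[1]])
binary[SLOT_A:SLOT_A + 20] = ref_a
# ... then compute checksum B (which now covers ref_a) and insert it in region A.
ref_b = h(binary[REGION_B[0]:REGION_B[1]])
binary[SLOT_B:SLOT_B + 20] = ref_b

# Naive run-time checks over continuous regions.
check_a = h(binary[REGION_A[0]:REGION_A[1]]) == ref_a  # False: inserting ref_b altered region A
check_b = h(binary[REGION_B[0]:REGION_B[1]]) == ref_b  # True: region B unchanged since ref_b
print(check_a, check_b)
```

Swapping the insertion order only moves the failure to the other region, because each region depends on the other's checksum.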
FIG. 1 illustrates a section of code with two overlapping regions and checksum reference values according to the prior art. The section of code 100 comprises two regions, region O 112 and region P 114, two checksum reference values, checksum O 124 and checksum P 122, and a key 132. As can be seen, region O comprises checksum O and checksum P, while region P comprises checksum O and the key. This means that the integrity value for region O should be computed over the whole region minus checksum O and checksum P. Likewise, the integrity value for region P should be computed over the whole region minus checksum O and the key. As the checksums and, often, the key are inserted after the checksum calculations, these inserted values must be excluded from subsequent checksum calculations.
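One way to perform the computation described for FIG. 1 is to hash each region while skipping its excluded spans. The byte offsets below are hypothetical stand-ins for region O 112, region P 114, checksum O 124, checksum P 122 and key 132. The function name and the use of SHA-1 are assumptions.

```python
import hashlib

def checksum_excluding(binary, region, excluded):
    """Hash region [start, end) while skipping each excluded [start, end) span.
    Excluded spans are assumed to be sorted and non-overlapping."""
    start, end = region
    h = hashlib.sha1()
    pos = start
    for ex_start, ex_end in excluded:
        lo, hi = max(ex_start, start), min(ex_end, end)
        if lo >= hi:
            continue                 # span lies outside this region
        h.update(binary[pos:lo])
        pos = hi
    h.update(binary[pos:end])
    return h.digest()

# Hypothetical offsets mirroring FIG. 1.
REGION_O, REGION_P = (0, 200), (100, 300)
CHECKSUM_P, CHECKSUM_O, KEY = (40, 60), (120, 140), (220, 236)
EXCLUDED_O = [CHECKSUM_P, CHECKSUM_O]   # region O contains both checksums
EXCLUDED_P = [CHECKSUM_O, KEY]          # region P contains checksum O and the key

binary = bytearray(i % 256 for i in range(300))

# Post-build: compute both references, then insert them and the key.
ref_o = checksum_excluding(binary, REGION_O, EXCLUDED_O)
ref_p = checksum_excluding(binary, REGION_P, EXCLUDED_P)
binary[CHECKSUM_O[0]:CHECKSUM_O[1]] = ref_o
binary[CHECKSUM_P[0]:CHECKSUM_P[1]] = ref_p
binary[KEY[0]:KEY[1]] = bytes(16)       # hypothetical 16-byte key

# Run time: both checks pass despite the post-build insertions.
print(checksum_excluding(binary, REGION_O, EXCLUDED_O) == ref_o)  # True
print(checksum_excluding(binary, REGION_P, EXCLUDED_P) == ref_p)  # True
```

The price of this approach is that the exclusion lists must be available at run time, which is the weakness discussed next.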
Many existing commercial products offer integrity protection. Protecting an application with such products usually requires the intervention of a skilled person, who must write a specific script to protect the application. To solve the discontinuity issue in protected regions, the security expert should declare a list of excluded regions, which are subtracted from the protected region when the integrity check is performed. As a result, the binary must embed a table describing the locations to exclude. This table in itself leaks information to an attacker, who can then list the potential locations where sensitive information and secrets may be stored.
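The embedded exclusion table could, for instance, be a count followed by (start, length) records. The format below is hypothetical and is not taken from any commercial product. The point is that parsing it requires no secret, so anyone who locates the table recovers every excluded span.

```python
import struct

# Hypothetical format: a little-endian uint32 entry count, followed by
# one (start, length) pair of little-endian uint32 values per excluded span.
def pack_exclusion_table(spans):
    table = struct.pack("<I", len(spans))
    for start, end in spans:
        table += struct.pack("<II", start, end - start)
    return table

def parse_exclusion_table(table):
    """What an attacker who finds the table recovers: every excluded span."""
    (count,) = struct.unpack_from("<I", table, 0)
    records = struct.iter_unpack("<II", table[4:4 + 8 * count])
    return [(start, start + length) for start, length in records]

table = pack_exclusion_table([(40, 60), (120, 140), (220, 236)])
print(parse_exclusion_table(table))  # [(40, 60), (120, 140), (220, 236)]
```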
In addition, even if this information (the perimeter of the protected regions, start address, length, checksum value, etc.) does not itself contain any confidential or sensitive data, it nevertheless presents a weakness that can be exploited by a dynamic attack. With a debugger, an attacker can set hardware breakpoints on the checksum location and monitor read/write accesses to it. This makes it easy to identify the integrity routines that access the checksum, and their invocation points, in order to stub them out.
Today, advanced temporal monitoring tools use new technologies, such as hypervisors or virtual machines, to stealthily track the read/write memory accesses of each function of the application. In the previous example, the integrity routine never reads the excluded locations, which makes an oracle attack possible: each address that is never read becomes suspicious and hints at where sensitive information, such as information inserted at post-build time, can be found.
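The oracle attack can be simulated with a memory wrapper that records every address read, standing in for a hypervisor-based tracer. The class and function names are hypothetical. The integrity routine is a pared-down exclusion-based check of the kind described for FIG. 1, and its excluded spans are assumed to be sorted and to lie inside the region.

```python
import hashlib

class TracedMemory:
    """Stand-in for a stealthy monitor: records every address that is read."""
    def __init__(self, data):
        self._data = bytes(data)
        self.reads = set()

    def read(self, start, end):
        self.reads.update(range(start, end))
        return self._data[start:end]

def integrity_routine(mem, region, excluded):
    """Exclusion-based check: hashes the region but never reads excluded spans."""
    start, end = region
    h = hashlib.sha1()
    pos = start
    for ex_start, ex_end in excluded:
        h.update(mem.read(pos, ex_start))
        pos = ex_end
    h.update(mem.read(pos, end))
    return h.digest()

def oracle(mem, region):
    """Attacker side: addresses inside the region that were never read."""
    start, end = region
    return sorted(set(range(start, end)) - mem.reads)

mem = TracedMemory(bytes(range(200)))
integrity_routine(mem, (0, 200), [(40, 60), (120, 140)])
unread = oracle(mem, (0, 200))
print(unread[0], unread[-1], len(unread))  # 40 139 40
```

The attacker never needs the exclusion table itself: the gaps in the read trace reveal the same spans.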
It will be appreciated that it is desired to have a solution that overcomes at least part of the prior art problems related to integrity of digital data, in particular in software applications. The present principles provide such a solution.