Current computer systems are highly vulnerable to cyber attack. The number of attacks and the financial losses due to those attacks have risen exponentially. Despite significant investments, the situation continues to worsen; novel attacks appear with high frequency and employ increasingly sophisticated techniques. There are very few fundamental sources of the vulnerabilities exploited by cyber attackers. These attacks stem from the fact that current computer systems cannot enforce the intended semantics of their computations. In particular, they fail to systematically enforce memory safety, type safety, the distinction between code and data, and constraints on information flow and access. These properties are not systematically enforced today because they are not: systematically captured during the design process; formally analyzed or verified during design and implementation; captured or enforced by common system programming languages (e.g., the C programming language); or represented explicitly within the runtime environment of the system, so that they cannot be enforced dynamically by either hardware or software techniques.
Current system software is large and complex. Hardware architectures provide mechanisms to protect the kernel from user code, but at the same time grant to the kernel unlimited privileges (at best, a few levels of increased privilege). Consequently, a single penetration into the kernel gives the attacker unlimited access. Since the cost of switching into kernel mode is high, there is a tendency for system programmers to move increasing amounts of functionality into the kernel, making it even less trustworthy and exposing an even larger attack surface. Likewise, programming flaws can result in unintended access to kernel or increased privilege level system access.
Current computer systems lack the means to recover from attacks either by finding alternative methods for achieving their goals or by repairing the resources corrupted by the attack. They also typically lack the ability to diagnose the underlying problem and to fix the vulnerabilities that enabled the attack. Once a machine is corrupted, manual repairs by specialized personnel are required, while the forensic information necessary to effect the repair is typically lacking. A particular issue is that, once a system is corrupted, it cannot be subsequently trusted, and therefore a remediation facility is difficult to implement. Finally, today's computer systems are nearly identical to one another, do not change appreciably over time, and share common vulnerabilities. A single network-based attack can therefore spread rapidly and affect a very large number of computers.
A central requirement for implementing trusted computing platforms is to validate whether a program executing on a potentially untrusted host is really the program the user thinks it is. If the host platform and its software environment are potentially compromised, the application may be compromised through static replacement of the binaries, through linking the application, either statically or dynamically, to untrusted library functions, or through dynamic code substitution at run time. What makes the validation problem particularly difficult is the compromises that occur at run time. One way to validate the execution is to verify that the results produced by the execution on a data set are essentially the same as the results produced when a validated, reference copy of the program is run on a trusted host with the same input data set. Such a validation is practically infeasible, as the execution of the validated program obviates the need for execution on the original, potentially untrusted host. Another approach would be to continuously monitor changes/updates to the various software components and to permit only changes that have been certified as legitimate. Unfortunately, this approach results in a fairly closed system that takes away the convenience of timely, automatic or semi-automatic updates and introduces a potential administration headache. Existing approaches to validating the execution of a program include the use of hardware support in the form of the Trusted Platform Module (TPM) and the use of control flow signatures at run time, with control flow signatures derived from the contents of both register and memory locations as well as the contents of hardware instrumentation registers.
The art has proposed various means for addressing this problem. One example is software testing techniques based on control flow signatures. However, these are slow, and encryption makes this problem worse. Without encryption, the scheme suffers from an intrinsic vulnerability. Another technique employs hardware support for debugging and validation of executed code. See US 2008/0215920, expressly incorporated herein by reference, and discussed in more detail below. Software techniques are available for authenticating executions. These may have limited coverage, and may incur delay and increased overhead. Hybrid techniques, such as a combination of software techniques and TPM hardware to certify code before execution, are also possible, but these may fail to detect run-time compromises.
Today's computing systems consist of a large number of hardware and software components. Assuming that the hardware components are certified by the vendors, the software components are the major sources of vulnerability. These software components start with the operating systems and related services, library components, utilities and the applications themselves. Any one of these software components can be compromised and can act directly or indirectly as sources of attacks that can change the execution characteristics of a user's program at run time. Examples of such compromises include the alteration of transfer vectors at run time, to transfer control to unintended functions, alteration of the binaries of the application itself (“code injection”) at run-time, call(s) to compromised system functions, and so on. The net result of any of these techniques is that the user's program does not correctly perform its intended functions, even though the users are completely oblivious to that fact. To detect such compromises, it is necessary to validate the execution of the entire program at run time, including the validation of library functions, kernel functions and utilities. Such validations ultimately lead to a trusted computing environment, where a system is composed of components that can be potentially compromised and where any compromise can be detected in a timely fashion to prevent any adverse impact of any form.
“Trusted Platform Module” is the name of a published specification detailing a secure cryptoprocessor that can store cryptographic keys that protect information, as well as the general name of implementations of that specification, often called the “TPM chip”. The TPM specification is the work of the Trusted Computing Group. The current version of the TPM specification is 1.2 Revision 103, published on Jul. 9, 2007.
The Trusted Platform Module offers facilities for the secure generation of cryptographic keys, and limitation of their use, in addition to a hardware pseudo-random number generator. It also includes capabilities such as remote attestation and sealed storage. “Remote attestation” creates a nearly unforgeable hash key summary of the hardware and software configuration. The extent of the summary of the software is decided by the program encrypting the data. This allows a third party to verify that the software has not been changed. “Binding” encrypts data using the TPM endorsement key, a unique RSA key burned into the chip during its production, or another trusted key descended from it. “Sealing” encrypts data similarly to binding, but in addition specifies a state in which the TPM must be in order for the data to be decrypted (unsealed).
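The relationship between a measured configuration and sealed data can be illustrated with a minimal sketch. This is a model only, not the TPM command interface: it assumes that the platform state is accumulated in a SHA-1 register in the manner of a TPM 1.2 platform configuration register, and the names `extend`, `measure`, `seal` and `unseal` are hypothetical. A real TPM encrypts the sealed data under a key that never leaves the chip; here that protection is only modeled by recording the required state alongside the data.

```python
import hashlib


def extend(register: bytes, component: bytes) -> bytes:
    """Fold one measurement into the register: new = SHA-1(old || SHA-1(component))."""
    return hashlib.sha1(register + hashlib.sha1(component).digest()).digest()


def measure(components: list[bytes]) -> bytes:
    """Accumulate measurements of the loaded components, in load order."""
    register = bytes(20)  # the register starts as 20 zero bytes
    for component in components:
        register = extend(register, component)
    return register


def seal(secret: bytes, required_state: bytes) -> dict:
    # Assumption: a real TPM would encrypt `secret` with an internal key.
    # This model only records the state required for release.
    return {"secret": secret, "state": required_state}


def unseal(blob: dict, current_state: bytes) -> bytes:
    """Release the secret only if the platform is in the sealed state."""
    if blob["state"] != current_state:
        raise PermissionError("platform state differs from the sealed state")
    return blob["secret"]


good_state = measure([b"bootloader v1", b"kernel v1"])
blob = seal(b"disk key", good_state)
print(unseal(blob, measure([b"bootloader v1", b"kernel v1"])))  # b'disk key'
```

Because each measurement is folded into the previous register value, changing any component, or the order in which components are loaded, yields a different final state, and `unseal` then refuses to release the secret.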
A Trusted Platform Module can be used to authenticate hardware devices. Since each TPM chip has a unique and secret RSA key burned in as it is produced, it is capable of performing platform authentication. For example, it can be used to verify that a system seeking access is the expected system.
The Trusted Platform Module is typically part of the supporting chipset for a processor system, and thus its use typically delays execution of instructions by the processor until verification is completed. Likewise, verification occurs with respect to instructions before they are cached by the processor. Thus, while the TPM provides secure data processing, it does not address insecurities in moving instructions to the processor, is susceptible to instruction injection type attacks, and likewise introduces significant latencies.
Generally, pushing the security down to the hardware level in conjunction with software provides more protection than a software-only solution, which is more easily compromised by an attacker. However, even where a TPM is used, a key is still vulnerable while a software application that has obtained it from the TPM is using it to perform encryption/decryption operations, as has been illustrated in the case of a cold boot attack.
The “Cerium” technology (Chen and Morris, “Certifying Program Execution with Secure Processors”, Proceedings of the 9th conference on Hot Topics in Operating Systems, USENIX, Volume 9, Pages: 133-138, 2003), expressly incorporated herein by reference, proposes a secure processor technology which validates cache line signature before commencement of processing. It provides a separate security co-processor, which is not integrated into main processing pipeline. Cerium computes signatures of the system software as it boots up, and the software at each stage self checks its integrity against a reference signature stored in the co-processor's non-volatile memory. Each stage also authenticates the software for the next stage. Cerium assumes the existence and use of a cache where operating system and trusted code can be kept. See, also, Cliff Wang, Malware Detection, Advances in information security, Mihai Christodorescu, Somesh Jha, Douglas Maughan, Dawn Song, Cliff Wang, Editors, Springer, 2006.
Boneh et al., “Hardware Support for Tamper-Resistant and Copy-Resistant Software”, Technical Report: CS-TN-00-97, (Stanford University, 2000), expressly incorporated herein by reference, provides a description of a hardware prototype which supports software-only tamper resistant computing, with an atomic decrypt-and-execute operation.
U.S. Pat. No. 7,730,312, expressly incorporated herein by reference, provides a tamper resistant module certification authority. Software applications may be securely loaded onto a tamper resistant module (TRM) and securely deleted from the TRM. A method for determining, based at least upon an encrypted personalization data block, whether a TRM is part of a qualified set of TRM's to accept loading of an application is also provided. Thereafter, the method provides for loading the application onto the TRM only after the first step determines that the TRM is qualified to accept the loading of the application. A method is also provided for determining, based at least upon an encrypted personalization data block, whether a TRM is part of a qualified set of TRM's to accept deleting of an application. Thereafter, the method provides for deleting the application from the TRM only when the first step determines that the TRM is qualified to accept the deleting of the application.
U.S. Pat. No. 7,590,869, expressly incorporated herein by reference, provides an on-chip multi-core type tamper resistant microprocessor. The microprocessor package has a plurality of instruction execution cores on an identical package and a ciphering processing function that can use a plurality of ciphering keys in correspondence to programs under a multi-task program execution environment. A key table for storing ciphering keys and the ciphering processing function are concentrated at a single location on the package, such that it is possible to provide a tamper resistant microprocessor in a multi-processor configuration that can realize improved processing performance with hardware of a given size, compared with the case of providing the key table and the ciphering processing function in a distributed manner.
U.S. Pat. No. 7,739,517, expressly incorporated herein by reference, provides a secure hardware device which compares code image with a known good code image, using a co-processor separate from the processor, which halts execution of code until it is verified. Reference code or its signature is stored in secure, separate storage, but is not itself encrypted. The separate co-processor is not integrated into main processing pipeline to avoid significant delays.
U.S. Pat. No. 7,734,921, expressly incorporated herein by reference, provides a system and method for guaranteeing software integrity via combined hardware and software authentication. The system enables individual user devices to authenticate and validate a digital message sent by a distribution center, without requiring transmissions to the distribution center. The center transmits the message with an appended modulus that is the product of two specially selected primes. The transmission also includes an appended authentication value that is based on an original message hash value, a new message hash value, and the modulus. The new message hash value is designed to be the center's public RSA key; a corresponding private RSA key is also computed. Individual user devices combine a digital signet, a public modulus, preferably unique hardware-based numbers, and an original message hash to compute a unique integrity value K. Subsequent messages are similarly processed to determine new integrity values K′, which equal K if and only if new messages originated from the center and have not been corrupted.
U.S. Pat. No. 7,725,703, expressly incorporated herein by reference, provides systems and methods for securely booting a computer with a trusted processing module (TPM). In a computer with a TPM, an expected hash value of a boot component may be placed into a platform configuration register (PCR), which allows a TPM to unseal a secret. The secret may then be used to decrypt the boot component. The hash of the decrypted boot component may then be calculated and the result can be placed in a PCR. The PCRs may then be compared. If they do not match, access to an important secret for system operation can be revoked. Also, a first secret may be accessible only when a first plurality of PCR values are extant, while a second secret is accessible only after one or more of the first plurality of PCR values has been replaced with a new value, thereby necessarily revoking further access to the first secret in order to grant access to the second secret.
U.S. Pat. No. 7,694,139, expressly incorporated herein by reference, provides a TPM for securing executable content. A software development system (SDS) executes on a computer having a TPM, and digitally signs software. The platform includes protected areas that store data and cannot be accessed by unauthorized modules. A code signing module executing in a protected area obtains a private/public key pair and a corresponding digital certificate. The SDS is configured to automatically and transparently utilize the code signing module to sign software produced by the system. End-user systems receive the certificate with the software and can use it to verify the signature. This verification will fail if a parasitic virus or other malicious code has altered the software.
U.S. Pat. No. 7,603,707, expressly incorporated herein by reference, provides a tamper-aware virtual TPM, in which respective threads comprising a virtual TPM thread and a security-patrol thread are executed on a host processor. The host processor may be a multi-threaded processor having multiple logical processors, and the respective threads are executed on different logical processors. While the virtual TPM thread is used to perform various TPM functions, the security-patrol thread monitors for physical attacks on the processor by implementing various numerical calculation loops, wherein an erroneous calculation is indicative of a physical attack. In response to detection of such an attack, various actions can be taken in view of one or more predefined security policies, such as logging the event, shutting down the platform and/or informing a remote management entity.
U.S. Pat. No. 7,571,312, expressly incorporated herein by reference, provides methods and apparatus for generating endorsement credentials for software-based security coprocessors. A virtual manufacturer authority is launched in a protected portion of a processing system. A key for the virtual manufacturer authority is created. The key is protected by a security coprocessor of the processing system, such as a TPM. Also, the key is bound to a current state of the virtual manufacturer authority. A virtual security coprocessor is created in the processing system. A delegation request is transmitted from the processing system to an external processing system, such as a certificate authority (CA). After transmission of the delegation request, the key is used to attest to trustworthiness of the virtual security coprocessor.
U.S. Pat. No. 7,490,352, expressly incorporated herein by reference, provides systems and methods for verifying trust or integrity of executable files. The system determines that an executable file is being introduced into a path of execution, and then automatically evaluates it in view of multiple malware checks to detect if the executable file represents a type of malware. The multiple malware checks are integrated into an operating system trust verification process along the path of execution.
U.S. Pat. No. 7,490,250, expressly incorporated herein by reference, provides a system and method for detecting a tamper event in a trusted computing environment. The computer system has an embedded security system (ESS) and a trusted operating system. A tamper signal is received and locked in the ESS. The trusted operating system is capable of detecting the tamper signal in the ESS.
U.S. Pat. No. 7,444,601, expressly incorporated herein by reference, provides a trusted computing platform, in which a trusted hardware device is added to the motherboard, and is configured to acquire an integrity metric, for example a hash of the BIOS memory of the computing platform. The trusted hardware device is tamper-resistant, difficult to forge and inaccessible to other functions of the platform. The hash can be used to convince users that the operation of the platform (hardware or software) has not been subverted in some way, and is safe to interact with in local or remote applications. The main processing unit of the computing platform is directed to address the trusted hardware device, in advance of the BIOS memory, after release from ‘reset’. The trusted hardware device is configured to receive memory read signals from the main processing unit and, in response, return instructions, in the native language of the main processing unit, that instruct the main processing unit to establish the hash and return the value to be stored by the trusted hardware device. Since the hash is calculated in advance of any other system operations, this is a relatively strong method of verifying the integrity of the system. Once the hash has been returned, the final instruction calls the BIOS program and the system boot procedure continues as normal. Whenever a user wishes to interact with the computing platform, he first requests the integrity metric, which he compares with an authentic integrity metric that was measured by a trusted party. If the metrics are the same, the platform is verified and interactions can continue. Otherwise, interaction halts on the basis that the operation of the platform may have been subverted.
U.S. Pat. No. 6,938,164, expressly incorporated herein by reference, provides a system and method for allowing code to be securely initialized in a computer. A memory controller prevents CPUs and other I/O bus masters from accessing memory during a code (for example, trusted core) initialization process. The memory controller resets CPUs in the computer and allows a CPU to begin accessing memory at a particular location (identified to the CPU by the memory controller). Once an initialization process has been executed by that CPU, the code is operational and any other CPUs are allowed to access memory (after being reset), as are any other bus masters (subject to any controls imposed by the initiated code).
U.S. Pat. No. 6,070,239, expressly incorporated herein by reference, provides a system and method for executing verifiable programs with facility for using non-verifiable programs from trusted sources. The system has a class loader that prohibits the loading and execution of non-verifiable programs unless (A) the non-verifiable program resides in a trusted repository of such programs, or (B) the non-verifiable program is indirectly verifiable by way of a digital signature on the non-verifiable program that proves the program was produced by a trusted source. Verifiable architecture neutral programs are Java bytecode programs whose integrity is verified using a Java bytecode program verifier. The non-verifiable programs are generally architecture specific compiled programs generated with the assistance of a compiler. Each architecture specific program typically includes two signatures, including one by the compiling party and one by the compiler. Each digital signature includes a signing party identifier and an encrypted message. The encrypted message includes a message generated by a predefined procedure, and is encrypted using a private encryption key associated with the signing party. A digital signature verifier used by the class loader includes logic for processing each digital signature by obtaining a public key associated with the signing party, decrypting the encrypted message of the digital signature with that public key so as to generate a decrypted message, generating a test message by executing the predefined procedure on the architecture specific program associated with the digital signature, comparing the test message with the decrypted message, and issuing a failure signal if the decrypted message digest and test message digest do not match.
U.S. Pat. No. 5,944,821, expressly incorporated herein by reference, provides a secure software registration and integrity assessment in a computer system. The method provides secure registration and integrity assessment of software in a computer system. A secure hash table is created containing a list of secure programs that the user wants to validate prior to execution. The table contains a secure hash value (i.e., a value generated by modification detection code) for each of these programs as originally installed on the computer system. This hash table is stored in protected memory that can only be accessed when the computer system is in system management mode. Following an attempt to execute a secured program, a system management interrupt is generated. An SMI handler then generates a current hash value for the program to be executed. In the event that the current hash value matches the stored hash value, the integrity of the program is guaranteed and it is loaded into memory and executed. If the two values do not match, the user is alerted to the discrepancy and may be given the option to update or override the stored hash value by entering an administrative password.
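The registration and pre-execution check described above can be sketched in a few lines. This is a simplified model under stated assumptions: SHA-256 stands in for the unspecified modification detection code, an ordinary dictionary stands in for the table held in protected memory, and the class `Registry` with its methods is hypothetical. The administrative override path is omitted.

```python
import hashlib


def secure_hash(image: bytes) -> str:
    """Stand-in for the modification detection code (SHA-256 assumed)."""
    return hashlib.sha256(image).hexdigest()


class Registry:
    """Model of the secure hash table of programs to validate before execution."""

    def __init__(self) -> None:
        # Assumption: in the described system this table lives in memory that
        # is reachable only in system management mode.
        self._table: dict[str, str] = {}

    def register(self, name: str, image: bytes) -> None:
        """Record the hash of the program as originally installed."""
        self._table[name] = secure_hash(image)

    def verify(self, name: str, image: bytes) -> bool:
        """Model of the handler check: recompute the hash and compare.
        Raises KeyError if the program was never registered."""
        return secure_hash(image) == self._table[name]


registry = Registry()
registry.register("editor", b"original editor binary")
print(registry.verify("editor", b"original editor binary"))  # True
print(registry.verify("editor", b"patched editor binary"))   # False
```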
U.S. 2008/0215920, expressly incorporated herein by reference, provides a processor which generates a signature value indicating a sequence of executed instructions, and the signature value is compared to signature values calculated for two or more possible sequences of executed instructions to determine which instruction sequence was executed. The signature is generated via a signature generator during program execution, and is provided external to the processor via a signature message. There is, in this system, no encryption of a stored signature, nor use of a secret key. The trace message storage unit is operable to store instruction pointer trace messages and executed instruction signature messages. The trace message storage unit is also operable to store messages in at least one of an on-chip or an off-chip trace memory. The executed instruction signature unit is operable to generate a cache line content signature. The signature may be generated via a signature generator during program execution, and provided external to the processor via a signature message such as by using a trace memory or buffer and a tool scan port.
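The idea of comparing an execution signature against signatures computed for the possible instruction sequences can be illustrated as follows. The sketch is an assumption-laden model, not the mechanism of the cited application: the signature function is not specified above, so SHA-256 over instruction mnemonics is used as a stand-in, and the names `sequence_signature` and `identify_path` are hypothetical.

```python
import hashlib


def sequence_signature(instructions: list[str]) -> str:
    """Signature over an ordered sequence of executed instructions."""
    h = hashlib.sha256()
    for instruction in instructions:
        h.update(instruction.encode() + b"\n")  # delimiter keeps boundaries distinct
    return h.hexdigest()


def identify_path(observed: str, candidates: dict[str, list[str]]):
    """Return the name of the candidate sequence whose signature matches
    the observed one, or None if no candidate matches."""
    for name, instructions in candidates.items():
        if sequence_signature(instructions) == observed:
            return name
    return None


# Two possible sequences through a conditional branch (hypothetical program).
candidates = {
    "branch taken": ["cmp r0, #0", "beq done", "ret"],
    "branch not taken": ["cmp r0, #0", "beq done", "add r0, r0, #1", "ret"],
}
observed = sequence_signature(["cmp r0, #0", "beq done", "add r0, r0, #1", "ret"])
print(identify_path(observed, candidates))  # branch not taken
```

Since nothing in this scheme is secret or encrypted, anyone able to alter the executed code could also recompute a matching signature, which is the limitation noted above.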
FIG. 1 of U.S. Patent Application 2008/0215920 (prior art) is a block diagram of a computer system, as may be used to practice various embodiments of the invention. A computer system 100 is in some embodiments a general-purpose computer, such as the personal computer that has become a common tool in business and in homes. In other embodiments, the computer 100 is a special purpose computer system, such as an industrial process control computer, a car computer, a communication device, or a home entertainment device. The computer comprises a processor 101, which is operable to execute software instructions to perform various functions. The memory 102 and processor 101 in further embodiments include a smaller, faster cache memory which is used to store data that is recently used, or that is believed likely to be used in the near future. The software instructions and other data are stored in a memory 102 when the computer is in operation, and the memory is coupled to the processor by a bus 103. When the computer starts, data stored in nonvolatile storage such as a hard disk drive 104 or in other nonvolatile storage such as flash memory is loaded into the memory 102 for the processor's use.
In many general purpose computers, an operating system is loaded from the hard disk drive 104 into memory and is executed in the processor when the computer first starts, providing a computer user with an interface to the computer so that other programs can be run and other tasks performed. The operating system and other executing software are typically stored in nonvolatile storage when the computer is turned off, but are loaded into memory before the program instructions can be executed. Because memory 102 is significantly more expensive than most practical forms of nonvolatile storage, the hard disk drive or other nonvolatile storage in a computerized system often stores much more program data than can be loaded into the memory 102 at any given time. The result is that only some of the program data stored in nonvolatile memory for an executing program, operating system, or for other programs stored in nonvolatile memory can be loaded into memory at any one time. This often results in swapping pieces of program code into and out of memory 102 from the nonvolatile storage 104 during program execution, to make efficient use of the limited memory that is available.
Many modern computer systems use methods such as virtual memory addresses that are mapped to physical memory addresses and paged memory to manage the limited available physical memory 102. Virtual memory allows use of a larger number of memory address locations than are actually available in a physical memory 102, and relies on a memory management method to map virtual addresses to physical memory addresses as well as to ensure that the needed data is loaded into the physical memory. Needed data is swapped into and out of physical memory as needed by loading memory in pages, which are simply large segments of addressable memory that are moved together as a group. Memory management units within the processor or chipset architecture can also change the contents of memory or cache during program execution, such as where new data is needed in memory or is predicted to be needed and the memory or cache is already full.
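The mapping from virtual to physical addresses can be shown with a minimal sketch. It assumes a single-level page table with 4096-byte pages; the names `translate`, `PageFault` and `PAGE_SIZE` are illustrative and do not correspond to any particular memory management unit.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when the needed page is not resident in physical memory."""


def translate(virtual_address: int, page_table: dict[int, int]) -> int:
    """Map a virtual address to a physical address.

    `page_table` maps a virtual page number to a physical frame number for
    the pages currently loaded in memory.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page_number)
    if frame is None:
        # A real system would now load the page from nonvolatile storage,
        # possibly evicting another page, and retry the access.
        raise PageFault(page_number)
    return frame * PAGE_SIZE + offset


page_table = {0: 5, 1: 2}           # virtual pages 0 and 1 are resident
print(translate(10, page_table))    # 20490  (frame 5, offset 10)
print(translate(4100, page_table))  # 8196   (frame 2, offset 4)
```

After a page is swapped out and later reloaded, the same virtual page may map to a different frame, so the physical address returned for a given virtual address is not fixed over time.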
An executing program may complete execution of all the needed program instructions in a particular page loaded into memory, and proceed to execute more instructions stored in another page. In a typical example, the previously executing page is swapped out of memory and the page containing the newly needed program code is loaded into memory in its place, enabling the processor to continue to execute program instructions from memory. This not only complicates memory management, but complicates debugging executing software as the program code stored in any particular physical memory location might be from any number of different pages with different virtual addresses. Further, program code loaded into memory need not be stored in the same physical memory location every time, and the actual physical address into which a program instruction is stored is not necessarily unique.
US 2009/0217050, expressly incorporated herein by reference, provides systems and methods to optimize signature verification time for a cryptographic cache. Time is reduced by eliminating at least some of the duplicative application of cryptographic primitives. In some embodiments, systems and methods for signature verification comprise obtaining a signature which was previously generated using an asymmetrical cryptographic scheme, and determining whether an identical signature has previously been stored in a signature cache. If an identical signature has been previously stored in the signature cache, the previously generated results corresponding to the previously stored identical signature are retrieved, the results being a consequence of application of cryptographic primitives of the asymmetrical cryptographic scheme corresponding to the identical signature. The results are forwarded to a signature verifier. In at least some embodiments, at least one of these functions occurs in a secure execution environment. Examples of a secure execution environment, without limitation, include an ARM TRUSTZONE® architecture, a trusted platform module (TPM), Texas Instruments' M-SHIELD™ security technology, etc. The secure execution environment comprises the signature cache and at least a portion of the security logic. The security logic in turn comprises a signature look-up, calculator, hash function and signature verifier, although it should be readily apparent that more or different functions and modules may form part of security for some embodiments. The device obtains the signature (and message) from the unsecure environment and promptly presents them to the security logic for vetting. Embodiments employ the signature look-up to check the signature cache to determine whether the specific signature has been presented before.
If the specific signature has indeed been previously presented, signature look-up retrieves the corresponding results of the previous utilization of cryptographic primitives corresponding to the relevant digital signature scheme being employed, which results were previously stored at the identified location in signature cache, and forwards the results to signature verifier. Among those results is the hash value of the previous message that is part of the previous signature. Signature verifier calls hash function to perform a hash on newly obtained message, and compares the hash value of the newly obtained message with the hash value retrieved from signature cache. If there is a match, the signature is verified and the message is forwarded for further processing, e.g., uploading into NVM or RAM as the case may be, etc. Thus, execution is commenced after verification.
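The caching behavior described above can be sketched as follows. This is a model under stated assumptions rather than the design of the cited application: the asymmetric primitive is replaced by a caller-supplied stand-in function that returns the message hash recovered from a signature, SHA-256 is assumed as the hash function, and the class `SignatureCache` and its counter are hypothetical.

```python
import hashlib
from typing import Callable


class SignatureCache:
    """Cache the result of the expensive asymmetric step, keyed by signature."""

    def __init__(self, recover_hash: Callable[[bytes], bytes]) -> None:
        # `recover_hash` stands in for applying the asymmetric cryptographic
        # primitives to a signature to obtain the signed message hash.
        self._recover_hash = recover_hash
        self._cache: dict[bytes, bytes] = {}
        self.primitive_calls = 0  # counts applications of the expensive step

    def verify(self, message: bytes, signature: bytes) -> bool:
        if signature not in self._cache:
            # First presentation: run the expensive primitive and store the result.
            self.primitive_calls += 1
            self._cache[signature] = self._recover_hash(signature)
        # Every presentation: hash the newly obtained message and compare.
        return hashlib.sha256(message).digest() == self._cache[signature]


message = b"firmware image"
issued = {b"sig-1": hashlib.sha256(message).digest()}  # stand-in for real signatures
cache = SignatureCache(lambda signature: issued[signature])

print(cache.verify(message, b"sig-1"))      # True
print(cache.verify(message, b"sig-1"))      # True
print(cache.verify(b"tampered", b"sig-1"))  # False
print(cache.primitive_calls)                # 1
```

Only the first presentation of a signature pays for the asymmetric operation; later presentations cost one hash computation and a comparison, yet a modified message is still rejected.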
Vivek Haldar, Deepak Chandra and Michael Franz, “Semantic Remote Attestation—A Virtual Machine directed approach to Trusted Computing”, USENIX Virtual Machine Research and Technology Symposium, May 2004, provides a method for using language-based virtual machines which enables the remote attestation of complex, dynamic, and high-level program properties, in a platform-independent way.
Joshua N. Edmison, “Hardware Architectures for Software Security”, Ph.D. Thesis, Virginia Polytechnic Institute and State University (2006), proposes that substantial, hardware-based software protection can be achieved, without trusting software or redesigning the processor, by augmenting existing processors with security management hardware placed outside of the processor boundary. Benefits of this approach include the ability to add security features to nearly any processor, update security features without redesigning the processor, and provide maximum transparency to the software development and distribution processes.
Bryan Parno, Jonathan M. McCune, and Adrian Perrig, “Bootstrapping Trust in Commodity Computers”, IEEE Symposium on Security and Privacy, May 2010, provides a method for providing information about a computer's state, as part of an investigation of trustworthy computing.
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will, with very high probability, change the hash value. The data to be encoded is often called the “message”, and the hash value is sometimes called the message digest or simply the digest. The ideal cryptographic hash function has four main properties: it is easy to compute the hash value for any given message; it is infeasible to find a message that has a given hash (and thus there is an asymmetry between encoding and decoding); it is infeasible to modify a message without changing its hash; and it is infeasible to find two different messages with the same hash. Cryptographic hash functions have many information security applications, notably in digital signatures, message authentication codes (MACs), and other forms of authentication. They can also be used as ordinary hash functions, to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption. Indeed, in information security contexts, cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values, even though all these terms stand for functions with rather different properties and purposes.
In cryptography, MD5 (Message-Digest algorithm 5) is a widely used cryptographic hash function with a 128-bit hash value. Specified in RFC 1321 (expressly incorporated herein by reference), MD5 has been employed in a wide variety of security applications, and is also commonly used to check the integrity of files. However, it has been shown that MD5 is not collision resistant; as such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property. An MD5 hash is typically expressed as a 32-digit hexadecimal number. The SHA-2 family of hash functions, which provides a higher level of security, may also be used.
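As a brief illustration of the digest sizes mentioned above, the following sketch uses Python's standard `hashlib` module to compute an MD5 digest and a SHA-256 digest (one member of the SHA-2 family). The helper name `hex_digest` and the sample message are assumptions made for the example.

```python
import hashlib


def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the hexadecimal digest of `data` under the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


message = b"example message"

md5_hex = hex_digest("md5", message)
sha256_hex = hex_digest("sha256", message)

# Each hexadecimal digit encodes 4 bits: 32 digits = 128 bits, 64 digits = 256 bits.
print(len(md5_hex), "hex digits:", md5_hex)
print(len(sha256_hex), "hex digits:", sha256_hex)

# Changing a single character of the input yields a different digest.
print(hex_digest("sha256", b"example messagf") != sha256_hex)
```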
Most cryptographic hash functions are designed to take a string of any length as input and produce a fixed-length hash value. A cryptographic hash function is ideally able to withstand cryptanalytic attack. At a minimum, it should have the following properties: Preimage resistance: Given a hash h, it should be hard to find any message m such that h=hash(m). This concept is related to that of a one-way function. Functions that lack this property are vulnerable to preimage attacks. Second preimage resistance: Given an input m1, it should be hard to find another input m2, where m1≠m2, such that hash(m1)=hash(m2). This property is sometimes referred to as weak collision resistance, and functions that lack this property are vulnerable to second preimage attacks. Collision resistance: It should be hard to find two different messages m1 and m2 such that hash(m1)=hash(m2). Such a pair is called a cryptographic hash collision, and this property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as that required for preimage resistance; otherwise collisions may be found by a so-called birthday attack. These properties imply that a malicious adversary cannot replace or modify the input data without changing its digest. Thus, if two strings have the same digest, one can be very confident that they are identical.
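The birthday attack mentioned above can be demonstrated on a deliberately shortened digest. The sketch below truncates SHA-256 to 3 bytes (24 bits), an assumption chosen only so that the search finishes quickly; for an n-bit digest a collision is expected after roughly 2^(n/2) attempts, which is why a collision-resistant hash needs about twice the length required for preimage resistance. The function names are hypothetical.

```python
import hashlib
from itertools import count


def truncated_digest(message: bytes, n_bytes: int) -> bytes:
    """First `n_bytes` bytes of SHA-256: a deliberately weakened hash."""
    return hashlib.sha256(message).digest()[:n_bytes]


def find_collision(n_bytes: int):
    """Hash distinct messages until two of them share a truncated digest."""
    seen = {}                                   # truncated digest -> message
    for i in count():
        message = str(i).encode()               # distinct message per step
        tag = truncated_digest(message, n_bytes)
        if tag in seen:
            return seen[tag], message, i + 1    # the pair and the attempts used
        seen[tag] = message


m1, m2, attempts = find_collision(3)
print(m1, m2, "collide after", attempts, "attempts")
```

With a 24-bit digest the search typically ends after a few thousand attempts, far fewer than the 2^24 possible digest values.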
Ideally, one may wish for even stronger conditions. It should be impossible for an adversary to find two messages with substantially similar digests; or to infer any useful information about the data, given only its digest. Therefore, a cryptographic hash function should behave as much as possible like a random function while still being deterministic and efficiently computable. Checksum algorithms, such as CRC32 and other cyclic redundancy checks, are designed to meet much weaker requirements, and are generally unsuitable as cryptographic hash functions. See, en.wikipedia.org/wiki/Cryptographic_hash_function.
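One way to see why a checksum such as CRC32 falls short of these requirements is its algebraic structure: for inputs of equal length, the CRC32 of the XOR of three messages equals the XOR of their three CRC32 values, so an adversary can predict how a modification affects the checksum. The sketch below checks this with Python's standard `zlib.crc32`; the helper names and sample messages are assumptions for illustration.

```python
import hashlib
import zlib


def xor_bytes(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def crc_is_affine(a: bytes, b: bytes, c: bytes) -> bool:
    """True if crc32(a^b^c) == crc32(a) ^ crc32(b) ^ crc32(c)."""
    return zlib.crc32(xor_bytes(a, b, c)) == (
        zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
    )


def sha256_is_affine(a: bytes, b: bytes, c: bytes) -> bool:
    """The same relation tested on SHA-256 digests."""
    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()
    return h(xor_bytes(a, b, c)) == xor_bytes(h(a), h(b), h(c))


a, b, c = b"pay alice $100", b"pay bobby $100", b"pay carol $999"
print(crc_is_affine(a, b, c), sha256_is_affine(a, b, c))
```

The relation holds for CRC32 and, as expected of a function that should behave like a random function, does not hold for SHA-256.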
An important application of secure hashes is verification of message integrity. Determining whether any changes have been made to a message or file, for example, can be accomplished by comparing message digests calculated before, and after, transmission (or any other event). For this reason, most digital signature algorithms only confirm the authenticity of a hashed digest of the message to be “signed”. Verifying the authenticity of a hashed digest of the message is considered proof that the message itself is authentic. Another main application of a hash function is to allow fast look-up of data in a hash table.
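The before-and-after comparison of digests described above might be sketched as follows. SHA-256 is an assumed choice of hash, and the function names `digest` and `is_unmodified` are hypothetical.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Hexadecimal SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_digest: str) -> bool:
    """Compare the digest of the received data with the digest recorded earlier."""
    # hmac.compare_digest compares in constant time for equal-length inputs.
    return hmac.compare_digest(digest(data), expected_digest)


original = b"contents of the file before transmission"
recorded = digest(original)                    # calculated before the event

print(is_unmodified(original, recorded))       # same bytes received
print(is_unmodified(original + b"!", recorded))  # one byte appended in transit
```

A comparison of this kind detects modification only if the recorded digest itself is trustworthy, which is the role played by the signature over the digest.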