Application software typically provides the functionality that end users want from a data processing system, and it runs on top of system software. To establish a trusted operating environment, a data processing system may measure and authenticate the system software components before executing them. The system software components may include, without limitation, an operating system (OS) and system initialization code that is executed before the OS. The system initialization code, which may also be referred to as firmware, may include, without limitation, a startup authenticated code module (ACM) and a basic input/output system (BIOS). For purposes of this disclosure, the startup ACM may be referred to simply as the ACM.
The ACM is an always-trusted code module that is responsible for validating the platform on which it runs and then measuring the current system BIOS. The ACM is also responsible for authenticating the BIOS and for controlling access to memory (e.g., by locking or unlocking a memory controller). The ACM is started by the root of trust, which is the central processing unit (CPU). The ACM may include a value that attests to the authenticity of the ACM. That value may be encrypted with a key that corresponds to another key stored in the data processing system. Typically, the ACM and the BIOS are both programmed at the factory. When the ACM runs during the first boot of the data processing system, it measures the BIOS and stores this measurement in the system's trusted platform module (TPM). On each subsequent boot, the ACM re-measures the BIOS; as long as the BIOS image remains the same, the measurement is unchanged, and the ACM therefore finds the BIOS to be authentic.
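The measure-and-compare behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an actual ACM implementation: it assumes that a measurement is a SHA-256 digest of the BIOS image, and the names `ToyTpm`, `measure_bios`, and `acm_check_bios` are hypothetical. A real ACM would keep the reference measurement in TPM storage rather than in a Python object.

```python
import hashlib
from typing import Optional


class ToyTpm:
    """Hypothetical stand-in for the TPM storage that holds the BIOS measurement."""

    def __init__(self) -> None:
        self.bios_measurement: Optional[bytes] = None


def measure_bios(bios_image: bytes) -> bytes:
    """Return a measurement of the BIOS image (assumed here to be a SHA-256 digest)."""
    return hashlib.sha256(bios_image).digest()


def acm_check_bios(tpm: ToyTpm, bios_image: bytes) -> bool:
    """Measure the BIOS; record it on first boot, otherwise compare with the stored value."""
    measurement = measure_bios(bios_image)
    if tpm.bios_measurement is None:
        # First boot: no reference value exists yet, so store this measurement.
        tpm.bios_measurement = measurement
        return True
    # Subsequent boots: the BIOS is authentic only if its measurement is unchanged.
    return measurement == tpm.bios_measurement


if __name__ == "__main__":
    tpm = ToyTpm()
    factory_bios = b"BIOS v1.0"
    print(acm_check_bios(tpm, factory_bios))   # first boot: True
    print(acm_check_bios(tpm, factory_bios))   # same image: True
    print(acm_check_bios(tpm, b"BIOS v1.1"))   # changed image: False
```

The last call shows why any change to the BIOS image, including a legitimate upgrade, causes the comparison to fail.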
The system software may also include a virtual machine monitor (VMM), which may be launched, for instance, after the system initialization code but before the OS. An operating environment in which the system software has been measured before being launched may be referred to as a measured launch environment (MLE). If the measurements for the system software have also been authenticated, the operating environment may also be referred to as a trusted operating environment. By establishing an MLE or a trusted operating environment, the data processing system provides an assurance that it is secure and has not been compromised by malware. For purposes of this disclosure, a data processing system that has established an MLE may be said to be running an MLE, and application software may be said to be running on an MLE when that software is running on a data processing system that is running an MLE. Similarly, a data processing system that has established a trusted operating environment may be said to be running a trusted operating environment, and application software may be said to be running on a trusted operating environment when that software is running on a data processing system that is running a trusted operating environment.
After the MLE has been established, the data processing system may execute the application software. When that application software performs mission-critical functions, its users may rely on the data processing system to be available practically twenty-four hours per day, seven days per week. For example, server farms and data centers may utilize data processing systems that are expected to have less than 5.26 minutes of downtime per year. A data processing system with that degree of availability or reliability may be said to provide 99.999% (or “five nines”) availability.
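The five nines downtime figure above follows from simple arithmetic. Assuming a 365-day year:

```latex
\begin{aligned}
\text{allowed downtime per year} &= (1 - 0.99999) \times 365 \times 24 \times 60\ \text{min} \\
&= 10^{-5} \times 525{,}600\ \text{min} \\
&= 5.256\ \text{min} \approx 5.26\ \text{min}
\end{aligned}
```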
Components of the MLE (e.g., the OS) and user applications can store secrets in memory. A secret is a piece of data that must not be compromised or accessed by malware (e.g., passwords, private/public key pairs, system registry values). When secrets are present in memory, the platform should allow only trusted modules to access the memory. For instance, if the system is reset with secrets in memory, the platform should not allow an untrusted BIOS to access the memory. The ACM may prevent an untrusted BIOS from accessing the memory by locking out the memory controller, so that the BIOS can no longer program or alter the memory subsystem.
Sometimes it becomes necessary to update the BIOS in a data processing system. As recognized herein, downtime could be reduced or avoided if the BIOS could be upgraded while the OS is running, without adverse consequences. However, updating the BIOS changes its measurement. Consequently, if the BIOS is upgraded while the data processing system is running an MLE, the upgraded BIOS may fail authentication the next time the data processing system is reset. For example, if a BIOS upgrade occurs with secrets in memory and the system is then reset, the ACM will lock out the memory controller, because the new BIOS has a different measurement than its predecessor.
In a conventional data processing system, when the measurement of the BIOS changes, the ACM typically applies one of two policies, depending on whether the MLE has stored secrets in memory:
1) If the MLE did not store any secrets in memory during the last boot, there is nothing for the system to protect, so the ACM “auto-promotes” the new BIOS. In this auto-promotion process, the ACM computes the measurement of the new BIOS and stores that new measurement in the TPM. After the upgraded BIOS has been measured and auto-promoted, the data processing system may be booted to an MLE.
2) If the MLE has secrets stored in memory, then on the next boot the ACM must not allow the new BIOS to access memory and potentially compromise system secrets. Consequently, the ACM locks the memory controller so that the new BIOS cannot program the memory subsystem or access memory.
When the second policy applies, the BIOS may be unable to complete the boot process. For purposes of this disclosure, a data processing system that cannot complete the boot process may be said to be bricked. Thus, BIOS authentication failure may result in a bricked state. A bricked system is rendered useless because it can no longer boot to the OS. Moreover, recovering from this kind of failure may require the data processing system to be serviced by the original equipment manufacturer (OEM) at the OEM's facilities. Consequently, for a data processing system that is configured to provide an MLE, a more difficult and/or time-consuming process is typically used to update the BIOS. One approach is to bring the data processing system and an administrator to the same location (e.g., the OEM's factory), where the administrator manually tears down the MLE before upgrading the BIOS. Tearing down the MLE involves re-provisioning the contents of the TPM device using a special software tool in the OEM's factory.
This process ensures that the stored signatures/measurements, and the launch control policies (LCPs) that cause these measurements to be applied during the boot process, are rendered invalid, so that the newly upgraded BIOS can pass the boot process and be freshly measured for subsequent MLE launches. After the MLE has been torn down, the ACM may then measure and auto-promote the upgraded BIOS.
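The two conventional policies, and the effect of tearing down the MLE before an upgrade, can be modeled roughly as follows. This is a hypothetical sketch, not OEM or ACM code: `PlatformState`, `BootDecision`, `acm_on_reset`, and `tear_down_mle` are illustrative names, a measurement is assumed to be a SHA-256 digest, and teardown is assumed to clear both the stored measurement and the record of protected secrets.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BootDecision(Enum):
    BOOT_TRUSTED = "measurement matches; boot normally"
    AUTO_PROMOTED = "new BIOS measured and promoted"
    MEMORY_LOCKED = "memory controller locked; boot cannot complete"


@dataclass
class PlatformState:
    """Hypothetical model of the state the ACM consults at reset."""
    stored_measurement: Optional[bytes]  # value held in the TPM (None after teardown)
    secrets_in_memory: bool              # whether the MLE left secrets in memory
    memory_locked: bool = False


def acm_on_reset(state: PlatformState, bios_image: bytes) -> BootDecision:
    measurement = hashlib.sha256(bios_image).digest()  # assumed measurement scheme
    if state.stored_measurement == measurement:
        return BootDecision.BOOT_TRUSTED
    if not state.secrets_in_memory:
        # Policy 1: nothing to protect, so store the new BIOS measurement.
        state.stored_measurement = measurement
        return BootDecision.AUTO_PROMOTED
    # Policy 2: secrets present and BIOS changed, so lock the memory controller.
    state.memory_locked = True
    return BootDecision.MEMORY_LOCKED


def tear_down_mle(state: PlatformState) -> None:
    """Assumed model of factory teardown: invalidate stored values, no secrets protected."""
    state.stored_measurement = None
    state.secrets_in_memory = False


if __name__ == "__main__":
    old_bios, new_bios = b"BIOS v1.0", b"BIOS v1.1"
    old_measurement = hashlib.sha256(old_bios).digest()

    # Upgrade with no secrets in memory: auto-promotion.
    s1 = PlatformState(old_measurement, secrets_in_memory=False)
    print(acm_on_reset(s1, new_bios))  # BootDecision.AUTO_PROMOTED

    # Upgrade with secrets in memory: memory locked (bricked).
    s2 = PlatformState(old_measurement, secrets_in_memory=True)
    print(acm_on_reset(s2, new_bios))  # BootDecision.MEMORY_LOCKED

    # Tear down the MLE first, then upgrade: auto-promotion again.
    s3 = PlatformState(old_measurement, secrets_in_memory=True)
    tear_down_mle(s3)
    print(acm_on_reset(s3, new_bios))  # BootDecision.AUTO_PROMOTED
```

In this model, the only path that upgrades the BIOS without locking memory while secrets are present is the teardown path, which corresponds to the administrator-driven process described above.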
Another approach is for the OEM to configure the data processing system with a secondary BIOS in addition to the primary BIOS. The secondary BIOS may be referred to as a fail-safe BIOS. The OEM may also store a measurement for the fail-safe BIOS in a special storage area in the data processing system's TPM, and may configure the data processing system with an ACM that automatically authenticates and launches the fail-safe BIOS based on a built-in software policy. To authenticate the fail-safe BIOS, the ACM measures it and compares that measurement with the value stored in the TPM. If the ACM chooses to follow the fail-safe path, the data processing system may use the fail-safe BIOS to boot to an MLE.
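The fail-safe path can be sketched in the same hedged way. The built-in software policy is not specified here, so this sketch assumes one plausible policy: launch the primary BIOS if its measurement matches, otherwise fall back to the fail-safe BIOS if its measurement matches the OEM-provisioned value. `measure` and `select_bios` are hypothetical names, and a measurement is again assumed to be a SHA-256 digest.

```python
import hashlib
from typing import Optional


def measure(image: bytes) -> bytes:
    """Measurement assumed to be a SHA-256 digest of the image."""
    return hashlib.sha256(image).digest()


def select_bios(primary_image: bytes,
                failsafe_image: bytes,
                stored_primary: bytes,
                stored_failsafe: bytes) -> Optional[str]:
    """Assumed ACM policy: prefer the primary BIOS, else fall back to the fail-safe BIOS.

    stored_failsafe models the OEM-provisioned value in the TPM's special storage area.
    Returns the name of the BIOS to launch, or None if neither authenticates.
    """
    if measure(primary_image) == stored_primary:
        return "primary"
    if measure(failsafe_image) == stored_failsafe:
        return "fail-safe"
    return None


if __name__ == "__main__":
    primary_v1, primary_v2, failsafe = b"BIOS v1.0", b"BIOS v1.1", b"fail-safe BIOS"
    stored_p, stored_f = measure(primary_v1), measure(failsafe)
    print(select_bios(primary_v1, failsafe, stored_p, stored_f))  # primary
    print(select_bios(primary_v2, failsafe, stored_p, stored_f))  # fail-safe
```

This sketch models only authentication; it does not capture whether the fail-safe BIOS is still compatible with the platform hardware.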
However, only the OEM can change the fail-safe BIOS. Furthermore, the fail-safe BIOS must be compatible with the server platform components. If hardware changes are made to the data processing system after it has been delivered to the customer, the fail-safe BIOS may be rendered inoperative. For instance, changing the CPU or introducing a new interconnect frequency may render the fail-safe BIOS incompatible with the platform and cause the fail-safe BIOS itself to fail to boot. In that case, any subsequent attempt to upgrade the primary BIOS may result in a bricked data processing system.