One conventional data storage system includes two storage processors and a set of disk drives. Each storage processor has volatile semiconductor memory which, among other things, contains a local write cache. As is conventional, each storage processor further includes a set of microprocessors (e.g., dual microprocessors) which runs an operating system, as well as standard boot/reset subsystems such as a basic input/output system (BIOS) and a power-on self-test (POST) mechanism.
During initial power-up of the data storage system, the BIOS and the POST of each storage processor write to the volatile semiconductor memory of that storage processor. In particular, the BIOS initializes all of the memory regions and sets up the initial error correction codes (ECCs) for the memory regions. Next, the POST utilizes a portion of the volatile semiconductor memory to carry out a series of tests, discoveries, other initializations, loads, etc.
Once the BIOS and POST have completed operation, the two storage processors run their respective operating systems to perform data storage operations on behalf of one or more external host computers. Along these lines, the two storage processors operate in an active-active manner to store data into, and retrieve data from, the set of disk drives on behalf of the external host computers. During such operation, the storage processors mirror the contents of their local write caches. Such mirroring enables the data storage system to achieve high availability of the host write data and to safely acknowledge host write operations once the host write data reaches both write caches (i.e., a conventional write-back caching scheme).
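By way of illustration only, the mirrored write-back behavior described above may be sketched as follows. The class `StorageProcessor`, the functions `pair` and `flush`, and the dictionary-based caches are hypothetical names introduced for this sketch and are not taken from any actual storage processor implementation.

```python
class StorageProcessor:
    """Hypothetical model of one storage processor and its local write cache."""

    def __init__(self, name):
        self.name = name
        self.write_cache = {}  # block address -> data, held in volatile memory
        self.peer = None       # the other storage processor of the pair

    def host_write(self, address, data):
        # Place the host write data in the local write cache.
        self.write_cache[address] = data
        # Mirror the data into the peer's write cache.
        self.peer.write_cache[address] = data
        # Acknowledge only after the data resides in both write caches;
        # the data has not yet reached the disk drives (write-back).
        return "ACK"


def pair(sp_a, sp_b):
    """Connect two storage processors so that each mirrors to the other."""
    sp_a.peer, sp_b.peer = sp_b, sp_a


def flush(sp, disks):
    """Later, destage cached data to the disk drives and drop both cached copies."""
    for address, data in list(sp.write_cache.items()):
        disks[address] = data
        del sp.write_cache[address]
        sp.peer.write_cache.pop(address, None)
```

In this sketch, a call such as `sp_a.host_write(7, b"data")` returns an acknowledgement while the disk drives remain unchanged, and a subsequent `flush(sp_a, disks)` moves the data to the disk drives.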
If one storage processor of the data storage system suffers a non-recoverable failure (i.e., a failure that can only be resolved by servicing from a technician), the other storage processor may be able to continue to perform data storage operations so that the data storage system as a whole remains in operation. Specifically, when one storage processor fails, the remaining storage processor vaults the contents of its local write cache to one or more disk drives and then turns off its local write cache. From that point forward, the remaining storage processor carries out host write operations in a write-through manner in which the remaining storage processor acknowledges completion of the host write operations only after the write data is synchronized with the vault (if necessary) and stored on the set of disk drives.
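The transition from write-back caching to write-through operation described above may be sketched as follows, again purely for illustration. The class `RemainingProcessor` and its members are hypothetical. In particular, the treatment of "synchronized with the vault" as discarding a superseded vaulted copy of the same block is an assumption of this sketch, since the exact synchronization is not detailed here.

```python
class RemainingProcessor:
    """Hypothetical model of the storage processor that survives a peer failure."""

    def __init__(self, write_cache):
        self.write_cache = dict(write_cache)  # volatile local write cache
        self.cache_enabled = True
        self.vault = {}   # vault area on one or more disk drives
        self.disks = {}   # the set of disk drives

    def on_peer_failure(self):
        # Vault the contents of the local write cache to disk ...
        self.vault.update(self.write_cache)
        # ... and then turn off the local write cache.
        self.write_cache.clear()
        self.cache_enabled = False

    def host_write(self, address, data):
        if self.cache_enabled:
            self.write_cache[address] = data
            return "ACK (write-back)"
        # Write-through: synchronize with the vault if necessary.
        # Assumption: a vaulted copy of this block is now stale and is dropped.
        self.vault.pop(address, None)
        # Store the data on the disk drives before acknowledging.
        self.disks[address] = data
        return "ACK (write-through)"
```

After `on_peer_failure()` is invoked, every host write in this sketch reaches `disks` before the acknowledgement is returned, and nothing further is placed in the volatile write cache.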
In the above-described conventional data storage system, it should be understood that there may be situations in which the remaining storage processor encounters a failure (i.e., a second failure of the data storage system) from which it can recover. For instance, the remaining storage processor may suffer a software failure or a minor hardware failure which nevertheless enables the remaining storage processor to continue operating after re-initialization. In these situations, the remaining storage processor reboots itself by re-running the BIOS and POST. That is, the BIOS re-initializes all of the memory regions of the remaining storage processor's volatile semiconductor memory and provides new ECCs for these memory regions. Next, the POST utilizes the volatile semiconductor memory to re-perform the series of tests, discoveries, other initializations, loads, etc. Once the re-initialization process of the remaining storage processor is complete, the re-initialized storage processor can continue to perform data storage operations in the write-through manner (i.e., host write operations are acknowledged once the host write data is stored on the set of disk drives) until a technician arrives to repair the data storage system.
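The reboot sequence described above may be sketched as follows, for illustration only. All names are hypothetical, the region size is arbitrary, and `toy_check_code` is a simple XOR checksum standing in for a real ECC (it detects but cannot correct errors). The sketch is intended only to show that the conventional reboot overwrites every memory region before write-through operation resumes.

```python
REGION_SIZE = 8  # bytes per memory region; arbitrary value for illustration


def toy_check_code(data: bytes) -> int:
    """Stand-in for a real ECC: XOR of all bytes (assumption, not an actual ECC)."""
    code = 0
    for byte in data:
        code ^= byte
    return code


def bios_init(memory: dict) -> None:
    """Re-initialize ALL memory regions and provide a new check code for each."""
    for region in memory:
        zeroed = bytes(REGION_SIZE)
        memory[region] = {"data": zeroed, "ecc": toy_check_code(zeroed)}


def post(memory: dict) -> bool:
    """Placeholder for the POST's tests: verify each region against its code."""
    return all(toy_check_code(r["data"]) == r["ecc"] for r in memory.values())


def reboot(memory: dict) -> str:
    """Re-run the BIOS and then the POST; return the resulting operating mode."""
    bios_init(memory)
    if not post(memory):
        raise RuntimeError("POST failed; servicing by a technician is required")
    return "write-through"
```

In this sketch, any data that resided in a memory region before `reboot` is called is replaced by zeroed contents, which mirrors the statement above that the BIOS re-initializes all of the memory regions.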