Network computer systems generally include a plurality of geographically separated or distributed computer nodes that are interconnected by, and configured to communicate with each other via, one or more network communications media. One conventional type of network computer system includes a network storage subsystem that is configured to provide a centralized location in the network at which to store, and from which to retrieve, data. Advantageously, by using such a storage subsystem in the network, many of the network's data storage management and control functions may be centralized at the subsystem, instead of being distributed among the network nodes.
One type of conventional network storage subsystem, manufactured and sold by the Assignee of the subject application (hereinafter “Assignee”) under the tradename Symmetrix™ (hereinafter referred to as the “Assignee's conventional storage system”), includes a plurality of disk mass storage devices configured as one or more redundant arrays of independent (or inexpensive) disks (RAID). The disk devices are controlled by disk controllers (commonly referred to as “back end” controllers/directors) that store user data in, and retrieve user data from, a shared cache memory resource in the subsystem. A plurality of host controllers (commonly referred to as “front end” controllers/directors) may also store user data in, and retrieve user data from, the shared cache memory resource. The disk controllers are coupled to respective disk adapters that, among other things, interface the disk controllers to the disk devices. Similarly, the host controllers are coupled to respective host channel adapters that, among other things, interface the host controllers via channel input/output (I/O) ports to the network communications channels (e.g., SCSI, Enterprise Systems Connection (ESCON), and/or Fibre Channel (FC) based communications channels) that couple the storage subsystem to computer nodes in the computer network external to the subsystem (commonly termed “host” computer nodes or “hosts”).
In the Assignee's conventional storage system, the shared cache memory resource may comprise a plurality of memory circuit boards that may be coupled to an electrical backplane in the storage system. The cache memory resource is a semiconductor memory, as distinguished from the disk storage devices also comprised in the Assignee's conventional storage system, and each of the memory boards comprising the cache memory resource may be populated with, among other things, relatively high-speed synchronous dynamic random access memory (SDRAM) integrated circuit (IC) devices for storing the user data. The shared cache memory resource may be segmented into a multiplicity of cache memory regions. Each of the regions may, in turn, be segmented into a plurality of memory segments.
In order to enhance the fault tolerance of the cache memory resource, it has been proposed to configure the cache memory resource to implement a conventional “dual write” fault tolerance scheme. According to this scheme, the cache memory resource is partitioned into a first half and a second half, with the total user data space being divided evenly between the two halves. Each time user data is written into the cache memory resource (e.g., by a host controller or disk controller), one copy of that data is written to a portion of the first half of the cache memory, and a duplicate (i.e., redundant) copy of the user data is written to a corresponding portion of the second half of the cache memory. Thus, according to this scheme, the data stored in the second half of the cache memory exactly mirrors the data stored in the first half of the cache memory. In the absence of a failure of a portion of the cache memory resource, all requests to read user data from the resource may return data from the first half of the cache memory. However, if a portion of the first half of the cache memory fails, and it is desired to read user data that was stored in the failed portion, the user data may instead be read from the portion in the second half of the cache memory that corresponds to the failed portion.
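By way of illustration only, the dual write scheme described above may be sketched in Python as follows. This is a minimal toy model under stated assumptions, not the implementation used in any actual storage system: the class name DualWriteCache, the notion of fixed-size "slots", the method names, and the handling of writes to a failed portion are all hypothetical and do not appear in the scheme as described.

```python
class DualWriteCache:
    """Toy model of a dual-write (mirrored) cache. All names are hypothetical."""

    def __init__(self, total_slots):
        # Assumption: the cache is modeled as an even number of equal slots.
        if total_slots <= 0 or total_slots % 2 != 0:
            raise ValueError("total_slots must be a positive even number")
        # Only half of the total space can hold distinct user data.
        self.usable_slots = total_slots // 2
        self.first_half = [None] * self.usable_slots
        self.second_half = [None] * self.usable_slots
        # Slots of the first half that have been marked as failed.
        self.failed_first = set()

    def write(self, slot, data):
        """Write one copy to each half at corresponding positions."""
        # Assumption: a failed first-half slot is skipped; the description
        # above does not say how writes to a failed portion are handled.
        if slot not in self.failed_first:
            self.first_half[slot] = data
        self.second_half[slot] = data

    def fail_first_half_slot(self, slot):
        """Simulate failure of a portion of the first half."""
        self.failed_first.add(slot)
        self.first_half[slot] = None

    def read(self, slot):
        """Read from the first half unless that portion has failed."""
        if slot in self.failed_first:
            return self.second_half[slot]
        return self.first_half[slot]
```

In this sketch, a cache created with eight slots exposes only four usable slots, and a read of a slot whose first-half copy has failed is served from the corresponding slot in the second half.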
In this proposed fault tolerance technique, at most one half of the total user data space in the cache memory resource is actually available for storing user data, since the second half of the cache memory resource is reserved solely for storing a redundant copy of the user data stored in the first half of the cache memory resource. This undesirably decreases the amount of the cache memory resource that is actually available for storing user data. Accordingly, it would be desirable to provide a cache memory fault tolerance technique that permits more of the cache memory resource to be actually available for storing user data than is possible in the prior art dual write technique.