1. Field of the Invention
The present invention relates generally to a technique for handling an error condition in a memory device, and more specifically, to such a technique wherein at least certain of the specific routines that may be used to handle the error condition may be dynamically modified.
2. Brief Description of Related Prior Art
Network computer systems generally include a plurality of geographically separated or distributed computer nodes that are configured to communicate with each other via, and are interconnected by, one or more network communications media. One conventional type of network computer system includes a network storage subsystem that is configured to provide a centralized location in the network at which to store, and from which to retrieve data. Advantageously, by using such a storage subsystem in the network, many of the network's data storage management and control functions may be centralized at the subsystem, instead of being distributed among the network nodes.
One type of conventional network storage subsystem, manufactured and sold by the Assignee of the subject application (hereinafter “Assignee”) under the tradename Symmetrix™ (hereinafter referred to as the “Assignee's conventional storage system”), includes a plurality of disk mass storage devices configured as one or more redundant arrays of independent (or inexpensive) disks (RAID). The disk devices are controlled by disk controllers (commonly referred to as “back end” controllers/directors) that are coupled via a bus system to a shared cache memory resource in the subsystem. The cache memory resource is also coupled via the bus system to a plurality of host controllers (commonly referred to as “front end” controllers/directors). The disk controllers are coupled to respective disk adapters that, among other things, interface the disk controllers to the disk devices. Similarly, the host controllers are coupled to respective host channel adapters that, among other things, interface the host controllers via channel input/output (I/O) ports to the network communications channels (e.g., SCSI, Enterprise Systems Connection (ESCON), or Fibre Channel (FC) based communications channels) that couple the storage subsystem to computer nodes in the computer network external to the subsystem (commonly termed “host” computer nodes or “hosts”).
In the Assignee's conventional storage system, the shared cache memory resource comprises a relatively large amount of synchronous dynamic random access memory (SDRAM) that is segmented into a multiplicity of cache memory regions. Each respective cache memory region may comprise, among other things, a respective memory array and a respective pair of memory region I/O controllers. The memory array comprised in a respective memory region may be configured into a plurality of banks of SDRAM devices (with each such bank comprising multiple 64, 128, or 256 megabit SDRAM integrated circuit chips) that are interfaced with the respective memory region's I/O controllers via a plurality of respective sets of command and data interfaces.
The I/O controllers in a respective memory region perform, based upon commands received from the host and disk controllers, relatively high level control and memory access functions in the respective memory region. For example, based upon commands received from the host and disk controllers, each I/O controller in a respective memory region may perform arbitration operations with the other I/O controller in the region so as to ensure that only one of the I/O controllers in the region is permitted to be actively accessing/controlling the memory array at any given time. Additionally, each I/O controller in a respective memory region may perform address decoding operations whereby a memory address supplied to the I/O controller by a host controller or a disk controller, as part of a memory access request (e.g., a memory read or write request) from the host controller or disk controller to the I/O controller, may be decoded by the I/O controller into a physical address in the memory region's memory array that corresponds to the address supplied by the host controller or disk controller. Other functions of the I/O controllers in a respective memory region include, among other things, temporary storage and transfer synchronization of data moving between the bus system and the memory array in the respective region, and, as will be described more fully below, the handling of error conditions that may arise in the memory array.
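By way of illustration only, the address decoding operation described above may be sketched as follows. The sketch assumes a simple linear mapping of a supplied address onto a bank, row, and column; the geometry constants and the names PhysicalAddress and decode_address are hypothetical and are not taken from the Assignee's conventional storage system, whose actual mapping is not described here.

```python
from typing import NamedTuple


class PhysicalAddress(NamedTuple):
    """Hypothetical physical location within a memory array."""
    bank: int
    row: int
    column: int


# Assumed array geometry, chosen only for illustration.
NUM_BANKS = 4
ROWS_PER_BANK = 8192
COLUMNS_PER_ROW = 512


def decode_address(supplied: int) -> PhysicalAddress:
    """Decode an address supplied by a host or disk controller
    into a (bank, row, column) location, assuming a linear layout."""
    bank_size = ROWS_PER_BANK * COLUMNS_PER_ROW
    if not 0 <= supplied < NUM_BANKS * bank_size:
        raise ValueError(f"address {supplied} is outside the memory array")
    bank, offset = divmod(supplied, bank_size)
    row, column = divmod(offset, COLUMNS_PER_ROW)
    return PhysicalAddress(bank, row, column)
```

An out-of-range address raises an error in this sketch; how an actual controller would report such a request is outside the scope of the illustration.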
Conversely, the command and data interfaces in a respective memory region perform, based upon commands received from the I/O controllers (e.g., via command/control signal busses coupling the I/O controllers to the interfaces), relatively low level control and memory access functions in the respective memory region. For example, these interfaces may provide, in response to a memory access request supplied to the interfaces from an I/O controller, appropriate chip select, clock synchronization, memory addressing, data transfer, memory control/management, and clock enable signals to the memory devices in the memory array that permit the requested memory access to occur.
When the memory array encounters an error condition, the command and data interfaces may detect the occurrence of the error condition and may report such occurrence to the I/O controller that currently is actively accessing/controlling the memory array (hereinafter termed the “active I/O controller”). Typical error conditions that may be detected and reported by the command and data interfaces include the occurrence of parity errors in the values transmitted by the command/control signal busses, the failure of a requested directed memory access to complete within a predetermined “timeout” period, etc.
The command and data interfaces signal the occurrence of an error condition by asserting an error signal line that is coupled to the active I/O controller. The assertion of the error signal line merely indicates that an error condition has been detected in the memory array, but does not indicate the nature or type of error condition detected. In response to the assertion of the error signal line, the active I/O controller may report the occurrence of the error condition to a host controller or disk controller that is currently seeking to access the memory array using the active I/O controller; the active I/O controller may also execute one or more error handling routines to try to determine the cause of the error condition, and to correct the error condition.
The structure and operation of the circuitry comprising a memory region are sufficiently complex that it is essentially impossible to anticipate in advance all of the possible causes of memory array error conditions that may be reported to the I/O controllers, and to anticipate in advance how to correct all of such error conditions when they occur. In the Assignee's conventional storage system, the error handling routines that may be executed by an I/O controller in response to the reporting of a memory array error condition are statically preprogrammed into the I/O controller. This is unfortunate, since it inherently limits the ability of the I/O controller to appropriately handle memory array error conditions that may arise from causes that were not anticipated in advance of the initial programming of the I/O controller.
Accordingly, in broad concept, the present invention provides a technique for handling error conditions that may occur in a memory device, in which technique at least certain of the error handling routines that may be used to handle the error condition may be dynamically modified (i.e., programmed and/or changed during the operation of the controller, e.g., while the controller is being used to control the memory device). In one embodiment of the present invention, a memory controller is provided that may control a memory device (e.g., a memory array comprised in a cache memory region in a data storage system). The controller may include both a first processor and a second processor. If the memory device reports to the controller that an error condition exists in the device, either the first processor or the second processor may be selected to handle the error condition. If the first processor is selected to handle the error condition, the first processor may handle the error condition according to one or more statically preprogrammed error handling routines. Conversely, if the second processor is selected to handle the error condition, the second processor may handle the error condition according to one or more dynamically programmable error handling routines.
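By way of illustration only, the selection between the first processor and the second processor may be modeled in software as follows. This is a minimal sketch under stated assumptions, not the controller of the present invention: the names Processor, MemoryController, and handle_error are hypothetical, and each error handling routine is represented as a function returning a description of the action taken.

```python
from enum import Enum
from typing import Callable, List

Routine = Callable[[], str]


class Processor(Enum):
    FIRST = "first"    # executes statically preprogrammed routines
    SECOND = "second"  # executes dynamically programmable routines


def _static_reset() -> str:
    # Hypothetical routine fixed when the controller is initially programmed.
    return "static: reset interface"


class MemoryController:
    def __init__(self, dynamic_routines: List[Routine]) -> None:
        # A tuple models routines that cannot be changed during operation.
        self._static_routines = (_static_reset,)
        # A list models routines that may be reprogrammed during operation.
        self._dynamic_routines = list(dynamic_routines)

    def handle_error(self, selected: Processor) -> List[str]:
        """Run the routines belonging to whichever processor was selected."""
        if selected is Processor.FIRST:
            routines = self._static_routines
        else:
            routines = self._dynamic_routines
        return [routine() for routine in routines]
```

In this sketch the caller supplies the selection, which leaves open whether the selection originates inside or outside the controller.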
The controller itself may comprise a selector that selects whether the first processor or the second processor is to handle the error condition. Alternatively, a device external to the controller (e.g., a host controller, disk controller, computer device external to the network data storage system, and/or other control circuitry associated with the controller) may be used to select whether the first processor or the second processor handles the error condition.
During the operation of the controller, a computer device external to the controller may transmit to the controller a first set of instructions. When the controller receives the first set of instructions from the external computer device, the controller may forward the instructions to and cause them to be stored in RAM comprised in the controller. Alternatively, the controller may store the first set of instructions in RAM that is external to the controller. The first set of instructions may comprise the one or more error handling routines according to which the second processor may handle the error condition. That is, the second processor may access the one or more error handling routines comprised in the first set of instructions stored in the RAM, and may execute the one or more routines to handle the error condition.
After the RAM has received and stored the first set of instructions, during the operation of the controller, the external computer device may transmit to the controller a second set of instructions that differ from the first set of instructions. When the controller receives the second set of instructions from the external computer device, the controller may forward them to and cause them to be stored in the RAM. This may result in at least portions of the first set of instructions being overwritten in the RAM by the second set of instructions. The second set of instructions may comprise one or more error handling routines that differ from those comprised in the first set of instructions. The second processor may access the one or more error handling routines comprised in the second set of instructions stored in the RAM, and may execute them to handle the error condition. The execution by the second processor of the one or more error handling routines in the second set of instructions may cause the error condition to be handled by the second processor in a manner that is different from the manner in which the error condition may be handled by the second processor when the second processor executes the one or more error handling routines in the first set of instructions.
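By way of illustration only, the storing of a first set of instructions in the RAM, followed by a second set that overwrites at least portions of the first, may be sketched as follows. The sketch assumes that each routine is stored under a name and that a later routine stored under the same name replaces the earlier one; the names RoutineRam, store, and run are hypothetical.

```python
from typing import Callable, Dict

Routine = Callable[[str], str]


class RoutineRam:
    """Hypothetical model of the RAM holding dynamically programmable routines."""

    def __init__(self) -> None:
        self._slots: Dict[str, Routine] = {}

    def store(self, instructions: Dict[str, Routine]) -> None:
        # Routines sharing a name with stored ones overwrite them;
        # routines under other names are left in place.
        self._slots.update(instructions)

    def run(self, name: str, error: str) -> str:
        # Models the second processor fetching and executing a stored routine.
        return self._slots[name](error)


# Two hypothetical instruction sets sent by an external computer device.
first_set = {"on_error": lambda error: f"retry after {error}"}
second_set = {"on_error": lambda error: f"isolate bank after {error}"}
```

Storing first_set and then second_set in the same RoutineRam causes the same reported error condition to be handled differently, without any change to the code that invokes run.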
In one aspect of the present invention, a memory controller embodying features of the technique of the present invention is provided. In a second aspect of the present invention, a method of using or operating such a controller is provided. In a third aspect of the present invention, computer-readable memory comprising executable program instructions is provided. The program instructions, when executed, cause features of the present invention to be implemented.
Thus, in the present invention, the error handling routines that may be executed by the memory controller to handle a memory error condition may be dynamically modified, and the memory controller may handle the error condition in accordance with either statically preprogrammed or dynamically programmable error handling routines. Advantageously, these features of the present invention permit a memory controller made according to the present invention to be better able than the prior art to appropriately handle memory error conditions that may arise from causes that were not anticipated in advance of the initial programming of the controller. Also advantageously, these features of the present invention permit greater flexibility, compared to the prior art, in the manner in which memory error conditions may be dealt with.
These and other features and advantages of the present invention will become apparent as the following Detailed Description proceeds and upon reference to the Figures of the Drawings, in which like numerals depict like parts, and wherein: