As is known in the art, large host computers and servers (collectively referred to herein as “host computer/servers”) require large capacity data storage systems. These large computer/servers generally include data processors, which perform many operations on data introduced to the host computer/server through peripherals including the data storage system. The results of these operations are output to peripherals, including the storage system.
One type of data storage system is a magnetic disk storage system having a bank of disk drives. The bank of disk drives and the host computer/server are coupled together through a system interface. The interface includes “front end” or host computer/server controllers (or storage processors) and “back-end” or disk controllers (or storage processors). The interface operates the storage processors in such a way that they are transparent to the host computer/server. That is, user data is stored in, and retrieved from, the bank of disk drives in such a way that the host computer/server merely thinks it is operating with its own local disk drive. One such system is described in U.S. Pat. No. 5,206,939, entitled “System and Method for Disk Mapping and Data Retrieval”, inventors Moshe Yanai, Natan Vishlitzky, Bruno Alterescu and Daniel Castel, issued Apr. 27, 1993, and assigned to the same assignee as the present invention.
As described in such U.S. patent, the interface may also include, in addition to the host computer/server storage processors and disk storage processors, a user data semiconductor global cache memory accessible by all the storage processors. The cache memory is a semiconductor memory and is provided to rapidly store data from the host computer/server before storage in the disk drives, and, on the other hand, store data from the disk drives prior to being sent to the host computer/server. The cache memory, being a semiconductor memory, as distinguished from a magnetic memory as in the case of the disk drives, is much faster than the disk drives in reading and writing data. As described in U.S. Pat. No. 7,136,959 entitled “Data Storage System Having Crossbar Packet Switching Network”, issued Nov. 14, 2006, inventor William F. Baxter III, assigned to the same assignee as the present invention, the global cache memory may be distributed among the storage processors.
As is also known in the art, it is desirable to maximize user data transfer through the interface, including maximizing packet transfer through the packet switching network. In a fabric-based storage system where DMA data pipes are used to facilitate data movement between fabric nodes, mechanisms need to be in place to handle fabric congestion and, in severe cases, to handle events resulting from a failed fabric node.
In the PCI-E/SRIO bridge ASIC, the fabric in question is SRIO, but the method is protocol agnostic.
As described in the above-identified pending U.S. patent application Ser. No. 11/769,744, there are eight parallel DMA data pipes within the PCI-E/SRIO bridge ASIC. A ring manager (i.e., a controller) controls the data pipes. Each DMA data pipe can post from 1 to 8 sequential requests to the fabric, and since eight IOs (i.e., data transfers) can be active at once, up to 64 read requests can be outstanding. Up to 64 write requests can be outstanding per fabric if mirroring is enabled on all pipes.
Normally, when a packet request is launched from a data pipe, a packet response including a Sequence ID is returned from the receiving node to the data pipe that initiated the request. In the event of a fabric error or failure, the data pipe encounters a fatal error and generates an error interrupt to the ring manager before shutting off.
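The request and response bookkeeping just described can be illustrated with a minimal sketch. It is not taken from the ASIC design: the class name `DataPipe`, its methods, and the `notify_ring_manager` callback are hypothetical, and the limit of eight outstanding requests per pipe is taken from the request count stated above.

```python
class DataPipe:
    """Hypothetical software model of one DMA data pipe (illustration only)."""

    MAX_OUTSTANDING = 8  # from the text: 1 to 8 sequential requests per pipe

    def __init__(self, pipe_id, notify_ring_manager):
        self.pipe_id = pipe_id
        self.outstanding = set()      # Sequence IDs stored at request time
        self.shut_off = False
        self._notify = notify_ring_manager  # assumed interrupt callback

    def post_request(self, seq_id):
        """Launch a packet request and remember its Sequence ID."""
        if self.shut_off:
            raise RuntimeError("pipe is shut off after a fatal error")
        if len(self.outstanding) >= self.MAX_OUTSTANDING:
            raise RuntimeError("too many outstanding requests")
        self.outstanding.add(seq_id)

    def on_response(self, seq_id):
        """Match a returned Sequence ID; True if it was outstanding."""
        if seq_id in self.outstanding:
            self.outstanding.remove(seq_id)
            return True
        return False

    def on_fabric_error(self):
        """Raise the error interrupt to the ring manager, then shut off."""
        self._notify(self.pipe_id)
        self.shut_off = True
```

In this sketch the interrupt is raised before the pipe shuts off, mirroring the order given in the text; any Sequence IDs still in `outstanding` at that point correspond to responses that may yet arrive from the fabric.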
Typically, when the ring manager generates a response descriptor for a known good IO (i.e., data transfer), it frees the data pipe for the next request descriptor. In the case of an error, the ring manager generates the appropriate error responses and places them on the response ring. In this case, because of the fabric errors, it is not safe to free up the data pipe. Because of the error, fabric congestion and backlog may initially exist but then free up. There is therefore a window of time in which rogue RIO packets can return to the data pipe after the fatal error occurs. If the ring manager has re-used the data pipe for a new IO during this window, the rogue packets will cause errors and corrupt the data transfer. The method used here for graceful recovery is to retire the pipe for a period of time to mask out potential rogue errors coming back from the fabric. This mechanism is called Pipe Retirement. During Pipe Retirement, it is expected that all outstanding Sequence IDs will be returned to the data pipe and matched with the Sequence IDs stored in its local buffer at request time. Once Pipe Retirement expires, it is expected that no rogue packets will be returned to the data pipe. The ring manager will not use the pipe until the pipe retirement time expires. The pipe retirement time is selectable by programming a Pipe Retirement Seed register. The pipe retirement time should be set to approximately 10× the RIO response timeout value.
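As a rough illustration of the retirement window, the following sketch models a countdown loaded from a seed value and the absorption of late responses while the pipe is retired. The names `retirement_seed` and `RetiringPipe`, and the use of abstract time units, are assumptions made for this example; only the factor of about ten times the RIO response timeout comes from the text.

```python
RETIREMENT_MULTIPLIER = 10  # from the text: about 10x the RIO response timeout


def retirement_seed(rio_response_timeout, multiplier=RETIREMENT_MULTIPLIER):
    """Value one might program into the Pipe Retirement Seed register."""
    return rio_response_timeout * multiplier


class RetiringPipe:
    """Hypothetical model of a data pipe after a fatal fabric error."""

    def __init__(self, outstanding_seq_ids, seed):
        self.pending = set(outstanding_seq_ids)  # IDs saved at request time
        self.remaining = seed                    # countdown loaded from seed

    @property
    def retired(self):
        """True while the retirement time has not yet expired."""
        return self.remaining > 0

    def tick(self, elapsed=1):
        """Advance the countdown by `elapsed` time units."""
        self.remaining = max(0, self.remaining - elapsed)

    def absorb_late_response(self, seq_id):
        """Match a late response against the stored IDs and discard it."""
        if seq_id in self.pending:
            self.pending.discard(seq_id)
            return True
        return False
```

With an assumed response timeout of 50 units, the seed would be 500 units; late responses arriving inside that window are matched and dropped, so they never reach a new IO on the same pipe.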
Thus, in accordance with the invention, a method is provided for transmitting user data from a selected one of a plurality of data pipes. The method includes: selecting, under control of a controller, one of the data pipes from a pool of the data pipes for transmission of the user data; transmitting the user data from the selected one of the data pipes; detecting whether there was an error in the transmission; if an error is detected, generating in the selected one of the data pipes an error interrupt for the controller; removing, under control of the controller, the selected one of the data pipes from the pool of data pipes for a predetermined period of time; and, when the time has expired, having the controller return the selected data pipe to the pool of available data pipes.
In one embodiment, the method includes having a ring manager select one of the data pipes from a pool of the data pipes for transmission of the user data. The data is transmitted from the selected one of the data pipes to at least one packet switching network. The data pipe detects whether there was an error in the transmission. If an error is detected, the data pipe generates an error interrupt for the ring manager. The ring manager detects the error interrupt and generates an error interrupt for a CPU. The ring manager removes the selected one of the data pipes from the pool of data pipes for a predetermined period of time, and continues to work on other tasks until the time has expired. When the time has expired, the ring manager returns the selected data pipe to the pool of available data pipes.
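The pool handling in this embodiment can be sketched as follows. This is an illustrative model only, not the ring manager's actual logic: the class `RingManager`, its method names, the `cpu_interrupt` callback, and the explicit `now` timestamps are all hypothetical. The point it shows is that retirement is tracked as a deadline, so the ring manager never blocks while a pipe is retired.

```python
class RingManager:
    """Hypothetical model of the ring manager's pipe pool (illustration only)."""

    def __init__(self, num_pipes, retirement_time, cpu_interrupt):
        self.available = list(range(num_pipes))  # pool of usable pipe IDs
        self.retired = {}                        # pipe ID -> expiry time
        self.retirement_time = retirement_time
        self.cpu_interrupt = cpu_interrupt       # assumed callback to the CPU

    def select_pipe(self):
        """Take one pipe from the pool, or None if none is available."""
        return self.available.pop(0) if self.available else None

    def release_pipe(self, pipe_id):
        """Return a pipe to the pool after a known good IO."""
        self.available.append(pipe_id)

    def on_pipe_error(self, pipe_id, now):
        """Handle a pipe's error interrupt: notify the CPU, retire the pipe."""
        self.cpu_interrupt(pipe_id)
        self.retired[pipe_id] = now + self.retirement_time

    def poll(self, now):
        """Return expired pipes to the pool; called between other tasks."""
        expired = [p for p, t in self.retired.items() if now >= t]
        for p in expired:
            del self.retired[p]
            self.available.append(p)
        return expired
```

Because `poll` only compares timestamps, the ring manager in this sketch remains free to select other pipes and service other descriptors while a retired pipe waits out its period.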
In one embodiment, a method is provided for controlling user data transmission between a host computer/server and a bank of disk drives through a system interface, such system interface having: a plurality of storage processors, one portion of the storage processors having a user data port coupled to the host computer/server and another portion of the storage processors having a user data port coupled to the bank of disk drives; and, a packet switching network coupled to the plurality of storage processors for passing packets between the plurality of storage processors, each one of the plurality of storage processors comprising: a CPU section; a data pipe section coupled between the user data port and the packet switching network, such data pipe section comprising: a plurality of data pipes, and a data pipe controller responsive to descriptors produced by the CPU section for controlling the plurality of data pipes, user data at the user data port passing through the data pipes, the plurality of data pipes comprising a pool of available data pipes controlled by a ring manager.
The method includes: selecting, under control of the ring manager, one of the data pipes from the pool of data pipes for transmission of the user data; transmitting the user data from the selected one of the data pipes; detecting in the selected one of the data pipes whether there was an error in the transmission; if an error is detected, generating in the selected one of the data pipes an error interrupt for the ring manager; detecting in the ring manager the error interrupt and generating in the ring manager an error interrupt for the CPU; removing, under control of the ring manager, the selected one of the data pipes from the pool of data pipes for a predetermined period of time while the ring manager continues to work on other tasks until the time has expired; and, when the time has expired, returning, under control of the ring manager, the selected data pipe to the pool of available data pipes.
In one embodiment, when the ring manager detects the error interrupt, the ring manager generates an error interrupt for the CPU.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
Like reference symbols in the various drawings indicate like elements.