1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and apparatus for controlling data flow. Still more particularly, the present invention relates to a method, apparatus, and computer instructions for controlling data flows in distributed storage systems.
2. Description of Related Art
Over the last several years, significant changes have occurred in how persistent storage devices are attached to computer systems. With the introduction of Storage Area Network (SAN) and Network Attached Storage (NAS) technologies, storage devices have evolved from locally attached, low-capability, passive devices to remotely attached, high-capability, active devices that are capable of deploying vast file systems and file sets. These remotely attached intelligent storage devices are referred to as “storage servers”, and the computer system to which they are attached is referred to as the “host”.
But as the storage infrastructure becomes more distributed and intelligent, it becomes much more difficult to coordinate the actions of the disparate systems. In particular, controlling data flows through the system is problematic. For example, the storage server may want to hold off data transmissions of a particular type from the host while it performs critical functions, such as synchronizing the state of its components and synchronizing the state of the data. The current state of the art is that the storage server simply tells the host it is “busy”. This “busy” state is the most primitive of flow control mechanisms. While the storage server is “busy”, the host cannot send data. The host waits until the “busy” state is turned off by the storage server and then resumes data transmission.
Problems arise because the host cannot tell whether the storage server is really busy or dead. During the busy interval, the host continues to receive application requests to access the storage serviced by the storage server. These requests cannot be held indefinitely, so the host waits a certain amount of time and then assumes the storage server is dead. The amount of time that the host waits and the amount of time the storage server can be busy are not coordinated, so the host makes erroneous assumptions about the state of the storage server. This situation causes the host to fail data transfers that it should not. These failures, in turn, cause host applications, such as file systems, databases, and logical volume managers, to make erroneous assumptions about the state of the storage. All of the above cause severe recovery problems throughout the storage software stack when the “dead” storage server comes back to life.
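The uncoordinated-timeout problem described above can be illustrated with a minimal sketch. It is not taken from any actual host or storage server implementation: the names `StorageServer`, `busy_until`, `host_send`, and `host_timeout`, the polling model, and the simulated clock are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageServer:
    """Hypothetical storage server that reports "busy" for a fixed interval."""

    busy_until: float  # simulated time (seconds) at which "busy" is turned off

    def is_busy(self, now: float) -> bool:
        # The only flow control signal the host sees is this single flag.
        return now < self.busy_until


def host_send(server: StorageServer, host_timeout: float,
              poll_interval: float = 1.0) -> str:
    """Poll the busy flag on a simulated clock.

    Returns "sent" if "busy" clears before the host's own timeout expires,
    or "failed" if the host gives up and assumes the server is dead.
    The host timeout is chosen with no knowledge of busy_until.
    """
    now = 0.0
    while now < host_timeout:
        if not server.is_busy(now):
            return "sent"
        now += poll_interval
    return "failed"


# A healthy server that needs 30 seconds to synchronize its state.
server = StorageServer(busy_until=30.0)

# Host timeout shorter than the busy interval: the transfer is failed
# even though the server would have come back.
short_wait = host_send(server, host_timeout=10.0)

# Host timeout longer than the busy interval: the transfer proceeds.
long_wait = host_send(server, host_timeout=60.0)
```

In this sketch the outcome depends only on which of the two independently chosen durations is larger, which is the lack of coordination the preceding paragraph describes.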
Thus, it would be advantageous to have an improved method, apparatus, and computer instructions for controlling data flows in a distributed storage system, such as those between hosts and storage servers.