As is known in the art, large host computers and servers (collectively referred to herein as “host computer/servers”) require large capacity data storage systems. These large host computer/servers generally include data processors, which perform many operations on data introduced to the host computer/server through peripherals, including the data storage system. The results of these operations are output to peripherals, including the storage system.
One type of data storage system is a magnetic disk storage system. Here a bank of disk drives and the host computer/server are coupled together through an interface. The interface includes “front end” or host computer/server controllers (or directors) and “back-end” or disk controllers (or directors). The interface operates the controllers (or directors) in such a way that they are transparent to the host computer/server. That is, data is stored in, and retrieved from, the bank of disk drives in such a way that the host computer/server merely thinks it is operating with its own local disk drive. One such system is described in U.S. Pat. No. 5,206,939, entitled “System and Method for Disk Mapping and Data Retrieval”, inventors Moshe Yanai, Natan Vishlitzky, Bruno Alterescu and Daniel Castel, issued Apr. 27, 1993, and assigned to the same assignee as the present invention.
As described in such U.S. Patent, the interface may also include, in addition to the host computer/server controllers (or directors) and disk controllers (or directors), addressable cache memories. The cache memory is a semiconductor memory. It is provided to rapidly store data from the host computer/server before storage in the disk drives and, conversely, to store data from the disk drives before it is sent to the host computer/server. Because the cache memory is a semiconductor memory, rather than a magnetic memory as in the case of the disk drives, it reads and writes data much faster than the disk drives.
The host computer/server controllers, disk controllers and cache memory are interconnected through a backplane printed circuit board. More particularly, disk controllers are mounted on disk controller printed circuit boards, host computer/server controllers are mounted on host computer/server controller printed circuit boards, and cache memories are mounted on cache memory printed circuit boards. The disk director, host computer/server director, and cache memory printed circuit boards plug into the backplane printed circuit board. In order to provide data integrity in case of a failure in a director, the backplane printed circuit board has a pair of buses. One set of the disk directors is connected to one bus and another set of the disk directors is connected to the other bus. Likewise, one set of the host computer/server directors is connected to one bus and another set of the host computer/server directors is connected to the other bus. The cache memories are connected to both buses. Each one of the buses carries data, address and control information.
The arrangement is shown schematically in FIG. 1.
The use of two buses B1, B2 thus provides a degree of redundancy to protect against a total system failure in the event that the controllers or disk drives connected to one bus fail. Further, the use of two buses increases the data transfer bandwidth of the system compared to a system having a single bus. In operation, when the host computer/server 12 wishes to store data, the host computer 12 issues a write request to one of the front-end directors 14 (i.e., host computer/server directors) to perform a write command. One of the front-end directors 14 replies to the request and asks the host computer 12 for the data. After the request has passed to the requesting one of the front-end directors 14, the director 14 determines the size of the data and reserves space in the cache memory 18 to store the data. The front-end director 14 then produces control signals on one of the buses B1, B2 connected to such front-end director 14 to enable the transfer to the cache memory 18. The host computer/server 12 then transfers the data to the front-end director 14. The front-end director 14 then advises the host computer/server 12 that the transfer is complete. The front-end director 14 consults a Table, not shown, stored in the cache memory 18 to determine which one of the back-end directors 20 (i.e., disk directors) is to handle this request. The Table maps the host computer/server 12 addresses into an address in the bank 22 of disk drives. The front-end director 14 then places a notification in a “mail box” (not shown, stored in the cache memory 18) for the back-end director 20 that is to handle the request, indicating the amount of the data and the disk address for the data. The back-end directors 20 poll the cache memory 18 when they are idle to check their “mail boxes”.
If the polled “mail box” indicates a transfer is to be made, the back-end director 20 processes the request, addresses the disk drive in the bank 22, reads the data from the cache memory 18 and writes it into the addresses of a disk drive in the bank 22.
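The FIG. 1 write path can be modeled in a few lines. The following is a minimal Python sketch, assuming that one object stands in for the cache memory 18 and holds the data slots, the address Table, and one “mail box” per back-end director 20; the class names, field names, and director identifiers are hypothetical illustrations, not part of the described system. It shows that every notification, like the data itself, is held in the cache memory 18 and is acted on only when a back-end director polls.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxEntry:
    # Hypothetical notification: the text says it carries the amount of
    # data and the disk address; the cache slot id is an added assumption.
    cache_slot: int
    length: int
    disk_address: int


@dataclass
class CacheMemory18:
    # Everything lives in the same cache memory: data, the address Table,
    # and the per-director "mail boxes".
    slots: dict = field(default_factory=dict)
    table: dict = field(default_factory=dict)      # host address -> (back-end id, disk address)
    mailboxes: dict = field(default_factory=dict)  # back-end id -> [MailboxEntry, ...]
    next_slot: int = 0


def front_end_write(cache: CacheMemory18, host_address: int, data: bytes) -> str:
    """Front-end director 14: reserve space, store the data, post mail."""
    slot = cache.next_slot
    cache.next_slot += 1
    cache.slots[slot] = data
    back_end, disk_address = cache.table[host_address]
    cache.mailboxes.setdefault(back_end, []).append(
        MailboxEntry(slot, len(data), disk_address))
    return "transfer complete"  # reported back to the host computer/server 12


def back_end_poll(cache: CacheMemory18, back_end: str, disks: dict) -> int:
    """Back-end director 20, when idle: drain its mail box and write to disk."""
    entries = cache.mailboxes.pop(back_end, [])
    for entry in entries:
        disks[entry.disk_address] = cache.slots[entry.cache_slot]
    return len(entries)


if __name__ == "__main__":
    cache = CacheMemory18(table={0x100: ("BE0", 7)})
    disks = {}
    front_end_write(cache, 0x100, b"payload")
    print(back_end_poll(cache, "BE0", disks), disks[7])  # 1 b'payload'
```

Between the front-end write and the next poll, the notification simply waits in the cache memory 18; that wait is the delay attributed to mailboxes and polling.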
When data is to be read from a disk drive in bank 22 to the host computer/server 12, the system operates in a reciprocal manner. More particularly, during a read operation, a read request is instituted by the host computer/server 12 for data at specified memory locations (i.e., a requested data block). One of the front-end directors 14 receives the read request and examines the cache memory 18 to determine whether the requested data block is stored in the cache memory 18. If the requested data block is in the cache memory 18, the requested data block is read from the cache memory 18 and is sent to the host computer/server 12. If the front-end director 14 determines that the requested data block is not in the cache memory 18 (i.e., a so-called “cache miss”), the director 14 writes a note in the cache memory 18 (i.e., the “mail box”) that it needs to receive the requested data block. The back-end directors 20 poll the cache memory 18 to determine whether there is an action to be taken (i.e., a read operation of the requested block of data). The one of the back-end directors 20 that polls the mail box in the cache memory 18 and detects the read operation reads the requested data block from the bank 22 and initiates storage of such requested data block in the cache memory 18. When the data block has been completely written into the cache memory 18, a read complete indication is placed in the “mail box” in the cache memory 18. It is to be noted that the front-end directors 14 are polling the cache memory 18 for read complete indications. When one of the polling front-end directors 14 detects a read complete indication, such front-end director 14 completes the transfer of the requested data, which is now stored in the cache memory 18, to the host computer/server 12.
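The reciprocal read path, including its two rounds of polling, can be sketched in the same spirit. This minimal Python sketch assumes the “mail box” is modeled as a queue of requested block identifiers plus a set of read complete indications, both held in the cache memory 18; all names are hypothetical. On a miss, the data reaches the host computer/server 12 only after a back-end director has polled and then a front-end director has polled.

```python
from collections import deque


class CacheMemory18:
    """Toy cache memory 18: data blocks plus both kinds of mail-box notes."""

    def __init__(self):
        self.blocks = {}              # block id -> data
        self.read_requests = deque()  # notes left for back-end directors 20
        self.read_complete = set()    # notes left for front-end directors 14


def front_end_read(cache, block):
    """Return the data on a hit; on a miss, leave a note and return None."""
    if block in cache.blocks:
        return cache.blocks[block]
    cache.read_requests.append(block)
    return None


def back_end_poll(cache, disks):
    """A polling back-end director stages each requested block into cache."""
    while cache.read_requests:
        block = cache.read_requests.popleft()
        cache.blocks[block] = disks[block]
        cache.read_complete.add(block)


def front_end_poll(cache, block):
    """A polling front-end director finishes the read once it sees the note."""
    if block in cache.read_complete:
        cache.read_complete.discard(block)
        return cache.blocks[block]
    return None


if __name__ == "__main__":
    disks = {5: b"block-5"}
    cache = CacheMemory18()
    front_end_read(cache, 5)       # miss: note left in the mail box
    back_end_poll(cache, disks)    # back-end director polls and stages
    print(front_end_poll(cache, 5))  # b'block-5'
```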
The use of mailboxes and polling adds time to each transfer of data between the host computer/server 12 and the bank 22 of disk drives, thus reducing the operating bandwidth of the interface.
Referring now to FIG. 2, a data storage system 100 is shown for transferring data between a host computer/server 120 and a bank of disk drives 140 through a system interface 161. The data storage system 100 is described in co-pending U.S. Patent application Serial No. US 2000 054828 filed Mar. 31, 2000, the entire subject matter thereof being incorporated herein by reference. Reference is also made to corresponding Great Britain Patent GB 2366425 published Mar. 6, 2002.
The system interface 161 includes: a plurality of, here 32, front-end directors 180₁–180₃₂ coupled to the host computer/server 120 via ports 123₁–123₃₂; a plurality of back-end directors 200₁–200₃₂ coupled to the bank of disk drives 140 via ports 123₃₃–123₆₄; a data transfer section 240, having a global cache memory 220, coupled to the plurality of front-end directors 180₁–180₃₂ and the back-end directors 200₁–200₃₂; and a messaging network 260, operative independently of the data transfer section 240, coupled to the plurality of front-end directors 180₁–180₃₂ and the plurality of back-end directors 200₁–200₃₂, as shown. The front-end and back-end directors 180₁–180₃₂, 200₁–200₃₂ are functionally similar and include a microprocessor (μP) 299 (i.e., a central processing unit (CPU) and RAM), a message engine/CPU controller 314 and a data pipe 316 described in detail in the co-pending patent application. Suffice it to say here, however, that the front-end and back-end directors 180₁–180₃₂, 200₁–200₃₂ control data transfer between the host computer/server 120 and the bank of disk drives 140 in response to messages passing between the directors 180₁–180₃₂, 200₁–200₃₂ through the messaging network 260. The messages facilitate the data transfer between the host computer/server 120 and the bank of disk drives 140, with such data passing through the global cache memory 220 via the data transfer section 240. More particularly, in the case of the front-end directors 180₁–180₃₂, the data passes between the host computer and the global cache memory 220 through a data pipe 221 in the front-end directors 180₁–180₃₂, and the messages pass through the message engine/CPU controller 223 in such front-end directors 180₁–180₃₂.
In the case of the back-end directors 200₁–200₃₂, the data passes between the bank of disk drives 140 and the global cache memory 220 through the data pipe 221 in the back-end directors 200₁–200₃₂, and again the messages pass through the message engine/CPU controller 223 in such back-end directors 200₁–200₃₂.
With such an arrangement, the cache memory 220 in the data transfer section 240 is not burdened with the task of transferring the director messaging. Rather, the messaging network 260 operates independently of the data transfer section 240, thereby increasing the operating bandwidth of the system interface 161.
In operation, and considering first a read request by the host computer/server 120 (i.e., the host computer/server 120 requests data from the bank of disk drives 140), the request is passed from one of a plurality of, here 32, host computer processors 121₁–121₃₂ in the host computer 120 to one or both of the pair of front-end directors 180₁–180₃₂ connected to such host computer processor 121₁–121₃₂. (It is noted that in the host computer 120, each one of the host computer processors 121₁–121₃₂ is coupled to here a pair (but not limited to a pair) of the front-end directors 180₁–180₃₂, to provide redundancy in the event of a failure in one of the front-end directors 180₁–180₃₂ coupled thereto. Likewise, the bank of disk drives 140 has a plurality of, here 32, disk drives 140₁–140₃₂, each disk drive 140₁–140₃₂ being coupled to here a pair (but not limited to a pair) of the back-end directors 200₁–200₃₂, to provide redundancy in the event of a failure in one of the back-end directors 200₁–200₃₂ coupled thereto.) Each front-end director 180₁–180₃₂ includes a microprocessor (μP) 225 (i.e., a central processing unit (CPU) and RAM), as described in detail in the above-referenced co-pending patent application. Suffice it to say here, however, that the microprocessor 225 makes a request for the data from the global cache memory 220. The global cache memory 220 has a resident cache management table, not shown. Every director 180₁–180₃₂, 200₁–200₃₂ has access to the resident cache management table, and every time a front-end director 180₁–180₃₂ requests a data transfer, the front-end director 180₁–180₃₂ must query the global cache memory 220 to determine whether the requested data is in the global cache memory 220.
If the requested data is in the global cache memory 220 (i.e., a read “hit”), the front-end director 180₁–180₃₂, more particularly the microprocessor 225 therein, mediates a DMA (Direct Memory Access) operation for the global cache memory 220 and the requested data is transferred to the requesting host computer processor 121₁–121₃₂.
If, on the other hand, the front-end director 180₁–180₃₂ receiving the data request determines, as a result of a query of the cache management table in the global cache memory 220, that the requested data is not in the global cache memory 220 (i.e., a “miss”), such front-end director 180₁–180₃₂ concludes that the requested data is in the bank of disk drives 140. Thus the front-end director 180₁–180₃₂ that received the request for the data must make a request for the data from one of the back-end directors 200₁–200₃₂ in order for such back-end director 200₁–200₃₂ to request the data from the bank of disk drives 140. The mapping of which back-end directors 200₁–200₃₂ control which disk drives 140₁–140₃₂ in the bank of disk drives 140 is determined during a power-up initialization phase. The map is stored in the global cache memory 220. Thus, when the front-end director 180₁–180₃₂ makes a request for data from the global cache memory 220 and determines that the requested data is not in the global cache memory 220 (i.e., a “miss”), the front-end director 180₁–180₃₂ is also advised by the map in the global cache memory 220 of the back-end director 200₁–200₃₂ responsible for the requested data in the bank of disk drives 140. The requesting front-end director 180₁–180₃₂ then must make a request for the data in the bank of disk drives 140 from the map-designated back-end director 200₁–200₃₂. This request between the front-end director 180₁–180₃₂ and the appropriate one of the back-end directors 200₁–200₃₂ (as determined by the map stored in the global cache memory 220) is made by a message which passes from the front-end director 180₁–180₃₂ through the message network 260 to the appropriate back-end director 200₁–200₃₂. It is noted, then, that the message does not pass through the global cache memory 220 (i.e., does not pass through the data transfer section 240) but rather passes through the separate, independent message network 260.
Thus, communication between the directors 180₁–180₃₂, 200₁–200₃₂ is through the message network 260 and not through the global cache memory 220. Consequently, valuable bandwidth of the global cache memory 220 is not used for messaging among the directors 180₁–180₃₂, 200₁–200₃₂.
Thus, on a global cache memory 220 “read miss”, the front-end director 180₁–180₃₂ sends a message to the appropriate one of the back-end directors 200₁–200₃₂ through the message network 260 to instruct such back-end director 200₁–200₃₂ to transfer the requested data from the bank of disk drives 140 to the global cache memory 220. When the transfer is accomplished, the back-end director 200₁–200₃₂ so advises the requesting front-end director 180₁–180₃₂ by a message, which passes from the back-end director 200₁–200₃₂ to the front-end director 180₁–180₃₂ through the message network 260. In response to this acknowledgement message, the front-end director 180₁–180₃₂ can transfer the data from the global cache memory 220 to the requesting host computer processor 121₁–121₃₂ as described above for a cache “read hit”.
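For comparison with the mail-box scheme, the read path through the global cache memory 220 and the separate message network 260 can be sketched as follows. This minimal Python sketch assumes per-director message queues for the network, a dictionary-backed global cache holding both the data and the back-end director map, and an access counter added purely for illustration; the back-end director's handling is run inline rather than concurrently, and all names are hypothetical. The counter shows that the request and acknowledgement messages add no global cache accesses.

```python
from collections import deque


class GlobalCache220:
    """Data plus the back-end director map in one global cache.
    `accesses` is an illustrative counter, not part of the described system."""

    def __init__(self, back_end_map):
        self.data = {}
        self.back_end_map = back_end_map  # block -> responsible back-end director
        self.accesses = 0

    def lookup(self, block):
        self.accesses += 1
        return self.data.get(block), self.back_end_map[block]

    def store(self, block, value):
        self.accesses += 1
        self.data[block] = value


class MessageNetwork260:
    """Independent of the cache: one message queue per director."""

    def __init__(self):
        self.queues = {}

    def send(self, dest, msg):
        self.queues.setdefault(dest, deque()).append(msg)

    def receive(self, dest):
        queue = self.queues.get(dest)
        return queue.popleft() if queue else None


def read(cache, net, disks, front_end, block):
    value, back_end = cache.lookup(block)
    if value is not None:                            # read "hit"
        return value
    net.send(back_end, ("stage", block, front_end))  # read "miss": message only
    # Back-end director (run inline here): disk -> global cache, then ack.
    _, staged, reply_to = net.receive(back_end)
    cache.store(staged, disks[staged])
    net.send(reply_to, ("done", staged))
    # Front-end director: on the acknowledgement, transfer as on a hit.
    if net.receive(front_end) != ("done", block):
        raise RuntimeError("missing acknowledgement")
    value, _ = cache.lookup(block)
    return value
```

On a miss, the global cache is touched three times (lookup, staging, final transfer) while both messages travel only over the network; in the FIG. 1 scheme, the corresponding notes would themselves be cache writes and polled cache reads.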
It should be noted that there might be one or more back-end directors 200₁–200₃₂ responsible for the requested data. Thus, if only one back-end director 200₁–200₃₂ is responsible for the requested data, the requesting front-end director 180₁–180₃₂ sends a uni-cast message via the message network 260 to only that specific one of the back-end directors 200₁–200₃₂. On the other hand, if more than one of the back-end directors 200₁–200₃₂ is responsible for the requested data, a multi-cast message (here implemented as a series of uni-cast messages) is sent by the requesting one of the front-end directors 180₁–180₃₂ to all of the back-end directors 200₁–200₃₂ having responsibility for the requested data. In either case, whether uni-cast or multi-cast, the message is passed through the message network 260 and not through the data transfer section 240 (i.e., not through the global cache memory 220).
Likewise, it should be noted that while one of the host computer processors 121₁–121₃₂ might request data, the acknowledgement signal may be sent to the requesting host computer processor 121₁ or to one or more other host computer processors 121₁–121₃₂ via multi-cast (i.e., a sequence of uni-cast) messages through the message network 260 to complete the data read operation.
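The uni-cast and multi-cast rules above reduce to one idea: a multi-cast is a series of uni-casts. This minimal Python sketch assumes per-director message queues, hypothetical director and host-processor identifiers, and hypothetical message tuples; it applies the same mechanism both to requests sent to the responsible back-end directors and to acknowledgements sent to one or more host computer processors. None of these messages touches the global cache memory.

```python
from collections import deque


class MessageNetwork260:
    """Per-director queues; counts uni-casts to show what a multi-cast costs."""

    def __init__(self):
        self.queues = {}
        self.unicasts_sent = 0

    def unicast(self, dest, msg):
        self.queues.setdefault(dest, deque()).append(msg)
        self.unicasts_sent += 1

    def multicast(self, dests, msg):
        # Multi-cast realized as a series of uni-casts, one per destination.
        for dest in dests:
            self.unicast(dest, msg)


def request_from_back_ends(net, responsible, block):
    """Front-end director: uni-cast to a single owner, else multi-cast."""
    msg = ("stage", block)
    if len(responsible) == 1:
        net.unicast(responsible[0], msg)
    else:
        net.multicast(responsible, msg)


def acknowledge_hosts(net, host_processors, block):
    """Read-complete acknowledgement to one or more host processors."""
    net.multicast(host_processors, ("read complete", block))
```

A request for data owned by three back-end directors therefore costs three uni-casts on the message network, and an acknowledgement to two host processors costs two more.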
Considering now a write operation, the host computer 120 wishes to write data into storage (i.e., into the bank of disk drives 140). One of the front-end directors 180₁–180₃₂ receives the data from the host computer 120 and writes it into the global cache memory 220. The front-end director 180₁–180₃₂ then requests the transfer of such data, which takes place after some period of time, when the back-end director 200₁–200₃₂ determines that the data can be removed from such cache memory 220 and stored in the bank of disk drives 140. Before the transfer to the bank of disk drives 140, the data in the cache memory 220 is tagged with a bit as “fresh data” (i.e., data which has not been transferred to the bank of disk drives 140, that is, data which is “write pending”). Thus, if there are multiple write requests for the same memory location in the global cache memory 220 (e.g., a particular bank account) before the data is transferred to the bank of disk drives 140, the data is overwritten in the cache memory 220 with the most recent data. Each time data is transferred to the global cache memory 220, the front-end director 180₁–180₃₂ controlling the transfer also informs the host computer 120 that the transfer is complete, thereby freeing up the host computer 120 for other data transfers.
When it is time to transfer the data in the global cache memory 220 to the bank of disk drives 140, as determined by the back-end director 200₁–200₃₂, the back-end director 200₁–200₃₂ transfers the data from the global cache memory 220 to the bank of disk drives 140 and resets the tag associated with the data in the global cache memory 220 (i.e., un-tags the data) to indicate that the data in the global cache memory 220 has been transferred to the bank of disk drives 140. It is noted that the un-tagged data in the global cache memory 220 remains there until overwritten with new data.
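The write-pending (“fresh data”) tag can be modeled with one flag per cache location. This minimal Python sketch assumes a dictionary keyed by memory location stands in for the global cache memory, and that destaging is triggered by an explicit call rather than by the back-end director's own timing policy, which the text does not specify; class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    data: bytes
    write_pending: bool = False  # the "fresh data" tag bit


class WriteBackCache:
    def __init__(self):
        self.entries = {}  # memory location -> CacheEntry

    def host_write(self, location, data):
        """Front-end director: store and tag the data, then report completion.
        A later write to the same location overwrites still-pending data."""
        self.entries[location] = CacheEntry(data, write_pending=True)
        return "transfer complete"

    def destage(self, disks):
        """Back-end director: write tagged entries to disk and un-tag them.
        Un-tagged data stays in the cache until it is overwritten."""
        written = 0
        for location, entry in self.entries.items():
            if entry.write_pending:
                disks[location] = entry.data
                entry.write_pending = False
                written += 1
        return written


if __name__ == "__main__":
    cache, disks = WriteBackCache(), {}
    cache.host_write("acct-42", b"balance=100")
    cache.host_write("acct-42", b"balance=75")     # overwrites pending data
    print(cache.destage(disks), disks["acct-42"])  # 1 b'balance=75'
```

Because completion is reported as soon as the tagged entry is in cache, the host is freed before any disk write occurs, and two writes to the same location before destaging result in a single disk write of the most recent data.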
Thus, it is noted that in both of the systems described above in connection with FIGS. 1 and 2, a single monolithic cache memory is used and all the directors must manage contention for such cache memory. It is also noted that the cache memory is uniformly distant from (or close to) all directors.