In general, a data storage system stores and retrieves data for one or more external hosts. FIG. 1 shows a high-level block diagram of a conventional data storage system 20. The data storage system 20 includes front-end circuitry 22, a cache 24, back-end circuitry 26 and a set of disk drives 28-A, 28-B (collectively, disk drives 28).
The cache 24 operates as a buffer for data exchanged between external hosts 30 and the disk drives 28. The front-end circuitry 22 operates as an interface between the hosts 30 and the cache 24. Similarly, the back-end circuitry 26 operates as an interface between the cache 24 and the disk drives 28.
FIG. 1 further shows a conventional implementation 32 of the data storage system 20. In the implementation 32, the front-end circuitry 22 includes multiple front-end circuit boards 34. Each front-end circuit board 34 includes a pair of front-end directors 36-A, 36-B. Each front-end director 36 (e.g., the front-end director 36-A of the front-end circuit board 34-1) is interconnected between a particular host 30 (e.g., the host 30-A) and a set of M buses 38 (M being a positive integer) that lead to the cache 24 (individual memory boards), and operates as an interface between that host 30 and the cache 24. Similarly, the back-end circuitry 26 includes multiple back-end circuit boards 40. Each back-end circuit board 40 includes a pair of back-end directors 42-A, 42-B. Each back-end director 42 is interconnected between a particular disk drive 28 and the M buses 38 (a backplane interconnect) leading to the cache 24, and operates as an interface between that disk drive 28 and the cache 24.
It should be understood that the cache 24 is a buffer for host data exchanged between the hosts 30 and the disk drives 28, i.e., the cache 24 is input/output (I/O) memory. Even though the directors 36, 42 include processors that execute program instructions, the directors 36, 42 do not use the cache 24 as processor address space. Rather, each director 36, 42 includes some memory as processor address space.
Each disk drive 28 of the implementation 32 has multiple connections 44, 46 to the cache 24. For example, the disk drive 28-A has a first connection 44-A that leads to the cache 24 through the back-end director 42-A of the back-end circuit board 40-1, and a second connection 46-A that leads to the cache 24 through another back-end director of another back-end circuit board 40 (e.g., a back-end director of the back-end circuit board 40-2).
It should be understood that the redundant features of the data storage system implementation 32 (e.g., the multiple disk drive connections 44, 46 of each disk drive 28, the M buses 38, the circuit boards 34, 40 having multiple directors 36, 42, etc.) provide fault tolerance and load balancing capabilities to the implementation 32. Further details of how the implementation 32 performs data write and read transactions will now be provided.
For a host 30 to store data on the disk drives 28, the host 30 provides the data to one of the front-end directors 36, and that front-end director 36 initiates a write transaction on behalf of that host 30. In particular, the front-end director 36 provides the data to the cache 24 through one of the M buses 38. Next, one of the back-end directors 42 reads the data from the cache 24 through one of the M buses 38 and stores the data in one or more of the disk drives 28 to complete the write transaction. To expedite data transfer, the front-end director 36 can place a message for the back-end director 42 in the cache 24 when writing the data to the cache 24. The back-end director 42 can then respond as soon as it detects the message from the front-end director 36. Similar operations occur for a read transaction but in the opposite direction (i.e., data moves from the back-end director 42 to the cache 24, and then from the cache 24 to the front-end director 36).
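By way of illustration only, the conventional write transaction described above can be sketched in Python as follows. The class names, the dictionary standing in for a disk drive 28, and the message queue are hypothetical simplifications and are not taken from the implementation 32.

```python
from collections import deque


class SharedCache:
    """Hypothetical stand-in for the cache 24: data slots plus a message queue."""

    def __init__(self):
        self.slots = {}
        self.messages = deque()


class FrontEndDirector:
    def __init__(self, cache):
        self.cache = cache

    def write(self, block_id, data):
        # Place the host data in the cache, then leave a message for the back end.
        self.cache.slots[block_id] = data
        self.cache.messages.append(block_id)


class BackEndDirector:
    def __init__(self, cache, disk):
        self.cache = cache
        self.disk = disk  # a dict standing in for a disk drive (assumption)

    def poll(self):
        # Respond to each pending message by copying that block to the disk.
        destaged = []
        while self.cache.messages:
            block_id = self.cache.messages.popleft()
            self.disk[block_id] = self.cache.slots[block_id]
            destaged.append(block_id)
        return destaged


cache = SharedCache()
disk = {}
front = FrontEndDirector(cache)
back = BackEndDirector(cache, disk)

front.write("blk-7", b"host data")
destaged_blocks = back.poll()
print(destaged_blocks, disk["blk-7"])
```

In this sketch every director shares the single SharedCache object, which mirrors the highly shared nature of the cache 24.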
Unfortunately, there are deficiencies to the above-described conventional implementation 32 of the data storage system 20 of FIG. 1. For example, the cache 24 is a highly shared main memory, and the set of M buses 38 is a highly shared interconnection mechanism. As such, arbitration and locking schemes are required to enable the front-end directors 36 and the back-end directors 42 to coordinate use of the cache 24 and the buses 38. These arbitration and locking schemes enable the directors 36, 42 (which equally contend for the highly shared cache 24 and buses 38) to resolve contention issues for memory boards within the cache 24 and for the buses 38. However, in doing so, some directors 36, 42 need to delay their operation (i.e., wait) until they are allocated these highly shared resources. Accordingly, contention for the cache 24 and the buses 38 by the directors 36, 42 is often a source of latency. In some high-traffic situations, the cache 24 and the buses 38 can become such a bottleneck that some external hosts 30 perceive the resulting latencies as unsatisfactory response time delays.
Additionally, since the directors 36, 42 and the cache 24 reside on separate circuit boards (see FIG. 1), there are latencies resulting from the physical distances between the directors 36, 42 and the cache 24. In particular, there are latencies incurred for the electrical signals to propagate through transmission circuitry on one circuit board (e.g., a director 36, 42), through a backplane interconnect (e.g., one of the buses 38), and through receiving circuitry on another circuit board (e.g., a memory board of the cache 24). Typically, such latencies are on the order of microseconds, i.e., a relatively large amount of time compared to the few hundred nanoseconds typical of signals that remain on a single circuit board.
Furthermore, there are scaling difficulties with the implementation 32 of FIG. 1. In particular, as more front-end and back-end circuit boards 34, 40 are added to the system 20 to increase the capacity of the data storage system implementation 32, the highly shared buses 38 become more congested. Eventually, the addition of further circuit boards 34, 40 results in unsatisfactory delays due to over-utilization of the cache 24 and the buses 38, i.e., the arbitration and locking mechanisms become unable to satisfy the access requirements of each director 36, 42.
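The scaling difficulty can be made concrete with a deliberately simple model. The model below assumes that each bus 38 carries one transfer per arbitration round and that every director requests exactly one transfer at the same instant; both assumptions, and the function names, are hypothetical and are used only to show the trend.

```python
import math


def transfer_rounds(num_directors: int, num_buses: int) -> int:
    """Rounds needed when each bus carries one transfer per round (toy assumption)."""
    if num_buses < 1:
        raise ValueError("at least one bus is required")
    return math.ceil(num_directors / num_buses)


def worst_case_wait_rounds(num_directors: int, num_buses: int) -> int:
    """Rounds the last-served director waits before its own transfer starts."""
    return max(0, transfer_rounds(num_directors, num_buses) - 1)


# With M = 4 buses, doubling the number of directors lengthens the worst-case wait.
waits = [worst_case_wait_rounds(n, 4) for n in (4, 8, 16, 32)]
print(waits)
```

Under these assumptions the worst-case wait grows in proportion to the number of directors once the directors outnumber the buses.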
One approach to reducing the response time of the implementation 32 of FIG. 1 is to replace the M buses 38 with a point-to-point interconnection topology, i.e., a point-to-point channel between each front-end director 36 and each memory board of the cache 24, and between each back-end director 42 and each memory board of the cache 24. Such a topology would alleviate bus contention latencies since each director 36, 42 would have immediate access to a communications channel with a memory board of the cache 24. Unfortunately, there could still exist contention difficulties between the directors 36, 42 and the cache memory boards (i.e., highly shared memories), as well as additional physical difficulties in deploying such point-to-point channels between the cache memory boards and each of the contending directors 36, 42 (e.g., physical difficulties in providing memory boards with enough access ports and circuitry for coordinating the use of such access ports).
The above-described conventional data storage system implementation 32 of FIG. 1 is prone to latency deficiencies due to contention for highly shared resources, namely the highly shared cache 24 and the highly shared buses 38 leading to the cache 24. In contrast, the invention is directed to data storage and retrieval techniques that utilize a cache which is preferred to a consumer (e.g., a director) of a data element stored within that cache. Since the cache is preferred to the consumer, the consumer encounters less contention for access to the preferred cache (e.g., less contention from other directors) than it would for the cache 24 of the conventional data storage system implementation 32 of FIG. 1, which is typically shared equally among all of the directors 36, 42 of the data storage system. Preferably, the preferred cache is proximate to the consumer (e.g., on the same circuit board as the consumer) so that memory accesses are on the order of a few hundred nanoseconds, rather than several microseconds when the cache and the consumer are on different circuit boards as in the conventional data storage implementation 32 of FIG. 1.
One arrangement of the invention is directed to a data storage system having a first circuit board, a second circuit board and a connection mechanism that connects the first and second circuit boards together. The first circuit board includes (i) a front-end interface circuit (e.g., a front-end director) for connecting to an external host, (ii) an on-board cache, and (iii) an on-board switch having a first port that connects to the front-end interface circuit, a second port that connects to the on-board cache, and a third port that connects to the connection mechanism. The second circuit board has a back-end interface circuit (e.g., a back-end director) for connecting to a storage device. When the front-end interface circuit retrieves (on behalf of a host) a data element (e.g., a block of data) from the storage device through the on-board switch of the first circuit board, the connection mechanism and the back-end interface circuit of the second circuit board, the on-board cache of the first circuit board can retain a copy of the data element for quick access in the future. With the on-board cache preferred to the front-end interface circuit and both the on-board cache and the front-end interface circuit residing on the first circuit board, when the front-end interface circuit accesses the copy of the data element in the on-board cache, there will be less contention and latency compared to that for the highly shared cache 24 of the conventional data storage system implementation 32 of FIG. 1.
In one arrangement, the on-board switch is configured to selectively provide a first data pathway between the front-end interface circuit and the on-board cache, a second data pathway between the front-end interface circuit and the connection mechanism, and a third data pathway between the on-board cache and the connection mechanism. Accordingly, the on-board switch can selectively route communications between different portions of the circuit board. For example, the on-board switch can provide the second and third data pathways to convey a data element from the connection mechanism simultaneously to the front-end interface circuit and the on-board cache during a read transaction in order to direct the data element to the front-end interface circuit with minimal latency and store a copy of the data element in the on-board cache. Although nothing prevents the on-board switch from buffering a copy of the data element during this transfer, there is no need to do so since the on-board switch provides the pathways to the front-end interface circuit and the on-board cache at the same time.
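The pathway selection described above can be modeled, purely as a sketch, with a small Python class. The port names, the attach and route methods, and the use of callables as port endpoints are hypothetical. Note also that the software loop delivers to each destination in turn, whereas the on-board switch described above provides both pathways at the same time.

```python
class OnBoardSwitch:
    """Toy three-port switch; each port is modeled as a callable sink (assumption)."""

    PORTS = ("front_end", "cache", "connection")

    def __init__(self):
        self.sinks = {}

    def attach(self, port, sink):
        if port not in self.PORTS:
            raise ValueError(f"unknown port: {port}")
        self.sinks[port] = sink

    def route(self, source, destinations, element):
        # Open one pathway per destination and hand the element to each of them.
        for dest in destinations:
            if dest == source:
                raise ValueError("a pathway must join two different ports")
            self.sinks[dest](element)


front_end_received = []
cache_contents = {}

switch = OnBoardSwitch()
switch.attach("front_end", front_end_received.append)
switch.attach("cache", lambda item: cache_contents.update([item]))

# Read transaction: the element arrives from the connection mechanism and is
# conveyed over the second and third pathways (to the front end and the cache).
switch.route("connection", ["front_end", "cache"], ("blk-3", b"abc"))
print(front_end_received, cache_contents)
```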
In one arrangement, the front-end interface circuit of the first circuit board is configured to send a request for a data element to the back-end interface circuit of the second circuit board, and the on-board cache of the first circuit board is configured to store the data element on behalf of the front-end interface circuit of the first circuit board when the back-end interface circuit of the second circuit board provides the data element to the front-end interface circuit of the first circuit board in response to the request. Accordingly, the front-end interface circuit can subsequently access the data element again without having to retrieve the data element from the back-end interface circuit a second time.
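This request-and-retain behavior resembles a read-through cache, and can be sketched as follows. The class and attribute names are hypothetical, a dictionary stands in for the storage device, and replacement of cached entries is not modeled.

```python
class BackEndInterface:
    """Stand-in for the back-end interface circuit of the second circuit board."""

    def __init__(self, storage):
        self.storage = storage  # dict standing in for the storage device (assumption)
        self.requests = 0

    def fetch(self, block_id):
        self.requests += 1
        return self.storage[block_id]


class FrontEndInterface:
    """Stand-in for the front-end interface circuit with its on-board cache."""

    def __init__(self, back_end):
        self.back_end = back_end
        self.on_board_cache = {}

    def read(self, block_id):
        # Serve from the on-board cache when a copy was retained earlier.
        if block_id in self.on_board_cache:
            return self.on_board_cache[block_id]
        # Otherwise request the element from the back end and retain a copy.
        data = self.back_end.fetch(block_id)
        self.on_board_cache[block_id] = data
        return data


back_end = BackEndInterface({"blk-1": b"payload"})
front_end = FrontEndInterface(back_end)

first = front_end.read("blk-1")
second = front_end.read("blk-1")
print(first == second, back_end.requests)
```

In this sketch the second read is satisfied from the on-board cache, so the back-end interface receives only one request.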
In one arrangement, the data storage system further includes a global memory circuit board that connects to the connection mechanism. In this arrangement, the global memory circuit board has a memory circuit, and the front-end interface circuit of the first circuit board is configured to access a global data element from the memory circuit of the global memory circuit board through the on-board switch of the first circuit board and the connection mechanism. By placing the global data element in the global memory circuit board, the front-end interface circuit of the first circuit board, and other interface circuits, can share access to the global data element. Since the global data element is not stored in the on-board cache of the first circuit board, the other interface circuits do not increase contention for the on-board cache of the first circuit board which could otherwise cause undesirable latencies.
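The separation between shared and preferred storage can be illustrated with a short sketch. The access counters, the class names, and the example keys ("config-table" and "blk-9") are hypothetical and serve only to show which memory each access touches.

```python
class GlobalMemoryBoard:
    """Stand-in for the global memory circuit board."""

    def __init__(self):
        self.memory = {}
        self.accesses = 0

    def read(self, key):
        self.accesses += 1
        return self.memory[key]


class DirectorBoard:
    """Stand-in for a circuit board with an interface circuit and on-board cache."""

    def __init__(self, name, global_memory):
        self.name = name
        self.global_memory = global_memory
        self.on_board_cache = {}
        self.cache_accesses = 0

    def read_local(self, key):
        self.cache_accesses += 1
        return self.on_board_cache[key]

    def read_global(self, key):
        # Travels out through the on-board switch and the connection mechanism;
        # no on-board cache on any other board is consulted.
        return self.global_memory.read(key)


global_memory = GlobalMemoryBoard()
global_memory.memory["config-table"] = {"volumes": 2}

board_1 = DirectorBoard("board-1", global_memory)
board_2 = DirectorBoard("board-2", global_memory)
board_1.on_board_cache["blk-9"] = b"private copy"

board_1.read_local("blk-9")
board_1.read_global("config-table")
board_2.read_global("config-table")
print(global_memory.accesses, board_1.cache_accesses)
```

Here the access by board_2 to the global data element adds to the load on the global memory board only, leaving the access count of the on-board cache of board_1 unchanged.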
In one arrangement, the connection mechanism includes a main switch. This allows the data storage system to have a hub-and-spoke topology, with the main switch as the hub and the first and second circuit boards as the ends of the spokes. In this arrangement, the front-end interface circuit of the first circuit board is configured to exchange data elements with the back-end interface circuit of the second circuit board through the on-board switch of the first circuit board and the main switch of the connection mechanism.
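A minimal sketch of the hub-and-spoke exchange is given below. The class names and the recorded hop labels are hypothetical, and a dictionary stands in for the storage device.

```python
class MainSwitch:
    """Hub: forwards a request to the named spoke (circuit board)."""

    def __init__(self):
        self.spokes = {}

    def attach(self, board):
        self.spokes[board.name] = board
        board.main_switch = self

    def forward(self, destination, block_id, path):
        path.append("main switch")
        return self.spokes[destination].receive(block_id, path)


class BackEndBoard:
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage  # dict standing in for the storage device (assumption)
        self.main_switch = None

    def receive(self, block_id, path):
        path.append(f"{self.name} back-end interface")
        return self.storage[block_id]


class FrontEndBoard:
    def __init__(self, name):
        self.name = name
        self.main_switch = None

    def read(self, destination, block_id):
        # Every request leaves the board through its on-board switch, then the hub.
        path = [f"{self.name} front-end interface", f"{self.name} on-board switch"]
        data = self.main_switch.forward(destination, block_id, path)
        return data, path


hub = MainSwitch()
first_board = FrontEndBoard("board-1")
second_board = BackEndBoard("board-2", {"blk-1": b"payload"})
hub.attach(first_board)
hub.attach(second_board)

data, path = first_board.read("board-2", "blk-1")
print(data, path)
```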
In one arrangement, the first circuit board further includes a back-end interface circuit for connecting to another storage device. In this arrangement, the on-board switch of the first circuit board includes a fourth port that connects to the back-end interface circuit of the first circuit board. The front-end interface circuit of the first circuit board is configured to exchange data elements with the back-end interface circuit of the first circuit board through the on-board switch of the first circuit board. Accordingly, the first circuit board can essentially operate as a complete data storage system by itself since it includes a front-end interface circuit, a back-end interface circuit and on-board cache.
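The effect of the fourth port on routing can be sketched as a simple path choice. The function name, the hop labels, and the device names are hypothetical.

```python
def choose_path(local_devices, device):
    """Return the hops from the front-end interface circuit to `device`."""
    if device in local_devices:
        # Fourth port: the request stays on the first circuit board.
        return [
            "front-end interface",
            "on-board switch",
            "on-board back-end interface",
            device,
        ]
    # Otherwise the request leaves through the third port to another board.
    return [
        "front-end interface",
        "on-board switch",
        "connection mechanism",
        "remote back-end interface",
        device,
    ]


local_path = choose_path({"disk-A"}, "disk-A")
remote_path = choose_path({"disk-A"}, "disk-B")
print(local_path)
print(remote_path)
```

A request for a locally attached storage device never reaches the connection mechanism in this sketch, which corresponds to the first circuit board operating as a complete data storage system by itself.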
The features of the invention, as described above, may be employed in data storage systems, devices and methods such as those manufactured by EMC Corporation of Hopkinton, Mass.