1. Field of the Invention
The present invention relates to data storage systems, and in particular, to a method and apparatus for utilizing a number of cache storage nodes in a cluster storage subsystem.
2. Description of the Related Art
The ability to manage massive amounts of information in large scale databases has become increasingly important in recent years. Data analysts are faced with ever larger data sets, some of which measure in gigabytes or even terabytes. To access such large amounts of data, two or more systems that work together may be clustered. Clustering generally refers to multiple computer systems or nodes (each comprising a central processing unit (CPU), memory, and an adapter) that are linked together in order to handle variable workloads or to provide continued operation in the event one computer system or node fails. Each node in a cluster may itself be a multiprocessor system. For example, a cluster of four nodes, each with four CPUs, would provide a total of 16 CPUs processing simultaneously. Practical applications of clustering include unsupervised classification and taxonomy generation, nearest neighbor searching, scientific discovery, vector quantization, time series analysis, multidimensional visualization, and text analysis and navigation.
In a clustered environment, the data may be distributed across multiple nodes that communicate with each other. Clustering in such a storage system provides a way to bundle throughput from multiple nodes to serve a single or multiple clients. Each node maintains a data storage device, processor, etc. to manage and access a portion of the data. However, such a distributed system requires a mechanism for managing the data across the system and communicating between the nodes.
In order to speed up data delivery and access for the nodes, cache may be utilized. Cache provides a mechanism to store frequently used data in a location that is more quickly accessed. Cache speeds up data transfer and may be either temporary or permanent. Memory and disk caches are utilized in most computers to speed up instruction execution and data retrieval, and to provide lower read/write latency and potentially better throughput than reading from or writing to disk. These temporary caches serve as staging areas, and their contents can change in seconds or milliseconds. Cache in storage nodes usually improves read performance by predictively reading ahead and improves write performance by DASD fast write. A host-initiated write operation is considered complete once the data is in a storage node's cache. The write data is later grouped together and flushed to disk as a delayed operation.
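The read-ahead and fast-write behavior described above can be illustrated with a minimal sketch. It is not taken from any particular storage product: the class name `WriteBackCache`, the `read_ahead` parameter, and the use of a dictionary to stand in for the disk are all assumptions made for illustration.

```python
class WriteBackCache:
    """Toy storage-node cache: writes complete in cache, flushing happens later."""

    def __init__(self, disk, read_ahead=2):
        self.disk = disk              # dict of block number -> data (stands in for the disk)
        self.read_ahead = read_ahead  # hypothetical knob: how many following blocks to prefetch
        self.cache = {}               # block number -> data currently held in cache
        self.dirty = set()            # blocks written to cache but not yet on disk

    def write(self, block, data):
        # Fast write: the operation is complete once the data is in cache.
        self.cache[block] = data
        self.dirty.add(block)

    def read(self, block):
        if block not in self.cache:
            # Miss: fetch the block plus the next few, anticipating sequential access.
            for b in range(block, block + self.read_ahead + 1):
                if b in self.disk and b not in self.cache:
                    self.cache[b] = self.disk[b]
        return self.cache.get(block)

    def flush(self):
        # Delayed operation: group the dirty blocks and write them in block order.
        flushed = sorted(self.dirty)
        for b in flushed:
            self.disk[b] = self.cache[b]
        self.dirty.clear()
        return flushed


disk = {0: "a", 1: "b", 2: "c", 3: "d"}
cache = WriteBackCache(disk)
cache.write(1, "B")   # completes immediately; the disk still holds the old data
cache.read(2)         # a miss on block 2 also stages block 3 in cache
cache.flush()         # only now does block 1 reach the disk
```

In this sketch a write returns before the disk is touched, so modified data exists only in cache until `flush` runs. That window is why a clustered design would need more than one copy of write data to survive a node failure.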
In the prior art, a mainframe or centralized storage model provides for a single global cache for a storage cluster. Such a model provides a single pipeline into a disk drive. Having data in one central location is easier to manage. However, to share data stored in a centralized location, multiple copies of the data must be made. Multiple copies of write data in the cluster are made to guarantee write data is not lost after one hardware failure, such as node failure. Also, multiple copies of unmodified data may reside in different nodes to provide good access locality.
In another prior art model, the disk is separated from its controller and a storage area network (SAN) is utilized to store the global cache. In a SAN, a back-end network connects multiple storage devices via peripheral channels such as SCSI (small computer system interface), SSA (serial storage architecture), ESCON (enterprise systems connection), Fibre Channel, InfiniBand, and iSCSI (SCSI over IP). A centralized SAN ties multiple nodes into a single storage system that may be a RAID (redundant array of independent disks) device with large amounts of cache and redundant power supplies. A centralized storage topology, wherein data is stored in one central location, is commonly employed to tie a server cluster together for failover. In addition, some storage systems can copy data for testing, routine backup, and transfer between databases without burdening the hosts they serve.
In a decentralized SAN, multiple hosts are connected to multiple storage systems to create a distributed system.
In both decentralized and centralized SAN systems, nodes can be added, and data can be scaled and managed better because the data does not have to be replicated.
Typically, in the prior art, there are two nodes in SAN storage products. Such storage products are referred to as “active-passive”: one node in the storage product is active and one is passive. When utilizing a passive node, there are no input/output (I/O) operations between the nodes unless requested (i.e., the node is passive). Such a request is primarily invoked when there is an error on the node the user is currently communicating with and recovery is required. Further, I/O can only occur in one direction, up or down the active channel. Such one-way communication results in the inability to share information. Thus, with an active-passive storage product, the lack of active bi-directional communication between the nodes slows performance.
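The active-passive arrangement described above can be sketched as follows. This is an illustrative model only, under the assumption that recovery simply promotes the passive node and retries the request. The names `Node`, `ActivePassivePair`, and `submit` are hypothetical and do not correspond to any specific product.

```python
class Node:
    """Toy storage node that either serves a request or reports a failure."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.served = 0   # number of requests this node has handled

    def handle(self, request):
        if not self.healthy:
            raise IOError(f"{self.name} failed")
        self.served += 1
        return f"{self.name}:{request}"


class ActivePassivePair:
    """All I/O goes to the active node; the passive node is used only for recovery."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def submit(self, request):
        try:
            return self.active.handle(request)
        except IOError:
            # Recovery (assumed behavior): promote the passive node and retry once.
            self.active, self.passive = self.passive, self.active
            return self.active.handle(request)


a, b = Node("A"), Node("B")
pair = ActivePassivePair(a, b)
pair.submit("read block 7")   # served by A; B stays idle
a.healthy = False
pair.submit("read block 8")   # A fails, so B is promoted and serves the request
```

Until the failure, node B handles no requests at all, so its processor and cache contribute nothing to throughput. This idle capacity is the performance limitation the paragraph above attributes to active-passive products.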
Storage subsystems, such as a storage cluster, are widely used to serve “shared” data in an enterprise computing environment that has high performance, fault tolerance, and storage capacity requirements. As described above, in a prior art clustered environment, one or more nodes are used to access data. However, the lack of active communication between nodes in prior art systems limits the potential performance of the system. Accordingly, what is needed is a storage system and method for moving data closer to its most frequently accessed communication point to increase probable data delivery performance and to provide acceptable performance, fault tolerance, and storage capacity.