1. Field of the Invention
The present invention relates to data storage systems, and in particular, to a method and apparatus for utilizing cache in a number of storage nodes in a cluster storage subsystem.
2. Description of the Related Art
The ability to manage massive amounts of information in large-scale databases has become increasingly important in recent years. Data analysts are faced with ever larger data sets, some of which measure in gigabytes or even terabytes. To access such large amounts of data, two or more systems that work together may be clustered. Clustering provides a way to improve throughput performance through proper load balancing techniques. Clustering generally refers to multiple computer systems or nodes (each comprising a central processing unit (CPU), memory, and an adapter) that are linked together in order to handle variable workloads or to provide continued operation in the event that one computer system or node fails. Each node in a cluster may itself be a multiprocessor system. For example, a cluster of four nodes, each with four CPUs, would provide a total of 16 CPUs processing simultaneously. Practical applications of clustering include unsupervised classification and taxonomy generation, nearest neighbor searching, scientific discovery, vector quantization, time series analysis, multidimensional visualization, and text analysis and navigation. Further, many practical applications are write-intensive with a high amount of transaction processing. Such applications include fraud determination in credit card processing or investment house account updating.
In a clustered environment, the data may be distributed across multiple nodes that communicate with each other. Each node maintains a data storage device, processor, etc. to manage and access a portion of the data that may or may not be shared. When a device is shared, all the nodes can access the shared device. However, such a distributed system requires a mechanism for managing the data across the system and communicating between the nodes.
In order to increase data delivery and access for the nodes, cache may be utilized. Cache provides a mechanism to store frequently used data in a location that is more quickly accessed. Cache speeds up data transfer and may be either temporary or permanent. Memory and disk caches are utilized in most computers to speed up instruction execution and data retrieval. These temporary caches serve as staging areas, and their contents can be changed in seconds or milliseconds.
In the prior art, caching and prefetching strategies are often complicated, confusing, based on scientific workloads for cache management, and designed to guard against file cache corruption due to application faults and power failures with unreliable file systems. Accordingly, what is needed is a storage and caching system that is efficient, does not require special hardware support, and provides sufficient reliability.
To address the requirements described above, the present invention discloses a method, apparatus, article of manufacture, and a memory structure that provide a mirrored-cache write scheme in a cluster-based file system. When a user application or host issues a write request from a node, the data is written to the cache of both the receiving node (referred to as node i) and a partner of the receiving node (referred to as node i+1). In one or more embodiments of the invention, node i's partner is always node i+1, except for the last node, whose partner is node 0 instead.
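The partner rule described above amounts to a wrap-around successor, which can be illustrated with a minimal Python sketch. The function name `partner_of` and the assumption that nodes are numbered 0 through N-1 are illustrative only and do not appear in the disclosure.

```python
def partner_of(node: int, num_nodes: int) -> int:
    """Return the mirror partner of `node` in a cluster of `num_nodes` nodes.

    Assumes nodes are numbered 0 .. num_nodes - 1 (an illustrative choice).
    Node i's partner is node i+1; the last node wraps around to node 0.
    """
    if not 0 <= node < num_nodes:
        raise ValueError("node index out of range")
    return (node + 1) % num_nodes


# Partners for a hypothetical four-node cluster.
partners = [partner_of(i, 4) for i in range(4)]
```

For a four-node cluster, `partners` holds the partner of node 0 through node 3 in order, with the last entry wrapping back to node 0.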
A global cache directory manager (which may or may not be used, depending on the implementation) is embedded in the file system and checks whether the data being written is currently owned by another node (referred to as a remote node). If so, the cache directory manager invalidates the copy in the remote node based on an invalidation protocol. Once invalidation is complete, node i writes the data to its own local file cache. Node i may also write the data to node i+1 and to disk as a nonblocking (asynchronous) write. Once node i receives confirmation of the completed cache write from node i+1, the user/host write can return.
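The write sequence described above can be sketched as a small single-process Python simulation. All names here (`Node`, `Cluster`, `owner`, `pending_disk`) are hypothetical. The sketch also makes simplifying assumptions: the invalidation protocol is reduced to deleting the remote owner's cached copy, the nonblocking disk write is modeled as appending to a queue, and what happens to a previous owner's mirrored copy is not addressed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    cache: dict = field(default_factory=dict)  # local file cache

    def cache_write(self, key, data) -> bool:
        self.cache[key] = data
        return True  # stands in for the write confirmation


class Cluster:
    def __init__(self, num_nodes: int):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.owner = {}         # global cache directory: key -> owning node id
        self.pending_disk = []  # stands in for queued nonblocking disk writes

    def partner_of(self, i: int) -> int:
        return (i + 1) % len(self.nodes)

    def write(self, i: int, key, data) -> bool:
        # 1. Ask the directory whether a remote node currently owns the data.
        remote = self.owner.get(key)
        if remote is not None and remote != i:
            # 2. Invalidate the remote copy (simplified invalidation protocol).
            self.nodes[remote].cache.pop(key, None)
        # 3. Write to node i's own local file cache and record ownership.
        self.nodes[i].cache_write(key, data)
        self.owner[key] = i
        # 4. Queue the disk write without waiting for it to complete.
        self.pending_disk.append((key, data))
        # 5. Mirror to the partner; the host write returns on its confirmation.
        return self.nodes[self.partner_of(i)].cache_write(key, data)


cluster = Cluster(4)
first_ack = cluster.write(3, "blk", b"v1")   # last node mirrors to node 0
second_ack = cluster.write(1, "blk", b"v2")  # node 3's copy is invalidated
```

In this sketch the host write returns as soon as the partner confirms, while the disk write remains queued, which mirrors the ordering given in the text.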