The present invention relates generally to storage and more particularly to a method and system for a self managing and scalable grid storage.
Storage environments have evolved from islands of capacity with minimal management using disparate tools to more centralized data centers consolidating the storage infrastructure in a few locations. This evolution has been achieved primarily due to reduced networking costs that allow more cost-effective remote connectivity of distant locations to a central data center.
Within these central data centers, storage area network (SAN) and network attached storage (NAS) infrastructures have grown in capacity and complexity, creating new types of issues in scalability, interoperability and management. Islands of capacity remain, though they are now driven less by location than by application. Primary storage, backup, archiving, replication, snapshot, and continuous data protection (CDP) all need to be managed and provisioned separately. Further complicating matters, these applications rely on heterogeneous hardware that differs by capacity; type of media (disk, tape); data format (block, file); generation and firmware level; operating system; manufacturer; and more. This complex infrastructure has become a management and interoperability nightmare. There are several approaches to dealing with storage complexity: storage islands, storage management software and storage virtualization.
Storage islands, often grouped by application type and/or vendor, attempt to “keep things simple” and minimize the need for cross training among those responsible for managing them. However, storage islands require manual management and provisioning of each hardware component through a separate user interface, and often create at least as many problems as they solve.
Storage management software is aimed at providing one centralized user interface to manage all storage components. However, storage remains complex and has to be managed manually by storage administrators based on the individual features of each product in the mix.
Storage virtualization is aimed at hiding the underlying complexity from the storage administrator by bundling hardware assigned to a specific application into one pool of storage and eliminating the need to manually migrate data from one disk array to another. However, other management tasks, such as the creation of virtual logical unit numbers (LUNs), still require manual interaction. Storage virtualization does not eliminate the underlying complexity, and therefore products in this space struggle to provide full interoperability. Virtualization also does not eliminate the storage silos of different point applications for data protection, or the need to manually manage them.
Peer to peer networks for storing data may be overlay networks that allow data to be distributively stored in the network (e.g., at nodes). In peer to peer networks, there are links between any two peers (e.g., nodes) that communicate with each other. That is, nodes in the peer to peer network may be considered as being connected by virtual or logical links, each of which corresponds to a path in the underlying network (e.g., a path of physical links). A structured peer to peer network employs a globally consistent protocol to ensure that any node can efficiently route a search to some peer that has the desired data (e.g., a file, piece of data, packet, etc.). A common type of structured peer to peer network uses a distributed hash table (DHT), in which a variant of consistent hashing is used to assign ownership of each file or piece of data to a particular peer, in a way analogous to a traditional hash table's assignment of each key to a particular array slot.
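The ownership assignment described above may be illustrated by the following minimal sketch of consistent hashing. It is not drawn from any particular DHT; the class name `ConsistentHashRing`, the node names, the 32-bit ring size, and the use of SHA-1 are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a string to a point on a ring of size 2**32 (SHA-1 is an illustrative choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ConsistentHashRing:
    """Hypothetical helper: data is owned by the first node at or after its ring position."""

    def __init__(self, nodes):
        # Each node is placed on the ring at the hash of its own name.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        # Insert the new node while keeping the ring sorted by position.
        bisect.insort(self._ring, (ring_position(node), node))

    def owner(self, data_key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        # Wrap around to the first node when the key hashes past the last node.
        index = bisect.bisect_left(positions, ring_position(data_key)) % len(self._ring)
        return self._ring[index][1]


# Hypothetical node and file names, used only to exercise the sketch.
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"file-{i}" for i in range(200)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, every key listed in `moved` is reassigned to the newly added node, while all other keys keep their original owner. This is the property that distinguishes consistent hashing from a simple modulo-based hash table.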
However, traditional DHTs do not readily support data redundancy and may compromise the integrity of data stored in systems using DHTs. To overcome these obstacles in existing peer to peer networks, files or pieces of data are N-way replicated, but this results in high storage overhead and often requires multiple hashing functions to locate the copies of the data. Further, it is difficult to add support for monitoring data resiliency and for automatically rebuilding missing data.
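The overhead of N-way replication may be illustrated by the following minimal sketch, which is not taken from any particular system. The function names `replica_nodes` and `stored_bytes`, the salted SHA-1 hash functions, the node names, and the simple modulo placement are all assumptions made for illustration.

```python
import hashlib


def replica_nodes(data_key: str, nodes: list, n: int) -> list:
    """Locate n replicas using n salted hash functions (salting SHA-1 is an assumption)."""
    placements = []
    for i in range(n):
        # The i-th hash function is SHA-1 over the key prefixed with the salt i.
        digest = hashlib.sha1(f"{i}:{data_key}".encode("utf-8")).digest()
        # Two hash functions can select the same node; a real system must handle that.
        placements.append(nodes[int.from_bytes(digest[:4], "big") % len(nodes)])
    return placements


def stored_bytes(object_size: int, n: int) -> int:
    """N-way replication stores n full copies, so raw usage is n times the object size."""
    return object_size * n


# Hypothetical node and file names, used only to exercise the sketch.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
copies = replica_nodes("file-42", nodes, 3)
usage = stored_bytes(100, 3)
```

In this sketch, a 100-byte object replicated three ways consumes 300 bytes of raw capacity, and a reader must evaluate up to three hash functions to find every copy.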
Accordingly, there is a need for an improved grid storage system that is self managing and scalable.