Business entities and consumers are storing an ever-increasing amount of digitized data. For example, many commercial entities are in the process of digitizing their business records and/or other data. Similarly, web-based service providers generally engage in transactions that are primarily digital in nature. Thus, techniques and mechanisms that facilitate efficient and cost-effective storage of vast amounts of digital data are being implemented.
To link remote (or even locally dispersed) locations that require access to stored data, and/or to promote the continued availability of such data in the event of hardware, software, or even site failures (e.g., power outages, sabotage, natural disasters), entities have developed clustered networks that link disparate storage media to a plurality of clients. Typically, to access data, one or more clients can connect to respective nodes of a clustered storage environment, where the nodes are linked by a cluster fabric that provides communication between the disparate nodes. Nodes can be dispersed locally, such as in a same geographical location, and/or dispersed over great distances, such as around the country.
As the data storage requirements and/or management needs of a data management and storage network increase, such as for an enterprise, nodes can be added to the system. Further, enterprises that have branch or remote offices often utilize a centralized storage system that is remote from those offices. However, as more remote nodes are added to the system to accommodate storage requirements, for example, data access can be slowed.
Accessing data from a remote node takes longer (hence utilizing greater computing resources) than accessing data from the local node. For example, a client computer attempting to access data from a locally connected node will typically receive a response faster than when requesting the same data from a remote node. Accessing data on a remote node can be slower because it may require a network hop to the node, and because the cluster interconnect network may impose latency and bandwidth constraints, any or all of which can adversely impact data access.
Currently, some techniques and/or systems utilize data reduction to accelerate the transfer of data across a wide area network (WAN). For example, using data reduction, all data to and from a locally connected client is examined in real time (e.g., as data requests and responses occur), prior to being sent across the WAN. All examined data is stored in a data storage device at the local node. Further, when duplicate information is detected, a reference is sent to the requesting node in the cluster, instructing that node to deliver the information locally instead of having it resent across the WAN.
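By way of illustration only, the data reduction approach described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular product: the class name ReductionEndpoint, the helper wan_bytes, the use of SHA-256 digests as references, and the in-memory dictionary standing in for the data storage device at each node are all hypothetical choices made for the example.

```python
import hashlib


class ReductionEndpoint:
    """Hypothetical endpoint on one side of a WAN link.

    Every payload it examines is kept in a local store (an in-memory
    dict here, standing in for the node's data storage device).
    """

    def __init__(self):
        self.store = {}  # SHA-256 hex digest -> payload bytes

    def encode(self, payload: bytes):
        """Examine outgoing data; return a reference if it was seen before."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self.store:
            return ("ref", digest)    # duplicate: send only the reference
        self.store[digest] = payload  # first sighting: remember it
        return ("data", payload)      # and send the full payload

    def decode(self, message):
        """Rebuild incoming data, serving references from the local store."""
        kind, body = message
        if kind == "data":
            self.store[hashlib.sha256(body).hexdigest()] = body
            return body
        return self.store[body]       # raises KeyError for an unknown reference


def wan_bytes(message):
    """Size of the message body that would cross the WAN (framing ignored)."""
    return len(message[1])


# Usage: the same payload is requested twice across the link.
sender, receiver = ReductionEndpoint(), ReductionEndpoint()
payload = b"x" * 4096
first = sender.encode(payload)   # travels as full data
second = sender.encode(payload)  # travels as a short reference
assert receiver.decode(first) == payload
assert receiver.decode(second) == payload
```

Note that encode hashes and looks up every payload, including payloads that turn out not to be duplicates, so the examination cost is paid on all traffic regardless of whether any WAN bandwidth is saved.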
Using the current technology, all of the data that is sent to or received from a node in the cluster is examined and compared against the information stored at the local node. However, constantly comparing data against locally stored data can itself slow performance and may necessitate greater computing resources.