Distributed storage is one of the most critical requirements for enterprises today. Data is maintained across geographies, and dedicated systems ensure redundancy across those geographies.
The foremost reasons for adopting a distributed architecture include, but are not limited to, handling extremely large volumes of data, making programs scalable, and taking advantage of multiple systems as well as multi-core CPU architectures. In addition, website servers need to be globally distributed for purposes such as low latency and failover.
In a typical scenario, system administrators can distribute collections of data (e.g., in a database) across multiple physical locations. A distributed database can reside on network servers on the Internet, on corporate intranets, or on other company networks. Because distributed databases store data across multiple computers, they improve performance for the end user by allowing transactions to be processed on many machines instead of being limited to one.
Distributed storage, within the purview of the present invention, may refer to a computer network in which information is stored on more than one network node, e.g., in a replicated fashion. Thus, it may refer either to a distributed database in which users store information on a number of network nodes, or to a computer network in which users store information on a number of peer network nodes.
A person of ordinary skill in the art will appreciate that, in communication networks, a node may refer to a connection point, either a redistribution point or a communication endpoint (e.g., some terminal equipment). A physical network node is an active electronic device that is attached to a network and is capable of sending, receiving, or forwarding information (e.g., data packets) over a communications channel.
Traditionally, replication involves specialized software modules that look for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same, i.e., contain the same data. Depending on the size and number of the distributed databases, the replication process can be complex and time-consuming. It also consumes a large amount of computing resources.
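By way of illustration only, the conventional compare-and-synchronize process described above may be sketched as follows. This is a minimal sketch under stated assumptions and not any particular product's replication mechanism: each database is modeled as an in-memory key-value store, and the names `find_changes`, `apply_changes`, and `replicate` are hypothetical. It shows why the cost grows with both the size of the data and the number of replicas, since every replica is compared against the primary key by key.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: each database is modeled as a simple in-memory key-value store.
Store = Dict[str, str]
# A change is (key, new_value); a new_value of None means "delete this key".
Change = Tuple[str, Optional[str]]


def find_changes(primary: Store, replica: Store) -> List[Change]:
    """List the changes a replica needs in order to match the primary."""
    changes: List[Change] = []
    # Keys that are missing from the replica or hold a different value.
    for key, value in primary.items():
        if replica.get(key) != value:
            changes.append((key, value))
    # Keys that exist only on the replica and must be removed.
    for key in replica:
        if key not in primary:
            changes.append((key, None))
    return changes


def apply_changes(replica: Store, changes: List[Change]) -> None:
    """Apply the identified changes to a replica in place."""
    for key, value in changes:
        if value is None:
            replica.pop(key, None)
        else:
            replica[key] = value


def replicate(primary: Store, replicas: List[Store]) -> int:
    """Synchronize every replica with the primary; return the changes shipped."""
    shipped = 0
    for replica in replicas:
        changes = find_changes(primary, replica)
        apply_changes(replica, changes)
        shipped += len(changes)
    return shipped


# Hypothetical example data: one primary and two out-of-date replicas.
primary = {"a": "1", "b": "2", "c": "3"}
replicas = [
    {"a": "1", "b": "old"},                       # stale value, missing key
    {"a": "1", "b": "2", "c": "3", "d": "stale"}, # extra key
]
shipped = replicate(primary, replicas)
```

In this example three changes are shipped in total, after which both replicas hold the same data as the primary. Each change is sent separately to each replica that needs it, which is the kind of repeated work that consumes network resources as the number of replicas grows.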
Thus, there is a long-felt need to optimize database replication in order to save precious network resources, including memory, CPU cycles, etc.
Many solutions have been proposed to reduce the wastage of network resources due to the replication of data on multiple nodes in a network. However, existing solutions operate either at the application level with cluster topology awareness or at the router level (L2/L3) without database cluster topology awareness.
Solutions based merely on a "router level" methodology suffer from multiple drawbacks, including but not limited to redundancy in the network due to oversubscription on the master node side. Redundancy elimination techniques are themselves costly and are applied uniformly without any consideration of the nature of the information, i.e., they are not context specific.
Without knowledge of the underlying database topology, either additional content is cached or the caches are not used optimally.
Further, traditional solutions are unable to offer content-specific policies based on database content. For example, such solutions cannot be used to implement a special path that tries to maximize the MTU (Maximum Transmission Unit) for initial mirroring information.
Thus, there is a long-felt need to provide an optimized method and system for distributed storage that avoids the above and many other disadvantages exhibited by the conventional approaches. In particular, an optimized method and system for distributed storage is required that can work with any data storage solution or any database and that works optimally irrespective of the replication mechanism being used. Such a solution should be cost effective and should preferably use existing router infrastructure (such as MPLS, MPLS-TE, CSPF) to achieve the desired improvement.