1. Field of the Invention
The present invention relates to distributed data systems and, more particularly, to performing data updates in distributed data systems.
2. Description of Related Art
Distributed data systems manage data owned by members of a distributed system formed in a network. By managing the data, a distributed data system hides the physical location of the data from data users, or clients, that are connected to member systems, or nodes, of the distributed system. Fault tolerance can be provided by clustering a distributed data system so that there are multiple nodes to perform each function of the distributed data system. A distributed data system may provide high availability by replicating or distributing data throughout such a cluster. Each node in a cluster may have one or more clients depending on the topology of the cluster and its configuration.
FIG. 1 illustrates one distributed data system configuration in which distributed data manager nodes 111 are coupled to each other and to clients 101. Each distributed data manager 111 includes a data store 121 providing data storage for distributed data within the distributed data system.
Each distributed data manager node 111 may operate on a different machine in a network. Each client 101 may communicate with an associated distributed data manager node 111 (e.g., a client 101 may communicate with a distributed data manager node 111 executing on the same machine). As needed, a distributed data manager node (e.g., 111A) may communicate with the other distributed data manager nodes (e.g., 111B-111D) in the cluster to respond to clients and to perform distributed data functions (e.g., replication, load balancing, etc.).
For example, when client 101B1 generates data, client 101B1 may send the data to distributed data manager 111B. Distributed data manager 111B may then store the data in its data store 121B and transmit the data to another node 111C in the cluster to provide a back-up at a remote location.
When client 101B1 accesses data, the client may request the data from distributed data manager 111B. If the requested data is present in the data store 121B, distributed data manager 111B returns the requested data to the client 101B1. Otherwise, the distributed data manager 111B requests the data from another distributed data manager in the cluster, such as distributed data manager 111C. When the other distributed data manager 111C finds the requested data, that distributed data manager 111C transmits the requested data to the requesting distributed data manager 111B. The requesting distributed data manager 111B then stores the data in its data store 121B and transmits the requested data to the client 101B1.
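By way of illustration only, the store, back-up, and fetch-on-miss behavior described above may be sketched as follows. The class name `DataManagerNode`, its `put` and `get` methods, and the choice of the first peer as the back-up target are hypothetical and are not taken from any particular distributed data system; direct access to a peer's store stands in for node-to-node messaging.

```python
class DataManagerNode:
    """Hypothetical stand-in for a distributed data manager node (e.g., 111B)."""

    def __init__(self, name):
        self.name = name
        self.store = {}   # local data store (e.g., 121B)
        self.peers = []   # other data manager nodes in the cluster

    def put(self, key, value):
        """Store data locally, then copy it to one peer as a back-up."""
        self.store[key] = value
        if self.peers:
            # Assumption: the first peer is the back-up target.
            self.peers[0].store[key] = value

    def get(self, key):
        """Return data from the local store, or fetch it from a peer on a miss."""
        if key in self.store:
            return self.store[key]
        for peer in self.peers:
            if key in peer.store:
                value = peer.store[key]
                self.store[key] = value  # keep a local copy for later requests
                return value
        raise KeyError(key)


node_b = DataManagerNode("111B")
node_c = DataManagerNode("111C")
node_d = DataManagerNode("111D")
node_b.peers = [node_c, node_d]

node_b.put("session", {"user": "alice"})   # stored in 111B, backed up to 111C
node_d.store["report"] = [1, 2, 3]         # data held only by 111D
fetched = node_b.get("report")             # miss in 111B, fetched from 111D
```

In an actual cluster each access to `peer.store` would be a network request, which is where the serialization cost discussed below arises.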
In an out-of-process configuration, data crosses process boundaries when transmitted between different distributed data managers 111 or between distributed data managers 111 and clients 101. To transmit data across process boundaries, data is serialized before transmission, transmitted and received in its serialized format, and de-serialized at the receiving end.
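As a minimal sketch of the serialize, transmit, and de-serialize sequence, the following uses Python's standard `pickle` module as one possible serialization mechanism. The function name `transmit_out_of_process` is hypothetical, and the copy of the byte string merely stands in for a network or inter-process transfer.

```python
import pickle


def transmit_out_of_process(obj):
    """Simulate sending obj across a process boundary."""
    payload = pickle.dumps(obj)      # serialize at the sending end
    received = bytes(payload)        # stand-in for the actual transmission
    return pickle.loads(received)    # de-serialize at the receiving end


original = {"key": "cart", "items": ["book", "pen"]}
received_obj = transmit_out_of_process(original)
```

The receiver obtains an equal but distinct object, reconstructed from the serialized bytes rather than shared with the sender.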
In an in-process configuration, certain clients 101 share process space with certain distributed data managers 111 (e.g., a client may share process space with a distributed data manager executing on the same machine as the client). Data crosses process boundaries when transmitted between components in different processes (e.g., different distributed data managers 111) but not when transmitted between components within the same process (e.g., clients in the same process as the distributed data manager with which those clients are communicating). Therefore, although serialization and de-serialization take place across process boundaries in the in-process configuration, data may be communicated between a distributed data manager and a client sharing the same process space without the additional computation requirement for serialization/de-serialization.
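The difference between the two configurations may be illustrated, under simplifying assumptions, by a hypothetical `deliver` function that hands an object to a client either within the same process or across a process boundary. The function and its `same_process` flag are illustrative only.

```python
import pickle


def deliver(obj, same_process):
    """Hand obj to a client, serializing only when a process boundary is crossed."""
    if same_process:
        # In-process: the client receives a reference to the same object.
        return obj
    # Out-of-process: the data is serialized, sent, and rebuilt as a copy.
    return pickle.loads(pickle.dumps(obj))


data = {"key": "profile", "value": [1, 2, 3]}
in_process_result = deliver(data, same_process=True)
out_of_process_result = deliver(data, same_process=False)
```

Here the in-process path performs no serialization work at all, while the out-of-process path incurs the cost of both `pickle.dumps` and `pickle.loads` on every delivery.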
Serializing data involves converting the data into a form suitable for transmission. For example, serialization may include generating object data sequentially so that the object data may be transmitted as a data stream. The serialized data represents the state of the original data in a serialized form sufficient to reconstruct the object. De-serializing involves producing or reconstructing the original object data from its serialized format. As the complexity of a data object increases, the complexity of serializing and de-serializing that data object also increases. For example, it may take longer to process data objects that have associated attributes, variables, and identification of methods and/or classes than to process more primitive data objects.
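The relationship between object complexity and serialization cost can be illustrated with a small sketch that compares the serialized size of a primitive value with that of a nested object. Python's `pickle` module is used as an example serializer; the sample data and the helper `serialized_size` are hypothetical, and the size in bytes is only a rough proxy for processing cost.

```python
import pickle

# A primitive data object.
primitive = 42

# A more complex data object with nested attributes (illustrative data only).
composite = {
    "id": 42,
    "attributes": {"owner": "client-101B1", "tags": ["a", "b", "c"]},
    "history": [{"version": n, "value": n * n} for n in range(10)],
}


def serialized_size(obj):
    """Return the number of bytes produced by serializing obj."""
    return len(pickle.dumps(obj))


primitive_size = serialized_size(primitive)
composite_size = serialized_size(composite)
```

The composite object serializes to more bytes than the primitive one because the serializer must traverse and encode every nested element, and the receiver must rebuild each of them in turn.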
As the above examples show, serialization and de-serialization may be performed frequently in a distributed data system. The process of serializing and de-serializing data may consume more system resources than is desirable, especially as the data becomes more complex. Furthermore, the transmission of the data between nodes may cause an undesirable amount of network traffic. Accordingly, improved methods of communicating data between processes are desired.