1. Field
Embodiments of the invention relate to the field of data processing and, more specifically, to the exchange of data in a distributed system at a subset data structure granularity without per-field overhead.
2. Background
Components in a distributed system commonly exchange data. Typically, the components exchange data in the form of data structures in their binary representation. Some data exchange mechanisms operate purely at a data structure granularity (i.e., the sending component sends each field of the data structure to the receiving component regardless of whether each field actually has data), which can lead to waste if the sender does not populate each field of the data structure. For example, even though a sender may populate only a subset of the fields of a data structure to send to a receiver, the entire data structure will be encoded and transmitted, and the receiver will be required to fully decode it. In distributed systems where state updates are exchanged frequently, the number of unpopulated fields, and therefore the waste, can be large. One approach to dealing with this waste is to design the data structures such that each subset of fields is itself a separate data structure. However, this approach increases the number of data structures in the system and requires a new data structure to be defined for every new combination of fields to be sent; it therefore increases application complexity, has a high engineering cost, and is difficult to implement on existing systems.
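By way of illustration only, the waste inherent in exchange at a data structure granularity may be sketched as follows. The four-field layout, the field names, and the helper functions `encode_full` and `decode_full` are hypothetical and are not taken from any particular system; unpopulated fields are assumed to be sent as zero.

```python
import struct

# Hypothetical fixed layout: four unsigned 32-bit fields in network byte order.
LAYOUT = struct.Struct("!IIII")
FIELDS = ("port_id", "speed", "mtu", "flags")


def encode_full(update):
    """Encode every field of the structure; unpopulated fields are sent as 0."""
    return LAYOUT.pack(*(update.get(name, 0) for name in FIELDS))


def decode_full(payload):
    """Decode every field, whether or not the sender populated it."""
    return dict(zip(FIELDS, LAYOUT.unpack(payload)))


# The sender populates only two of the four fields.
update = {"port_id": 7, "mtu": 1500}
payload = encode_full(update)

populated_bytes = 4 * len(update)
wasted_bytes = len(payload) - populated_bytes
print(len(payload), populated_bytes, wasted_bytes)
```

In this sketch, half of the transmitted bytes carry no information, and the receiver still decodes all four fields.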
Other data exchange mechanisms allow the sender to declare fields of a data structure as optional and to encode and transmit only those fields that are actually populated. This allows sub-structure communication between components. However, these data exchange mechanisms incur a per-field overhead (e.g., 1 byte of overhead for each field sent, which typically takes the form of a field identifier), since the sender can send an arbitrary combination of the structure's fields and the receiver needs to be able to decode those fields.
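By way of illustration only, the per-field overhead of such optional-field mechanisms may be sketched as follows. The 1-byte field identifiers, the fixed 32-bit field width, and the helper functions `encode_tagged` and `decode_tagged` are assumptions made for this example and do not reproduce any specific wire format.

```python
import struct

# Hypothetical 1-byte identifiers assigned to each field of the structure.
FIELD_IDS = {"port_id": 1, "speed": 2, "mtu": 3, "flags": 4}
FIELD_NAMES = {field_id: name for name, field_id in FIELD_IDS.items()}

# Each entry on the wire: 1-byte identifier followed by a 32-bit value.
ENTRY = struct.Struct("!BI")


def encode_tagged(update):
    """Encode only the populated fields, each prefixed by its identifier."""
    out = bytearray()
    for name, value in update.items():
        out += ENTRY.pack(FIELD_IDS[name], value)
    return bytes(out)


def decode_tagged(payload):
    """Use each identifier to learn which field the following value belongs to."""
    result = {}
    for offset in range(0, len(payload), ENTRY.size):
        field_id, value = ENTRY.unpack_from(payload, offset)
        result[FIELD_NAMES[field_id]] = value
    return result


update = {"port_id": 7, "mtu": 1500}
payload = encode_tagged(update)
overhead_bytes = len(update)  # one identifier byte per field sent
print(len(payload), overhead_bytes)
```

Here only the populated fields are transmitted, but every field sent carries one additional identifier byte, so the overhead grows with the number of fields in each message.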
As a prerequisite for both data exchange mechanisms described above, a high-level description of the data structures to be exchanged, which defines how to handle the data to be sent or received, must be known to both the sender and the receiver. For example, for each data structure to be exchanged, this high-level description includes the type of each field in the data structure, the length of each field, the offset of each field, etc. Data handling code is automatically generated based on this high-level description and is used to encode the data at the sender and decode the data at the receiver. Since a typical distributed system exchanges data for many different data structures, these systems must manage a large amount of data structure metadata and a large amount of automatically generated handling code.
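By way of illustration only, the relationship between a shared high-level description and the handling code derived from it may be sketched as follows. The `DESCRIPTION` table, the `PortState` structure, and the `build_handlers` function are hypothetical. For brevity, this sketch derives the encoder and decoder at run time rather than generating source code ahead of time, which is a simplification of the code generation described above.

```python
import struct

# Hypothetical high-level description known to both sender and receiver:
# for each structure, an ordered list of (field name, struct type code).
DESCRIPTION = {
    "PortState": [("port_id", "I"), ("speed", "I"), ("mtu", "H"), ("admin_up", "B")],
}


def build_handlers(fields):
    """Derive the per-field metadata and the encode/decode handlers."""
    layout = struct.Struct("!" + "".join(code for _, code in fields))
    names = [name for name, _ in fields]

    # Metadata per field: (offset, length) in bytes within the encoded form.
    metadata = {}
    offset = 0
    for name, code in fields:
        length = struct.calcsize("!" + code)
        metadata[name] = (offset, length)
        offset += length

    def encode(record):
        return layout.pack(*(record[name] for name in names))

    def decode(payload):
        return dict(zip(names, layout.unpack(payload)))

    return encode, decode, metadata


encode, decode, metadata = build_handlers(DESCRIPTION["PortState"])
record = {"port_id": 7, "speed": 10000, "mtu": 1500, "admin_up": 1}
payload = encode(record)
print(len(payload), metadata["mtu"])
```

Each additional data structure adds another entry to the description and another pair of handlers, which is the metadata and handling-code burden noted above.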