The lifecycle of electronic data may span multiple computer programs. For example, one program may produce data for another program to consume. Computer environments offer primitive mechanisms for sharing data between programs, including files, sockets, queues, pipes, shared memory, and messages. However, these primitives treat data as opaque. As such, they provide no support for determining the extent of data to transmit or the structure of data received.
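For instance, the opaqueness of such primitives can be sketched in Python with a pipe, one of the mechanisms listed above. The record contents below are hypothetical; the point is that the primitive carries only bytes, so both the record boundary and the field structure are lost in transit.

```python
import os

# A pipe transmits raw bytes only: the primitive has no notion of
# where one record ends or what structure the bytes encode.
read_fd, write_fd = os.pipe()

# Sender: two logical records, but the pipe sees one opaque byte run.
os.write(write_fd, b"alice,30")
os.write(write_fd, b"bob,25")
os.close(write_fd)

# Receiver: a single undifferentiated buffer comes back.  The boundary
# between the two records and the comma-separated field layout must be
# reconstructed by out-of-band agreement between the programs.
data = os.read(read_fd, 1024)
os.close(read_fd)
print(data)  # b'alice,30bob,25'
```

The receiver here cannot even tell how many records were sent, which is exactly the "extent" and "structure" gap described above.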
Existing frameworks assist, to varying degrees, with exchanging structured data. Such frameworks include dynamic data exchange (DDE), common object request broker architecture (CORBA), extensible markup language (XML), JavaScript Object Notation (JSON), and Java object serialization. However, because these frameworks do not offer robust integration of received data into existing data, they fail to support the entire lifecycle of the data. For example, a receiver may need a mechanism to merge received structures with its own structures. Likewise, the receiver may revise the received data and return it to the original sender, in which case the original sender faces the same problem of an unsupported merge.
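The merge gap can be illustrated with JSON, one of the frameworks named above. JSON faithfully round-trips the structure, but integrating the received structure into the receiver's own is left entirely to hand-written code. The record contents and the merge policy below are illustrative assumptions, not part of any framework.

```python
import json

# Sender's structure, serialized for transmission.
payload = json.dumps({"config": {"timeout": 30}, "tags": ["a"]})

# The receiver already holds its own structure.
existing = {"config": {"retries": 3}, "tags": ["b"]}

# json restores the structure, but integrating it is unsupported:
received = json.loads(payload)

# The receiver must hand-write an ad hoc merge.  The framework offers
# no policy for nested dictionaries, list concatenation, or conflicts.
def merge(dst, src):
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            merge(dst[key], value)
        elif isinstance(value, list) and isinstance(dst.get(key), list):
            dst[key].extend(value)
        else:
            dst[key] = value

merge(existing, received)
print(existing)
# {'config': {'retries': 3, 'timeout': 30}, 'tags': ['b', 'a']}
```

If the receiver revised the merged data and sent it back, the original sender would need its own copy of this same ad hoc logic.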
This unmet need is aggravated when exchanged data contains some pointers that point elsewhere within the exchanged data and other pointers that point outside of it. Especially acute is the problem of an exchange that incorporates data drawn from different memory resources, such as a call stack, a heap, and a static region. Upon receipt, the exchanged data may likewise need integrating with a call stack and a heap.
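The internal-versus-external pointer distinction can be sketched with Python's pickle serializer. References among objects inside the exchanged data survive serialization, while a reference to a resource outside the exchanged data, here an open file handle standing in for process-local state, cannot be transmitted at all.

```python
import os
import pickle

# Internal pointers (references among objects within the exchanged
# data) survive: pickle encodes the aliasing inside the byte stream.
shared = [1, 2]
graph = {"left": shared, "right": shared}
restored = pickle.loads(pickle.dumps(graph))
internal_preserved = restored["left"] is restored["right"]

# Pointers to resources outside the exchanged data do not survive: an
# open file handle names per-process operating-system state that means
# nothing in a receiving process, so serialization is refused.
handle = open(os.devnull, "w")
try:
    pickle.dumps({"log": handle})
    external_survived = True
except TypeError:
    external_survived = False
finally:
    handle.close()

print(internal_preserved, external_survived)  # True False
```

A framework that only rejects external references, as here, still leaves the harder problem unsolved: rebinding such references to the receiver's own call stack and heap upon receipt.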
These problems arise in a variety of programming languages, such as C, Java, and Python. Another example is the statistical programming language R, which may process immense datasets. R provides a rich statistical environment that supports various canned computations for fields such as statistics, mathematics, and physics. However, R does not support sharing data objects amongst processes. R has a single-threaded architecture built upon many global structures that are referenced throughout R's local data structures, which makes those structures non-portable. This means that an R data structure, such as a data frame or a vector, cannot simply be copied into the memory of another R process.
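Although the paragraph above concerns R, an analogous portability failure can be sketched in Python, another of the languages named: a structure whose behavior depends on references into process-local state (here, a closure's captured environment, standing in for R's global structures) cannot be copied into another process's memory by serialization.

```python
import pickle

# A closure embeds a reference to state that lives only in this
# process's memory, analogous to an R structure that references
# R's global structures.
def make_counter():
    count = 0
    def bump():
        nonlocal count
        count += 1
        return count
    return bump

counter = make_counter()
try:
    pickle.dumps(counter)          # attempt to copy it for another process
    portable = True
except (pickle.PicklingError, AttributeError):
    portable = False
print(portable)  # False: the structure is non-portable
```

The analogy is loose, since R's limitation stems from its interpreter internals rather than closures, but the effect is the same: the structure cannot simply be copied elsewhere.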
Consequently, R computation is almost impossible to parallelize for the horizontal scaling needed to timely process multi-gigabyte datasets. Besides R, other languages either do not support multitasking or have non-portable data structures due to their use of global variables and embedded pointers. This challenging programming problem exists for many large systems that include legacy components that need to exchange vast amounts of data.