Serialization of data structures (also called linearization or marshalling; each of these terms also has other meanings) means converting a more or less arbitrary data structure to a string of bytes (or words) that can be, for example, written to a file, stored in a database, sent over a network to another computer, migrated, or shared in a distributed object system. The bytes encode the data structure such that it can later be read in (possibly on a different computer or in a different program) and the original data structure restored.
Serialization is readily available in some programming languages or run-time libraries, including Java and C#. Many serialization implementations only support non-cyclic data structures; however, some support arbitrary cyclic or shared data structures and preserve any sharing.
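As an illustration of what preserving sharing and cycles involves, the following is a minimal sketch in Python. It is not the method of any particular language runtime or of the present disclosure. The Node class, the serialize and deserialize functions, and the use of JSON as the byte encoding are all assumptions made for this example. Each object is assigned a table index on first visit, and every reference is written as an index, so a shared or cyclic reference is encoded as a back-reference rather than as a second copy.

```python
import json


class Node:
    """Hypothetical graph node used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.refs = []  # outgoing references to other Node objects


def serialize(root):
    """Encode the graph reachable from root as bytes, keeping sharing and cycles."""
    index = {}   # id(node) -> position in the output table
    order = []   # nodes in first-visit order
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index:
            continue  # already visited: a shared or cyclic reference
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(reversed(node.refs))
    # Every reference is written as a table index instead of a nested copy.
    table = [{"name": n.name, "refs": [index[id(r)] for r in n.refs]}
             for n in order]
    return json.dumps(table).encode("utf-8")


def deserialize(data):
    """Rebuild the graph: create all nodes first, then wire up references."""
    table = json.loads(data.decode("utf-8"))
    nodes = [Node(entry["name"]) for entry in table]
    for node, entry in zip(nodes, table):
        node.refs = [nodes[i] for i in entry["refs"]]
    return nodes[0]  # the root is always the first table entry


# Example graph: c is shared by a and b, and c points back to a (a cycle).
a, b, c = Node("a"), Node("b"), Node("c")
a.refs = [b, c]
b.refs = [c]
c.refs = [a]
restored = deserialize(serialize(a))
```

In the restored graph, the node reached through a and the node reached through b are the same object, and the cycle back to the root is intact. A serializer that simply recursed into each reference would either duplicate the shared node or fail to terminate on the cycle.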
Garbage collectors routinely handle cyclic data structures. They typically use mark bits, forwarding pointers, or other known mechanisms for detecting and dealing with cyclic data structures. A good reference for garbage collection methods is the book R. Jones & R. Lins: Garbage Collection: Algorithms for Automatic Dynamic Memory Management, Wiley, 1996.
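The role of mark bits in handling cycles can be sketched as follows. This is a simplified illustration in Python, not the algorithm of any specific collector. The mark function and the representation of the heap as a dictionary mapping object names to lists of referenced names are assumptions made for the example, and a set stands in for per-object mark bits.

```python
def mark(roots, heap):
    """Return the set of objects reachable from roots.

    heap maps each object name to the names it references (a hypothetical
    stand-in for real object headers and pointer fields).
    """
    marked = set()          # plays the role of the mark bits
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue        # mark already set: do not traverse again
        marked.add(obj)
        worklist.extend(heap.get(obj, ()))
    return marked


# "a" and "b" form a cycle reachable from the root; "c" refers only to itself.
heap = {"a": ["b"], "b": ["a"], "c": ["c"]}
live = mark(["a"], heap)
```

Because an object whose mark is already set is never traversed again, the loop terminates on the cycle between "a" and "b", and the unreachable self-referencing object "c" is never marked.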
Multiobject garbage collection is described in U.S. patent application Ser. No. 12/147,419 by the same inventor, which is incorporated herein by reference.
Several garbage collection mechanisms try to cluster objects such that objects referencing each other are in adjacent memory locations or at least on the same virtual memory page. This is sometimes called linearization. However, the operation and objectives are different from serialization, as the clustering performed by garbage collection does not generally yield a data stream that could be written to external storage and loaded back later; there the goal is simply to speed up program execution by improving the cache hit rate.
The term serialization is also frequently used to refer to synchronization of operations in concurrent programs, a meaning that is completely different from the one used herein.
Distributed garbage collection is frequently used in distributed object systems as well as in many environments utilizing remote method invocation mechanisms. For example, Microsoft's .NET architecture includes distributed garbage collection. A survey of the field can be found in S. Abdullahi et al.: Collection schemes for distributed garbage, IWMM'92, Springer, 1992, pp. 43-81.
Several authors have identified serialization as a significant performance bottleneck in many applications, for example in Java remote method invocation, where some studies have found it to dominate remote method invocation costs. Remote method invocation is a very important tool in building large distributed computing systems, and a faster serialization mechanism would thus help make distributed systems more efficient.
There are also applications, such as large knowledge-based systems, where the data structures to be serialized are extremely large and may grow to billions of objects in the near future. Such data structures also tend to be cyclic and have extensive sharing. Very fast and memory-efficient serialization methods will be needed for serializing such data structures.