Storing an object for later use by an application is called “object persistence.” Encoding an object for transmission over a distributed network is also called object persistence. Object persistence is also known as “serializing an object.” An “object” is the core concept of the “object-oriented paradigm.”
Object-Oriented Paradigm
A large segment of the computing realm operates under the object-oriented paradigm. This is sometimes called “object technology” or “object-oriented programming.” In general, an object is understood to encapsulate data and procedures (i.e., methods).
Object-oriented programming is a type of programming in which programmers define not only the data type of a data structure, but also the types of operations (i.e., procedures, functions, or methods) that can be applied to the data structure. In this way, the data structure becomes an object that includes both data and functions. In addition, programmers can create relationships between one object and another. For example, objects can inherit characteristics from other objects.
One of the principal advantages of object-oriented programming techniques over procedural programming techniques is that they enable programmers to create modules that do not need to be changed when a new type of object is added. A programmer can simply create a new object that inherits many of its features from existing objects. This makes object-oriented programs easier to modify.
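The point about adding new object types without changing existing modules can be illustrated with a minimal sketch. The class names (Shape, Rectangle, Square) and the report function are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
class Shape:
    """Base object: bundles data (a name) with operations on that data."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("subclasses supply their own area")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


def report(shapes):
    # Existing module: it relies only on the Shape interface, so it does
    # not need to change when a new kind of shape is introduced.
    return [shape.describe() for shape in shapes]


class Square(Rectangle):
    # New type added later: inherits area() and describe() unchanged.
    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


lines = report([Rectangle(2, 3), Square(2)])
```

Here report was written before Square existed, yet it handles a Square without modification because Square inherits its behavior from Rectangle.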
To perform object-oriented programming, one needs an object-oriented programming language (OOPL). “Java,” “C++,” and “Smalltalk” are three of the more popular languages, and there are object-oriented versions of Pascal.
The object-oriented paradigm allows for the fast development of applications to solve real problems. Using this paradigm, applications can interact with other applications (or the operating system) on the same computer. Such an interaction may involve sharing data or requesting execution of a task by another application. For example, the Component Object Model (COM), by the Microsoft Corporation, enables programmers to develop objects that can be accessed by any COM-compliant application on the same computer.
The object-oriented paradigm also allows applications to interact with applications on different computers. This is often called “distributed computing.”
Generally, in distributed computing, the components and objects that make up an application are located on different computers coupled to a network. So, for example, a word processing application might consist of an editor component on one computer, a spell-checker object on a second computer, and a thesaurus on a third computer. In some distributed computing systems, each of the three computers could even be running a different operating system.
One of the requirements of distributed computing is a set of standards that specify how objects communicate with one another. There are currently two chief distributed computing standards: CORBA (Common Object Request Broker Architecture) and DCOM (Distributed Component Object Model).
For example, programmers may use DCOM (by the Microsoft Corporation) to develop objects that can be accessed by any DCOM-compliant application on a different computer. DCOM is an extension of COM to support objects distributed across a network.
Object Serialization
Serialization is the process of saving and restoring objects. More precisely, serialization is the process of saving and restoring the current data and the data structures of objects. The information is extracted from objects so that it is not lost or destroyed. In other words, the transitory status of objects is fixed (often in a file or a database) for the purpose of storage or communications. This process is also called “object persistence.”
If an application using an object is closed, then the object's data and its data structures must be preserved so that the object may be restored into its current state when the program is invoked again. For example, it is often necessary to temporarily store an object so that another application may access it. In another example, sending an object to another computer in a distributed computing environment requires the object be stored, transmitted, received, and recovered. In each of these examples, objects are stored and restored.
When serializing an object, the focus is not so much on how to store an object's data in non-volatile memory (such as a hard drive), but rather on how the in-memory data structure of an object differs from how the data appears once it has been extracted from the object. In memory, the data is located at arbitrary addresses, which are conceptually defined as data structures including data, arrays, objects, methods, and the like. However, these data structures cannot be stored directly.
To store a data structure, it must be broken down into its component parts, which include simple data types such as integers, strings, and floating point numbers. In addition, the hierarchical arrangement within each data structure must be stored and maintained. Furthermore, the hierarchical arrangement of the data structures themselves must be stored and maintained.
The serialized data of an object may be thought of as a “dehydrated object” where all of the water (the object functions, in this metaphor) has been squeezed out of the object. This leaves only dry potato flakes (the data). Later, when a hungry person wishes to have mashed potatoes (the object with the data), the potato flakes may be rehydrated. To “add water” to a dehydrated object, an empty object is created and the stored data is inserted therein.
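The dehydration metaphor can be sketched in a few lines. This is only an illustration under simple assumptions: the Point class and the dehydrate and rehydrate helpers are hypothetical names, and the object is assumed to hold only simple data values.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm_squared(self):
        return self.x * self.x + self.y * self.y


def dehydrate(obj):
    # Squeeze out the "water": keep only the data, not the methods.
    return dict(vars(obj))


def rehydrate(cls, data):
    # Create an empty object (without running __init__), then insert
    # the stored data into it.
    obj = cls.__new__(cls)
    obj.__dict__.update(data)
    return obj


flakes = dehydrate(Point(3, 4))
restored = rehydrate(Point, flakes)
```

The flakes value holds only the data items x and y; the methods come back when the data is placed into a fresh Point.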
Serialization of an object is an effective and important step in exchanging the object between computers. These types of object exchanges are important to a distributed computing environment where computers actively distribute objects across a network. Those of ordinary skill in the art are familiar with object serialization.
Serialization Issues
Separating Data Items: When serializing an object, data items must be separated from each other when they are stored. Otherwise, they will not be properly identified later when reading the data back into a new object during deserialization. Therefore, a serialization scheme must specify how data items are separated from each other.
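One possible way to separate data items, shown here only as a sketch, is to prefix each item with its length. The write_items and read_items functions are hypothetical; other schemes (for example, delimiter characters with escaping) would serve equally well.

```python
def write_items(items):
    # Store each string as "<length>:<text>" so the reader knows exactly
    # where one item ends and the next begins.
    return "".join(f"{len(text)}:{text}" for text in items)


def read_items(blob):
    items = []
    pos = 0
    while pos < len(blob):
        colon = blob.index(":", pos)       # end of the length prefix
        length = int(blob[pos:colon])
        start = colon + 1
        items.append(blob[start:start + length])
        pos = start + length               # jump to the next item
    return items


original = ["Ada", "a,b:c", ""]
stored = write_items(original)
recovered = read_items(stored)
```

Because the reader relies on the stored length rather than on scanning for a separator, an item may itself contain commas or colons without being split incorrectly.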
Preserving Hierarchical Structure: Unless the hierarchical structure of the data is preserved during the serialization process, it cannot be recreated during deserialization. Each data structure is potentially different from every other.
Therefore, a serialization scheme must have a general data format suiting the needs of all potential data structures of an object. Typically, such a scheme accomplishes this by having the capability to delimit arbitrary nested data, that is, truly hierarchical data structures.
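The idea of delimiting arbitrarily nested data can be sketched with a small recursive encoder and decoder. The format below (brackets around lists, an "i" tag for integers, an "s" tag with a length for strings) is an invented illustration, not an established standard.

```python
def encode(value):
    if isinstance(value, list):
        # A list is delimited by brackets and may contain further lists.
        return "[" + "".join(encode(item) for item in value) + "]"
    if isinstance(value, int):
        return f"i{value};"
    if isinstance(value, str):
        return f"s{len(value)}:{value}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(text, pos=0):
    """Return (value, position just past the value)."""
    tag = text[pos]
    if tag == "[":
        pos += 1
        items = []
        while text[pos] != "]":
            item, pos = decode(text, pos)  # recurse into nested data
            items.append(item)
        return items, pos + 1
    if tag == "i":
        end = text.index(";", pos)
        return int(text[pos + 1:end]), end + 1
    if tag == "s":
        colon = text.index(":", pos)
        length = int(text[pos + 1:colon])
        start = colon + 1
        return text[start:start + length], start + length
    raise ValueError(f"unexpected tag {tag!r} at position {pos}")


data = ["doc", [1, [2, "a]b"]], []]
encoded = encode(data)
decoded, end = decode(encoded)
```

The same two functions handle any depth of nesting, which is the property a general serialization format needs.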
Preserving Object Relationships: Often objects include references to other objects. When in memory, such a reference is often a pointer to the other object. When serializing an object with a reference to another object, the serialized object includes the entire referenced object, just as it does for a data structure.
However, if there are multiple references to the same object, then there are redundant inclusions of the same object. Furthermore, if the reference within an object is to itself (directly or indirectly), then the serialization process may fail because it is circularly and potentially infinitely storing object data.
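One common way to avoid both the redundancy and the infinite loop, sketched here under simplifying assumptions, is to assign each distinct object an index and store references as indexes. The Node class and the two functions are hypothetical, and the recursive traversal would need to be made iterative for very deep object graphs.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.links = []  # references to other Node objects


def serialize_graph(root):
    seen = {}      # id(object) -> index already assigned to it
    records = []   # one record per distinct object

    def visit(node):
        key = id(node)
        if key in seen:
            # Already stored: emit only its index, which prevents both
            # duplicate copies and endless recursion on cycles.
            return seen[key]
        index = len(records)
        seen[key] = index
        record = {"label": node.label, "links": []}
        records.append(record)
        for target in node.links:
            record["links"].append(visit(target))
        return index

    visit(root)
    return records


def deserialize_graph(records):
    # First create every object, then reconnect references by index.
    nodes = [Node(record["label"]) for record in records]
    for node, record in zip(nodes, records):
        node.links = [nodes[i] for i in record["links"]]
    return nodes[0]


a, b = Node("a"), Node("b")
a.links = [b, b, a]   # two references to b, plus a reference to itself
b.links = [a]         # indirect cycle back to a
records = serialize_graph(a)
copy = deserialize_graph(records)
```

Each object appears exactly once in records, and the restored graph preserves the shared and circular references.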
Extensible Markup Language (XML)
SGML (Standard Generalized Markup Language) is a generic text formatting language that is widely used for large databases and multiple media projects. It is particularly well suited for works involving intensive cross-referencing and indexing.
HTML (HyperText Markup Language) is a specific implementation of a subset of SGML and is nearly universally used throughout the globe as the foundation for the World Wide Web (“Web”). HTML uses tags to mark elements, such as text and graphics, in a document to indicate how Web browsers should display these elements to the user. HTML tags also indicate how the Web browsers should respond to user actions such as activation of a link by means of a key press or mouse click.
XML (Extensible Markup Language) is a specific implementation of a condensed form of SGML. XML lets Web developers and designers create customized tags that offer greater flexibility in organizing and presenting information than is possible with the HTML document coding system.
In HTML, both the tag semantics and the tag set are fixed. XML specifies neither semantics nor a tag set. In fact, XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them. Since there is no predefined tag set, there are no preconceived semantics. All of the semantics of an XML document are defined either by the applications that process it or by stylesheets.
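As a brief illustration of application-defined tags, the sketch below builds and reads a small XML document with Python's standard xml.etree.ElementTree module. The tag names (order, item, name, quantity) are invented for this example; XML itself attaches no meaning to them, and the interpretation of quantity as an integer is supplied entirely by the application.

```python
import xml.etree.ElementTree as ET

# Build a document using tags that this application defines for itself.
order = ET.Element("order", {"id": "42"})
item = ET.SubElement(order, "item")
ET.SubElement(item, "name").text = "widget"
ET.SubElement(item, "quantity").text = "3"

# Write the document out as text, then read it back in.
text = ET.tostring(order, encoding="unicode")
parsed = ET.fromstring(text)

# The application, not XML, decides that <quantity> holds an integer.
quantity = int(parsed.find("item/quantity").text)
```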
As the Internet becomes a serious business tool, HTML's limitations are becoming more apparent. For example, HTML can be used to exchange data, but it is not capable of exchanging objects. To be more precise, HTML cannot be used to exchange serialized objects.
XML does not have a defined protocol for exchanging serialized objects between computers within a distributed computing environment.