1. Technical Field
The present invention relates in general to a method and system for improved data processing systems, and more specifically to an improved data processing system which utilizes Java Beans or Java software applications. Still more particularly, the present invention relates to an improved data processing system utilizing Java Beans or Java software applications wherein a determination is made as to the serialization compatibility between versions of Java classes.
2. Description of the Related Art
A generalized structure for a conventional data processing system 10 as shown in FIG. 1 includes one or more processing units 12 connected to a permanent storage device (hard disk) 16, system memory device (random access memory or RAM) 18, firmware 24, and to various peripheral or input/output (I/O) devices 14. The I/O devices typically include a display monitor, a keyboard, and a graphical pointer (mouse). System memory device 18 is utilized by a processing unit 12 in carrying out program instructions, and stores those instructions as well as data values that are fed to or generated by the programs. Processing unit 12 communicates with the other components by various means, including one or more interconnects (buses) 20, or direct access channels 22. A data processing system may have many additional components such as serial and parallel ports for connection to, e.g., printers, and network adapters. Other components might further be utilized in conjunction with the foregoing; for example, a display adapter might be utilized to control a video display monitor, and a memory controller can be utilized to access the system memory, etc.
A data processing system program is accordingly a set of program instructions which are adapted to perform certain functions by acting upon, or in response to, the I/O devices. Program instructions that are carried out by the processor are, at the lowest level, binary in form, i.e., a series of ones and zeros. These executable (machine-readable) program instructions are produced from higher-level instructions written in a programming language. The programming language may still be low-level, such as assembly language (which is difficult to utilize since each instruction corresponds closely to a single machine operation), or may be a higher-level language in which instructions are created utilizing more easily understood words and symbols. One example of a high-level programming language is "C" (or its improved version "C++"). After a computer program is written in C++, it is converted into machine code utilizing a compiler (which reduces the high-level code into assembly language or object code) and a linker (which combines the resulting object code into an executable program).
In an attempt to simplify programming, and yet still provide powerful development tools, programmers have created "object-oriented" programming languages in which each variable, function, etc., can be considered an object of a particular class. C++ is an example of an object-oriented programming language, and provides advanced programming features such as polymorphism, encapsulation, and inheritance.
Java™ is another object-oriented programming language and was developed by Sun Microsystems, Inc. Java is similar to the C++ programming language, but Java is smaller, more portable, and easier to utilize than C++ because it manages memory on its own. Java programs are compiled into bytecodes, which are similar to machine code but are not specific to any platform. Currently, the most widespread use of Java is in programming small applications (applets) for the World Wide Web of the Internet. These applets do not run as separate programs, but rather are loaded within another application that has Java support, such as a web browser. The term "applet" is particularly utilized to refer to such programs as they are embedded in-line as objects in hypertext markup language (HTML) documents.
The portability, security, and intrinsic distributed programming support features of the Java programming language make this language useful for Internet programming. Java is a totally object-oriented, platform independent programming language, which achieves architectural independence by compiling source code into its own intermediate representation. Java source code is not compiled into normal machine code, but is translated into code for a virtual machine specifically designed to support Java's features. A Java interpreter or a Java-enabled browser then executes the translated code.
Component software architectures employ discrete software components to quickly prototype and flesh out interactive applications. Applications are built by combining a set of independent components with developer-written code which acts as a "glue" between components, usually responding directly to component events by setting component properties or invoking component methods. One currently popular component software architecture is the Java Bean specification of the Java programming language.
Java Beans provide a component model for building and utilizing Java-based software components. The Java Beans application programming interface (API) makes it possible to write component software in the Java programming language. Components are self-contained, reusable software units that can be visually composed into composite components, applets, applications, and servlets utilizing visual application builder tools. A "Bean" is simply a Java class with extra descriptive information, similar to the concept of an object linking and embedding (OLE)-type library. Unlike an OLE library, however, a Bean is usually self-describing, including a file which contains the class's symbol information and method signatures and which may be scanned by a development tool to gather information about the Bean, a process referred to as introspection. Any Java class with public methods may be considered a Bean, but a Bean typically has properties and events as well as methods.
Such components can be visually composed into units utilizing visual application builder tools which are utilized only to develop other programs. Components expose their features (for example, public methods and events) to builder tools for visual manipulation. A Bean's features are exposed because feature names adhere to specific design patterns. A "JavaBeans-enabled" builder tool can then examine the Bean's patterns, discern its features, and expose those features for visual manipulation.
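The introspection step described above can be illustrated with the standard java.beans.Introspector class. The following is a minimal sketch only; the LabelBean class and the propertyNames helper are hypothetical names introduced for illustration and do not come from any particular builder tool.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

final class BeanIntrospectionSketch {

    /** Hypothetical Bean: the getTitle/setTitle pair follows the property design pattern. */
    public static class LabelBean {
        private String title = "";

        public String getTitle() { return title; }

        public void setTitle(String title) { this.title = title; }
    }

    /** Lists the property names discovered by introspection, stopping before java.lang.Object. */
    static List<String> propertyNames(Class<?> beanClass) throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
            names.add(descriptor.getName());
        }
        return names;
    }

    public static void main(String[] args) throws IntrospectionException {
        System.out.println(propertyNames(LabelBean.class));
    }
}
```

The accessor pair is enough for the "title" property to be discovered; no extra descriptive file is supplied in this sketch.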
A Java Bean component is generally made up of a nontrivial number of related classes and data files. This complicates both the tool interface and the tool's internal workings. The capability to store and retrieve Java objects is essential to building all but the most transient applications. The key to storing and retrieving objects is representing the state of objects in a serialized form sufficient to reconstruct the object. The object is broken down into a stream of data. When the stream is read back by the same version of the class, there is no loss of information or functionality. The stream is the only source of information about the original class.
Objects to be saved in a stream may support either the Serializable interface or the Externalizable interface. Within a stream, the first reference to any object results in the object being serialized or externalized and the assignment of a handle for that object. Subsequent references to that object are encoded as the handle. Utilizing object handles preserves the sharing and circular references that occur naturally in object graphs, and allows a very compact representation because subsequent references carry only the handle.
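The effect of handles on shared and circular references can be observed with a small round trip through the standard object streams. This is an illustrative sketch; the Node class and the helper method names are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SharedReferenceSketch {

    /** Hypothetical node type used to build a two-element cycle. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;

        Node(String name) { this.name = name; }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Returns true if the cycle a -> b -> a survives a serialization round trip. */
    static boolean cyclePreserved() throws IOException, ClassNotFoundException {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // circular reference back to the first node

        Node copy = (Node) deserialize(serialize(a));
        // The second reference to "a" was written as a handle, so the
        // restored graph points back to the same restored instance.
        return copy.next.next == copy;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cycle preserved: " + cyclePreserved());
    }
}
```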
The class of a Java object defines the behavior of that object in object-oriented programming; it is a user-defined type which specifies the representation of objects of the class and the operations that can be applied to those objects. A Java object's serialized form must be able to identify and verify the Java class from which the object content was saved and restore the contents to a new instance. Objects to be stored and retrieved frequently refer to other objects. Those other objects must be stored and retrieved at the same time to maintain the relationship between the objects. Thus, when an object is stored, all of the objects that are reachable from that object are stored as well.
Each object acting as a container implements an interface that allows primitive data types and objects to be stored in or retrieved from it. These are the object output and object input interfaces which are provided to a stream to write to and read from the stream. They also handle requests to write primitive types and objects to the stream. Object input streams can be extended to utilize customized information in the stream about classes or to replace objects that have been deserialized.
Object serialization has been designed to provide a rich set of features for Java classes. It produces and consumes a stream of bytes that contains one or more primitives and objects. The objects written to the stream in turn refer to other objects which are also represented in the stream. Object serialization produces just one stream format that encodes and stores the contained objects. Other container formats such as OLE or OpenDoc have different stream or file system representations.
Each object which is stored in the stream must explicitly allow itself to be stored and must implement the protocols needed to save and restore its state in an object stream. Object serialization defines two such protocols, serializable and externalizable. The protocols allow the container to ask the object to write and read its state.
There are two ways in which an object can be stored. One is in serializable form, in which all of the work is done by Java itself. The Java Virtual Machine (JVM) decides what needs to be stored, and the class itself does nothing. With the Serializable interface, the object's stream includes sufficient information to restore the fields in the stream to a compatible version of the class. For a serializable class, object serialization must automatically save and restore the fields of each class of an object, and automatically save and restore classes that evolve by adding fields or supertypes. A serializable class can declare which of its fields are transient (not saved or restored) and can write and read optional values and objects.
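A minimal sketch of the serializable form follows, in which the default mechanism saves the non-transient field and skips the transient one. The Account class, its fields, and the roundTrip helper are hypothetical and serve only to illustrate the behavior described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientFieldSketch {

    /** Hypothetical class: the class itself contains no serialization code. */
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;               // saved and restored automatically
        transient String password; // declared transient: not saved or restored

        Account(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    /** Writes the account to an in-memory stream and reads it back. */
    static Account roundTrip(Account account) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(account);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account restored = roundTrip(new Account("alice", "secret"));
        System.out.println(restored.user + " / " + restored.password);
    }
}
```

After the round trip the transient field holds its default value (null for an object reference), since nothing about it was written to the stream.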
The second way to store an object is in its externalizable form, in which the instance of the class does all the work and the JVM does nothing. Object serialization produces a stream with information about the Java classes for the objects that are being saved. With an externalizable class, object serialization delegates to the class complete control over its external format and how the state of the supertype is saved and restored. Also, with the Externalizable interface, the class is solely responsible for the external format of its contents.
For serializable objects, sufficient information is kept to restore the objects even if a different but compatible version of the class's implementation is present. The Serializable interface is defined to identify classes that implement the serializable protocol. A serializable object must implement the java.io.Serializable interface. Additionally, it must mark its fields that are not to be persistent with the transient keyword. It can implement a writeObject method to control what information is saved or to append additional information to the stream. It can also implement a readObject method, so that it can read the information written by the corresponding writeObject method or update the state of the object after it has been restored. Object output streams and object input streams are designed and implemented to allow a serializable class to evolve, whereby changes are allowed to the class that are compatible with the earlier versions of the class.
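The writeObject and readObject hooks can be sketched as follows. The example appends one additional value to the stream and recomputes a derived field after restoration. The Reading class and its fields are assumptions introduced for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CustomSerializationSketch {

    /** Hypothetical class that customizes its serialized form. */
    static class Reading implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] samples;  // written by the default mechanism
        transient String note;   // written by hand as additional data
        transient double mean;   // derived value, recomputed on restore

        Reading(double[] samples, String note) {
            this.samples = samples.clone();
            this.note = (note == null) ? "" : note;
            this.mean = computeMean();
        }

        private double computeMean() {
            if (samples.length == 0) {
                return 0.0;
            }
            double sum = 0.0;
            for (double sample : samples) {
                sum += sample;
            }
            return sum / samples.length;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // non-transient fields first
            out.writeUTF(note);       // then the appended information
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();   // restores samples
            note = in.readUTF();      // same order as written
            mean = computeMean();     // update state after restoration
        }
    }

    static Reading roundTrip(Reading reading) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(reading);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Reading) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Reading restored = roundTrip(new Reading(new double[] {1.0, 2.0, 3.0}, "lab"));
        System.out.println(restored.note + " mean=" + restored.mean);
    }
}
```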
In dealing with externalizable objects, only the identity of the class of the object is saved by the container, and it is the responsibility of the class to save and restore the contents. An externalizable object must implement the java.io.Externalizable interface. Additionally, it must implement a writeExternal method to save the state of the object, and must explicitly coordinate with its supertype to save its state. It must implement a readExternal method to read the data written by the writeExternal method from the stream and restore the state of the object, and must explicitly coordinate with its supertype to restore its state. If writing an externally defined format, the writeExternal and readExternal methods are solely responsible for that format. The writeExternal and readExternal methods are public and raise the risk that a client may be able to read or write information in the object other than by utilizing its methods and fields. These methods must be utilized only when the information handled by the object is not sensitive or when exposing it would not present a security risk.
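The explicit coordination with the supertype can be sketched as follows, with the subclass calling the supertype's writeExternal and readExternal before handling its own field. The Shape and Circle classes are hypothetical and are used only to illustrate the protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

final class ExternalizableSketch {

    /** Hypothetical supertype that owns the format of its own state. */
    public static class Shape implements Externalizable {
        private static final long serialVersionUID = 1L;
        protected String label = "";

        public Shape() { } // public no-arg constructor required for restoration

        Shape(String label) { this.label = label; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(label);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            label = in.readUTF();
        }
    }

    /** Hypothetical subtype that coordinates explicitly with its supertype. */
    public static class Circle extends Shape {
        private static final long serialVersionUID = 1L;
        double radius;

        public Circle() { }

        Circle(String label, double radius) {
            super(label);
            this.radius = radius;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            super.writeExternal(out); // supertype state first
            out.writeDouble(radius);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            super.readExternal(in);   // read in the same order as written
            radius = in.readDouble();
        }
    }

    static Circle roundTrip(Circle circle) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(circle);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Circle) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Circle restored = roundTrip(new Circle("unit", 1.0));
        System.out.println(restored.label + " r=" + restored.radius);
    }
}
```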
During deserialization, the private state of the object is restored. For example, a file descriptor contains a handle that provides access to an operating system resource. When Java objects utilize serialization to save state in files or as binary large objects (blobs) in databases, the potential arises that the version of a class reading the data is different from the version that wrote the data. Versioning raises some fundamental questions about the identity of a class, including what constitutes a compatible change, i.e., a change that does not affect the contract between the class and its callers.
Several assumptions are made in dealing with versions of serialized objects. First, versioning will only be applied to serializable classes, since it must control the stream format to achieve its goals. Externalizable classes will be responsible for their own versioning, which is tied to the external format. Second, all data and objects must be read from or skipped in the stream in the same order as they were written. Third, classes evolve individually as well as in concert with supertypes and subtypes. Fourth, classes are identified by name; two classes of the same name may be different versions or completely different classes that can be distinguished only by comparing their interfaces or by comparing hashes of the interfaces. Fifth, default serialization will not perform any type conversions. Sixth, the stream format only needs to support a linear sequence of type changes, not arbitrary branching of a type.
When a Java object is serialized, i.e., written out to an output stream, all of the non-transient and non-static instance variables of that object are written out. During deserialization from an input stream, the default serialization mechanism of the Java Virtual Machine (JVM) utilizes a symbolic model for binding the fields in the stream to the fields in the corresponding class in the virtual machine. In a development environment such as BeanExtender of International Business Machines (IBM), it is possible that the classes change substantially from one version of the product to another. This may result in a situation where a serialized instance from an old version of a class cannot be deserialized and reconstructed into an instance of a new version of the same class. This happens when the class implementation breaks serialization compatibility from an old version to a new one.
In the evolution of classes, it is the responsibility of the evolved (later version) class to maintain the contract established by the non-evolved class. This takes two forms. First, the evolved class must not break the existing assumptions about the interface provided by the original version, so that the evolved class can be utilized in place of the original. Second, when communicating with the original or previous versions, the evolved class must provide sufficient and equivalent information to allow the earlier version to continue to satisfy the non-evolved contract. For the purposes of this invention, each class implements and extends the interface or contract defined by its supertype. New versions of a class must continue to satisfy the contract for older versions and may extend the interface or modify its implementation.
For any commercial software application, it is often necessary that the data files and the state information saved utilizing the old version of the product are readable by the new version. For object-oriented Java-based applications, Java provides the object serialization model for persisting the state and instance information of application classes and instances. However, the Java serialization model is such that changes made to a class between successive releases of a product may result in a situation where a serialized or saved instance of an old version of a class cannot be deserialized or restored and reconstructed into an instance of a new version of the same class. This happens when certain kinds of changes made to a class break serialization compatibility from an old version to a new version. Currently there is no automated technique that checks for the compatibility between two versions of the same class in two different releases and makes sure that the successive product releases are backward compatible. This could be a recurring problem for a product that is developed in multiple releases. The current invention implements a tool that would check the serialization compatibility between versions of a class.
In order to fix the problem of version compatibility so that it is possible to read an instance of a newer version of a class from a serialized older version, it is necessary to have in-depth knowledge of the way class and instance data is written to the stream by the JVM. Presently, no automated means exists for checking the compatibility. It would therefore be desirable to provide a system or method that would implement a tool which introspects two versions of a class of Java Beans and determines whether it would be possible to reconstruct an instance of a new version of the class from a serialized older version, or alternatively, to implement a tool that would check the serialization compatibility between different versions of a class of Java Beans and possibly point out ways to fix the problems for a particular case. Additionally, it would also be desirable to provide an automated technique that checks the compatibility between two versions of the same class in different releases or different versions of Java Beans and ensures that the successive product releases are backward compatible.
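As a rough illustration of the kind of check such a tool might perform, the sketch below compares the serializable fields of two class descriptors obtained through the standard java.io.ObjectStreamClass API. It is not the method of the invention. The classes PersonV1 and PersonV2 are hypothetical stand-ins for two versions of one class (which in practice would share a name and would be loaded separately), and the rule applied here, which flags removed fields and changed field types, is a simplifying assumption.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CompatibilityCheckSketch {

    /** Hypothetical older version of a class. */
    static class PersonV1 implements Serializable {
        String name;
        int age;
    }

    /** Hypothetical newer version: "age" changed type, "email" was added. */
    static class PersonV2 implements Serializable {
        String name;
        long age;
        String email;
    }

    /** Reports fields of the old version that are missing or retyped in the new one. */
    static List<String> findProblems(Class<?> oldVersion, Class<?> newVersion) {
        ObjectStreamClass oldDesc = ObjectStreamClass.lookup(oldVersion);
        ObjectStreamClass newDesc = ObjectStreamClass.lookup(newVersion);
        if (oldDesc == null || newDesc == null) {
            throw new IllegalArgumentException("both classes must be serializable");
        }

        Map<String, Class<?>> newFields = new HashMap<>();
        for (ObjectStreamField field : newDesc.getFields()) {
            newFields.put(field.getName(), field.getType());
        }

        List<String> problems = new ArrayList<>();
        for (ObjectStreamField field : oldDesc.getFields()) {
            Class<?> newType = newFields.get(field.getName());
            if (newType == null) {
                problems.add("field removed: " + field.getName());
            } else if (!newType.equals(field.getType())) {
                problems.add("field type changed: " + field.getName());
            }
        }
        // Fields that exist only in the new version are treated as compatible additions.
        return problems;
    }

    public static void main(String[] args) {
        for (String problem : findProblems(PersonV1.class, PersonV2.class)) {
            System.out.println(problem);
        }
    }
}
```

A fuller check would also have to consider the serialVersionUID of each version, changes in the class hierarchy, and fields that became static or transient; those cases are omitted from this sketch.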