Most software packages evolve almost continuously. As a software package evolves, it is revised to add new features and to improve the efficiency of existing features. A revision to a software package often involves a revision to the data types that the package manipulates, so numerous versions of the same data type may be created over time. For example, a first version of a software package may be designed to operate on data formatted according to a first version of a data type, while a second version of the same software package is designed to operate on data formatted according to a second version of that data type.
All of the versions of a particular data type are collectively referred to as a "schema", and a particular version of a data type is referred to as a "schema version". The structure of a schema version (i.e., its set of attributes and the type of each attribute) is referred to as the "format" of the schema version. The process of moving from one version of a schema to another version of the same schema is referred to as schema evolution. The format of a data type may be modified in a variety of ways during schema evolution: for example, new attributes may be added, existing attributes may be removed, and the type of data contained in a particular attribute may be changed.
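To make these terms concrete, the following Python sketch models a schema as a collection of schema versions, each with a format that maps attribute names to attribute types, and classifies the three kinds of format change described above. The names `SchemaVersion`, `Schema`, and `diff_formats`, and the use of Python types as attribute types, are illustrative assumptions rather than part of any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical model: a format maps attribute names to attribute types.
Format = Dict[str, type]


@dataclass
class SchemaVersion:
    """One version of a data type (a "schema version") and its format."""
    version: int
    format: Format


@dataclass
class Schema:
    """All versions of a particular data type (a "schema")."""
    name: str
    versions: Dict[int, SchemaVersion] = field(default_factory=dict)

    def add_version(self, sv: SchemaVersion) -> None:
        self.versions[sv.version] = sv

    def latest(self) -> SchemaVersion:
        # The highest version number is treated as the newest version.
        return self.versions[max(self.versions)]


def diff_formats(old: SchemaVersion, new: SchemaVersion) -> Dict[str, Set[str]]:
    """Classify the attribute changes made when evolving from `old` to `new`."""
    old_attrs, new_attrs = set(old.format), set(new.format)
    return {
        "added": new_attrs - old_attrs,
        "removed": old_attrs - new_attrs,
        "retyped": {a for a in old_attrs & new_attrs
                    if old.format[a] is not new.format[a]},
    }


# Usage: version 2 of "type1" adds `sku`, drops `legacy_code`, and retypes `price`.
type1 = Schema("type1")
v1 = SchemaVersion(1, {"id": int, "name": str, "price": int, "legacy_code": str})
v2 = SchemaVersion(2, {"id": int, "name": str, "price": float, "sku": str})
type1.add_version(v1)
type1.add_version(v2)
print(diff_formats(v1, v2))
```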
Computer applications store the data they create according to certain formats, and expect the data that they access to be presented to them according to those same formats. The data formats that a computer application expects to encounter are typically determined by the versions of the schemas in effect when the application is compiled. Thus, if a computer application that operates on a data type, type1, is compiled against version 5 of type1, the application will expect the data it accesses to be presented according to the format of version 5 of type1.
Data created by a software package designed for an earlier schema version must be accessible to software packages designed to operate on later versions of the schema. In addition, data created by a software package designed for a newer schema version must be accessible to software packages designed to operate on earlier versions of the schema. Consequently, two problem situations may arise: (1) an application expects an older schema version than the version in which the data is stored on disk, and (2) an application expects a newer schema version than the version in which the data is stored on disk.
One approach to making old data available to new versions of software is to perform a batch conversion on the data using a format conversion tool. During the batch conversion process, the format conversion tool reads data that is stored according to the format of the old schema version (the "old format") and stores the data according to the format of the new schema version (the "new format").
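A format conversion tool of this kind might be sketched as follows. The sketch assumes, hypothetically, that records are Python dictionaries, that a format maps attribute names to types, that attributes absent from the new format are dropped, that newly added attributes receive a caller-supplied default, and that a retyped attribute can be converted by calling the new type on the old value. `convert_record` and `batch_convert` are illustrative names only.

```python
from typing import Any, Dict, List

Format = Dict[str, type]   # hypothetical: attribute name -> attribute type
Record = Dict[str, Any]


def convert_record(record: Record, old_fmt: Format, new_fmt: Format,
                   defaults: Dict[str, Any]) -> Record:
    """Rewrite one record from the old format into the new format."""
    converted: Record = {}
    for attr, attr_type in new_fmt.items():
        if attr in old_fmt and attr in record:
            # Existing attribute: convert the value to the (possibly new) type.
            converted[attr] = attr_type(record[attr])
        else:
            # Attribute added in the new version (or absent from this record).
            converted[attr] = defaults.get(attr)
    # Attributes that exist only in old_fmt are not copied, so they are removed.
    return converted


def batch_convert(store: List[Record], old_fmt: Format, new_fmt: Format,
                  defaults: Dict[str, Any]) -> None:
    """Read every record in the old format and store it in the new format."""
    store[:] = [convert_record(r, old_fmt, new_fmt, defaults) for r in store]


old_fmt = {"id": int, "name": str, "price": int, "legacy_code": str}
new_fmt = {"id": int, "name": str, "price": float, "sku": str}
store = [{"id": 1, "name": "widget", "price": 3, "legacy_code": "X9"}]
batch_convert(store, old_fmt, new_fmt, defaults={"sku": ""})
print(store)  # [{'id': 1, 'name': 'widget', 'price': 3.0, 'sku': ''}]
```

Every record is read and rewritten before the converted store is available, and afterward no data remains in the old format; these two properties underlie the drawbacks of batch conversion.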
However, the batch conversion approach is not suitable for certain computing environments. For example, depending on the amount of data to be converted, the conversion process may make the data unavailable for a long period of time. Therefore, in computing environments where data must constantly be available, the batch conversion approach will not work.
In addition, batch conversion only exacerbates the problem associated with using applications that expect older versions of data. Once a batch conversion process is completed, all of the data will be stored according to the revised formats. As a result, versions of the software that use the older versions of the data types can no longer be used. To continue to use such software, the software must be recompiled based on the new versions of the data types. Thus, the batch conversion approach is not suitable for environments where some users may continue to access the data with software that expects the data to be presented according to old formats.
Support for schema evolution must address both of the problem situations described above. One approach to supporting schema evolution is to maintain type definition information that specifies the latest format of all data types and to require all software to always use the latest format. During schema evolution, the type definition information is updated to reflect the formats of the new versions of the schemas. According to this approach, all software that will access the data must be designed to inspect the type definition information before accessing the data in order to know how to access the data. To avoid conflicts, the type definition information for any given schema cannot be modified while any process is accessing data associated with that schema. Similarly, all processes are blocked from accessing data associated with a particular schema while any data associated with the schema is being converted to a new format.
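The locking discipline this approach requires can be sketched as a readers-writer scheme. In the hypothetical `TypeDefinitionRegistry` below, which is an assumption for illustration only, the registry holds just the latest format of each schema. A process calls `begin_access` to obtain that format before touching the data and `end_access` when it is finished; `update_format` waits until no process is accessing the schema, then blocks new accesses while the data is converted and the format is replaced.

```python
import threading
from typing import Callable, Dict, Set

Format = Dict[str, type]   # hypothetical: attribute name -> attribute type


class TypeDefinitionRegistry:
    """Hypothetical registry holding only the latest format of each schema."""

    def __init__(self) -> None:
        self._formats: Dict[str, Format] = {}
        self._active: Dict[str, int] = {}   # processes currently accessing data
        self._converting: Set[str] = set()  # schemas whose data is being converted
        self._cond = threading.Condition()

    def begin_access(self, schema: str) -> Format:
        """Block while the schema's data is converted, then return its format."""
        with self._cond:
            while schema in self._converting:
                self._cond.wait()
            fmt = dict(self._formats[schema])  # KeyError if schema is unknown
            self._active[schema] = self._active.get(schema, 0) + 1
            return fmt

    def end_access(self, schema: str) -> None:
        with self._cond:
            self._active[schema] -= 1
            self._cond.notify_all()

    def update_format(self, schema: str, new_format: Format,
                      convert_data: Callable[[], None]) -> None:
        """Wait for all accesses to finish, convert the data, then swap formats."""
        with self._cond:
            while schema in self._converting or self._active.get(schema, 0) > 0:
                self._cond.wait()
            self._converting.add(schema)  # new begin_access calls now block
        try:
            convert_data()
            with self._cond:
                self._formats[schema] = dict(new_format)
        finally:
            with self._cond:
                self._converting.discard(schema)
                self._cond.notify_all()


# Usage: every access consults the registry before touching the data.
registry = TypeDefinitionRegistry()
registry.update_format("type1", {"id": int, "price": int}, convert_data=lambda: None)
fmt = registry.begin_access("type1")
registry.end_access("type1")
registry.update_format("type1", {"id": int, "price": float}, convert_data=lambda: None)
```

This simple version gives no priority to a pending update, so a steady stream of readers can delay `update_format` indefinitely; in either case, access to a schema's data and conversion of that data can never overlap.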
Based on the foregoing, it is clearly desirable to provide a method and apparatus for allowing schema evolution to occur without making the underlying data inaccessible during a conversion period. It is further desirable to provide a method and apparatus that allows software to access data even when the format of the data is based on a different schema version than the schema version supported and expected by the software.