Data in database management systems are typically stored as records, or tuples, each composed of a fixed number of fields, also called attributes. The fields of a record contain the data associated with that record. Database records are frequently presented logically as a table, with records as the rows and attributes as the columns. Systems typically store records in memory and/or on disk or other media as a linked list, with the data for each record stored together.
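As a minimal sketch of the record/table view described above, the following Python snippet models a record as a tuple with a fixed set of named fields and a table as a list of such records. The `Employee` record, its fields, and the `column` helper are hypothetical and chosen only for illustration; the snippet says nothing about how any particular system lays records out in memory or on disk.

```python
from typing import NamedTuple

# Hypothetical record type: every record has the same fixed set of fields (attributes).
class Employee(NamedTuple):
    emp_id: int
    name: str
    dept: str

# A table is a collection of records: each record is a row, each field a column.
employees = [
    Employee(1, "Ada", "Engineering"),
    Employee(2, "Grace", "Research"),
]

def column(table, attr):
    """Return one attribute (column) across all records (rows)."""
    return [getattr(rec, attr) for rec in table]

print(Employee._fields)           # ('emp_id', 'name', 'dept')
print(column(employees, "name"))  # ['Ada', 'Grace']
```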
The configuration of the data contained in a database is generally referred to as its “schema.” A database schema typically lists the tables used by an application or suite of applications and describes the structure of each table, as well as any constraints on the data stored in it. In streaming database platforms, where data streams are analyzed and processed in real time, the streams may be treated as “materialized views”: virtual tables whose contents are derived from operations on other tables or views and change automatically as the data in those underlying tables changes. In streaming database implementations, the schema also includes the connections between the streams and tables, as well as the rules defining any dependencies among them.
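The idea of a materialized view whose contents change automatically with its underlying table can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any streaming platform implements views: the `Table` and `MaterializedView` classes are hypothetical, the view is recomputed in full on every insert (a real engine would typically update it incrementally), and the recorded table-to-view link stands in for the dependency rules that the schema would hold.

```python
from typing import Callable, Iterable

class Table:
    """A minimal base table that notifies dependent views when its rows change."""
    def __init__(self, name: str):
        self.name = name
        self.rows: list[dict] = []
        self._dependents: list["MaterializedView"] = []

    def register(self, view: "MaterializedView") -> None:
        # Record the table -> view dependency (part of a streaming schema).
        self._dependents.append(view)

    def insert(self, row: dict) -> None:
        self.rows.append(row)
        for view in self._dependents:
            view.refresh()

class MaterializedView:
    """A virtual table whose contents are derived from a base table by `query`."""
    def __init__(self, source: Table,
                 query: Callable[[Iterable[dict]], list[dict]]):
        self.source = source
        self.query = query
        self.rows: list[dict] = []
        source.register(self)
        self.refresh()

    def refresh(self) -> None:
        # Full recomputation for simplicity; not incremental maintenance.
        self.rows = self.query(self.source.rows)

trades = Table("trades")
large = MaterializedView(trades, lambda rows: [r for r in rows if r["qty"] >= 100])
trades.insert({"sym": "ABC", "qty": 50})
trades.insert({"sym": "XYZ", "qty": 250})
print(large.rows)  # [{'sym': 'XYZ', 'qty': 250}]
```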
From time to time, the schema may need to be updated because of changes in incoming data streams, new business rules, or updates to the applications using the data, among other reasons. Schema updates generally fall into one of two categories: application-independent modifications and changes-in-place modifications.
Application-independent schema modifications are implemented by dividing the application into sub-applications that are not data-dependent; that is, data from one sub-application is not used in, updated by, or provided to any other sub-application, so each sub-application can be executed independently and in parallel. The portion of the schema belonging to each sub-application can therefore be treated as an independent schema. A modification is implemented by destroying and re-creating the portion(s) of the schema for the affected sub-application(s). While a sub-application's schema is destroyed (and possibly re-created in modified form), only that sub-application needs to be stopped. The rest of the application may continue running, in some cases with limited functionality because the services provided by the stopped sub-application are unavailable. Destroying and re-creating a sub-application schema typically involves unloading the data, destroying the schema and re-creating it with the changes, and reloading the data, possibly in a modified form to fit the new schema.
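The application-independent procedure can be sketched as follows, assuming each sub-application owns its own slice of the schema and its own data. The `SubApp` class, the `modify_sub_schema` function, and the billing/alerts example are hypothetical; the sketch only shows the ordering of steps (stop the affected sub-application, unload, destroy, re-create, reload with conversion, restart) and that other sub-applications are never touched.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SubApp:
    """Hypothetical sub-application owning an independent portion of the schema."""
    name: str
    columns: tuple[str, ...]
    rows: list[tuple] = field(default_factory=list)
    running: bool = True

def modify_sub_schema(apps: dict[str, SubApp], name: str,
                      new_columns: tuple[str, ...],
                      convert: Callable[[tuple], tuple]) -> None:
    target = apps[name]
    target.running = False                 # stop only the affected sub-application
    unloaded = list(target.rows)           # unload its data
    del apps[name]                         # destroy its portion of the schema
    rebuilt = SubApp(name, new_columns)    # re-create it in modified form (starts running)
    rebuilt.rows = [convert(r) for r in unloaded]  # reload, reshaped to fit
    apps[name] = rebuilt                   # other sub-applications were never stopped

apps = {
    "billing": SubApp("billing", ("invoice_id", "amount"), [(1, 9.5), (2, 20.0)]),
    "alerts": SubApp("alerts", ("alert_id", "msg"), [(7, "disk full")]),
}
# Add a currency column to billing only; alerts is left untouched.
modify_sub_schema(apps, "billing", ("invoice_id", "amount", "currency"),
                  lambda r: (*r, "USD"))
print(apps["billing"].rows)  # [(1, 9.5, 'USD'), (2, 20.0, 'USD')]
```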
Changes-in-place schema modifications typically require stopping the entire application, unloading the data to some form of backup storage (e.g., files, or temporary tables in the same or another database), deleting the schema, creating the new modified schema, reloading the data from backup storage into the new schema (possibly modifying the data to fit it), and restarting the application. As a result, the application is unavailable for an extended period while the changes are implemented.
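A minimal changes-in-place sketch using Python's standard `sqlite3` module is shown below. The `customer` table, its columns, and the name-splitting transformation are hypothetical, and a Python list stands in for the backup storage (files or temporary tables). Between the `DROP TABLE` and the final `commit()`, the table is missing or incomplete, which is why the whole application is assumed to be stopped while this runs.

```python
import sqlite3

def change_in_place(conn: sqlite3.Connection) -> None:
    """Rebuild the hypothetical `customer` table with a new schema."""
    # 1. The application is assumed to be stopped before this runs.
    # 2. Unload the data to backup storage (here, an in-memory list).
    backup = conn.execute("SELECT id, full_name FROM customer").fetchall()
    # 3. Delete the old schema.
    conn.execute("DROP TABLE customer")
    # 4. Create the new, modified schema (name split into two columns).
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, first TEXT, last TEXT)")
    # 5. Reload, transforming each row to fit the new schema.
    for cid, full_name in backup:
        first, _, last = full_name.partition(" ")
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (cid, first, last))
    conn.commit()
    # 6. The application can now be restarted.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ada Lovelace"), (2, "Alan Turing")])
conn.commit()
change_in_place(conn)
print(conn.execute("SELECT * FROM customer ORDER BY id").fetchall())
# [(1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing')]
```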
In either case, changes to the schema or sub-schema must be atomic from the viewpoint of the application: the application must see either the old schema or the new schema, never a “change in progress” mix of the two. Because many conventional database systems do not support transaction-based schema changes, this is difficult to achieve, so the application must be halted while the schema is being changed. Further, existing techniques require a “transitional program” that converts data from the format defined by the old schema to the format defined by the new one. In all cases, schema changes should be implemented so that the applications using the underlying data are affected as little as possible.
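The old-or-new requirement and the role of a transitional program can be illustrated with an in-memory sketch. This is not a description of any database system's mechanism: the `Snapshot` and `SchemaHandle` classes and the temperature conversion are hypothetical. The new schema version and its converted data are built off to the side and then published with a single reference swap under a lock, so a reader always receives one complete snapshot. The sketch assumes no data is written while a migration is in progress; handling concurrent writers would need more machinery.

```python
import threading
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical schema version plus its data, in that version's format."""
    version: int
    columns: tuple
    rows: tuple

class SchemaHandle:
    def __init__(self, initial: Snapshot):
        self._lock = threading.Lock()
        self._current = initial

    def read(self) -> Snapshot:
        # Readers get one consistent snapshot: entirely old or entirely new.
        with self._lock:
            return self._current

    def migrate(self, new_columns: tuple,
                transition: Callable[[tuple], tuple]) -> None:
        old = self.read()
        # The transitional program converts rows while readers still see `old`.
        new = Snapshot(old.version + 1, new_columns,
                       tuple(transition(r) for r in old.rows))
        # Publish in one step, so no reader observes a half-applied change.
        with self._lock:
            self._current = new

handle = SchemaHandle(Snapshot(1, ("id", "temp_f"), ((1, 212.0), (2, 32.0))))
before = handle.read()
# Transitional program: convert Fahrenheit rows to Celsius for version 2.
handle.migrate(("id", "temp_c"), lambda r: (r[0], round((r[1] - 32) * 5 / 9, 1)))
after = handle.read()
print(before.version, before.rows)  # 1 ((1, 212.0), (2, 32.0))
print(after.version, after.rows)    # 2 ((1, 100.0), (2, 0.0))
```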