This invention relates to the field of database technology. In particular, the invention relates to refactoring of databases to include soft type information.
In many instances, database modelers use a “soft type” to represent attributes of an entity, or even entities. Soft type refers to the convention of storing an attribute as a tuple consisting of <key, tag, value>.
For example, a database table of Employees might look like this:
TABLE 1Depart-PhoneIdNamementNumberEmail1Judith Hall76H512-555-1234jhall@example.com2Chuck Riegle76H512-555-6780criegle@example.com3Art Vandelay98G408-555-0987art@example.com4Biff Loman12P313-555-0654bloman@example.com
To add some new attributes, rather than extending the current table, or creating a new table, a soft type table could be used. For example, if a user wanted to add attributes for employee type and salary, a single table could be used like this:
TABLE 2IdTagValue1Employee TypeFull Time1Salary60,0002Salary19,0002Employee TypePart Time
There are many reasons why a soft type table might be used instead of extending the Employees table.
The database could be deployed in production with a much larger population of employees, with many applications using the database with the above schema. Changing the schema could impact all the applications.
At the time the database was designed, not all the attributes may have been known, or the range of values may not have been known. So the designer may have elected to store additional attributes in a soft-type table.
In addition to the <key, tag, value> approach to modeling soft types, a number of applications leverage the relational database capability to process Extensible Markup Language (XML) data for a similar approach. For example, in some enterprise relational database products a hybrid query language able to process pure Structured Query Language (SQL), pure XQuery, and hybrid queries (SQL with embedded XQuery snippets or XQuery with embedded SQL snippets) has been introduced alongside a native XML data type. This XML capability has been exploited for example by systems leveraging the relational database, for example, a Master Data Management system utilizes this capability to manage a “soft schema” which is essentially an XML column in the database. These products can then take advantage of a huge degree of flexibility by storing a piece of XML alongside the schema for the XML document. While this allows a flexible introduction of new attributes without changing the data model and the services access layer, this has severe negative impact regarding search capabilities and query performance.
For example, when querying amongst tables having XML columns, the semantics of the search is counter-intuitive for queries containing more than one condition that must be extant simultaneously. An entity (such as a person or a product object when considering an application that takes advantage of this capability) may have two separate XML documents describing their characteristics. An Xquery that searches across two or more XML documents simultaneously may not yield the results that were intended. Properly created queries such as these are complex and extremely expensive in SQL databases.
Soft type tables also have issues with: data integrity (soft types have to be able to store any value) and performance: As noted with XML queries, joining soft type data is an expensive operation, and that also gets worse over time as the table grows.
As the number of soft type elements grows, and these performance and security issue grow accordingly, developers typically will attempt to refactor the tables to move the soft types into a first class object (e.g. structured columns in tables). However, for production applications, there is no technology accelerator to refactor the tables, and more importantly, to automatically migrate the existing soft type data to the new structured column. Furthermore, any applications that depend on the existing soft type structure (such as those that provide data as a service) will have to be refactored as well.