The present invention relates generally to database management systems. More particularly, the invention is a computer-implemented method that allows data in different databases, which may have different formats and structures, to be shared without remodeling the data. The system and method provide for transforming one hierarchical data structure to another hierarchical data structure.
Information resources often comprise huge databases that must be searched in order to extract useful information. One example of this includes data found on global information networks. With the wealth of information available today, and its value to businesses, managing information effectively has become a priority. However, existing database technologies, including recent advances in database integration, are often constrained when interacting with multiple, voluminous data sources.
As a growing number of companies establish Business-to-Business (B2B) and Business-to-Consumer (B2C) relationships using a global communications network, such as the Internet, traditional data sharing among multiple large data sources has become increasingly problematic. Data required by businesses is often stored in multiple databases, or supplied by third party companies. Additionally, data sharing difficulties are often magnified as companies attempt to integrate internal and external databases. As a result, combining data from separate sources typically creates an expensive and time-consuming systems integration task.
A major problem in data exchange arises from attempting to apply data associated with one structure to another data structure. Table 1 shows two differing hierarchical data structures. A hierarchical data structure usually contains root, interior and leaf nodes. Each node in the data structures may contain data, or the data may be contained only in the lowest-level nodes, referred to as leaf nodes.
In order to facilitate the exchange of data, current solutions include standards bodies and consortia that standardize data structure. Standards bodies like RosettaNet, BizTalk, OASIS, and ACORD attempt to standardize data so that it can be exchanged more easily. However, there are problems presented by these solutions. To participate in a consortium, all participants' data has to be modeled in the same manner. Additionally, consortia and standards bodies established to handle similar types of data often have different standards for specific industries. The adoption of standards is also slow, because businesses within each industry still modify data to fit their own company requirements. Hence, given the number of different consortia, standards, and industries, there is still a need for a standard means to exchange data and data structure between different data structures and databases, among companies of the same and different industries, and even among departments of the same companies.
One current approach to filling this need is to painstakingly map one field of data to another, in order to exchange the data with a "non-conformant" entity; that is, one that uses different data structure standards. This process must be repeated not only for every field but also for every different exchange. These solutions to the exchange problem are generally custom "hard-coded" solutions. An efficient, user-configurable method for sharing data between different data structures, by transforming one hierarchical data structure to another, is still lacking.
Technologies such as Structured Query Language (SQL), Open Database Connectivity (ODBC) and Extensible Markup Language (XML) have been developed to facilitate data integration. As beneficial as these technologies may be, however, they have failed to address inherent differences in the structure and organization of databases, in addition to the contents. These differences are important, because the richness of the original structure often contributes to the value of its underlying data.
For example, when attempting to store the same type of data or object, such as a customer description, database designers may use different field names, formats, and structures. Fields contained in one database may not be used in another. Or data that is stored in a single field in one database may be stored in several fields in another. If understood and logically integrated, these disparities can provide valuable information, such as how a company gains competitive advantage based on its data structuring. Unfortunately, today's database technologies often cleanse the disparities out of data to make it conform to standards of form and structure. Examples include databases that are converted from one representation to another representation and expressed in XML, using its corresponding hierarchical structure.
Integrating data from multiple environments and formats into a single interoperable structure is particularly necessary for seamless B2B electronic commerce (e-Commerce), and XML enables data to look much more alike than any previous format. However, there are still problems with using XML to represent data. These problems fall into two major categories: (1) dirty and naturally occurring data complicate XML searching and storage, and (2) data formats or data schemas in the original databases that offer competitive advantage, or better reflect the true model of the business and its data, are sacrificed to standards consortia. This means that the database formats or schemas have to be fit into the consortia data standards, which requires a highly skilled technical staff to spend a large amount of time comparing one database schema to another. Moreover, the standards being used and developed to overcome these data exchange barriers sacrifice competitive advantage for interoperability. Today, businesses require both.
Conforming to industry standards may also raise a number of other issues, such as intellectual property issues; the ability for data modeled to a specific consortium standard to communicate with other consortia that use a different model or standard; and the handling of legacy data in multiple formats.
The present invention solves the aforementioned needs, by providing a system and method for data sharing, without requiring that the data be remodeled to fit a common format or convention. Data can be dynamically transformed from any hierarchical structure to any other, regardless of format.
The present invention is a method for sharing data between hierarchical databases, comprising: defining, configuring and storing datatypes; defining, configuring and storing hierarchical data structures comprising the datatypes; establishing and storing a lineage for linking related datatypes into families; defining, configuring and storing measures of similarity and similarity match tolerances; defining, configuring and storing match strategies; transforming a source hierarchical data structure to a target hierarchical data structure by determining the similarity between the source and target data structure; and evaluating an effectiveness indicia of match strategies. The method may further comprise manually defining, configuring and storing mappings between datatype elements.
The present invention also provides a user-configurable "tree transformation" system and method that employs a step-by-step process of elimination to take the contents of one hierarchical data structure and apply them to a different structure. It allows for the use of a "dictionary" of common datatypes, which establishes a relationship hierarchy between datatypes so that datatype lineage may be used to facilitate the tree transformation process. The present invention has a user-definable "string similarity" comparator to establish the similarity of two strings, which may be used to facilitate the tree transformation process. It has a user-definable "structure similarity" comparator to establish the similarity of tree structures, which may be used to facilitate the tree transformation process. The present invention also has user-definable element pairing maps, which may be used to facilitate the tree transformation process.
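By way of illustration only, a string similarity comparator with a user-definable match tolerance might be sketched as follows. The function names and the default tolerance of 0.8 are hypothetical, and the use of Python's standard `difflib.SequenceMatcher` is an assumption made for this sketch; the invention does not prescribe a particular similarity measure.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two strings.

    Case is ignored here; that is a choice made for this sketch only.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_similar(a: str, b: str, tolerance: float = 0.8) -> bool:
    """Treat two strings as matching when their score meets the tolerance.

    The tolerance is the user-configurable part: raising it makes the
    comparator stricter, lowering it makes it more permissive.
    """
    return string_similarity(a, b) >= tolerance
```

For example, `is_similar("cust_name", "custname")` would accept the pair at the default tolerance, while `is_similar("Phone", "Address")` would reject it.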
The invention provides a computer-implemented method for applying data from a first hierarchical data structure to a second hierarchical data structure, comprising receiving a source element containing data from the first hierarchical data structure and a target element from the second hierarchical data structure, which is to contain the transformed data. It is determined whether the source element and target element have any child elements. Where the source element has no child elements and the target element has no child elements, the data from the source element is copied to the target element. Where the source element has no child elements and the target element has at least one child element, the data contained by the source element is separated and applied to the at least one target child element. This may be accomplished via a best-fit algorithm, and the source element data may be separated into tokens that are applied to the target child elements.
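The two cases in which the source element has no child elements might be sketched as follows. The `Element` class and `apply_leaf_source` function are hypothetical names introduced for illustration. Whitespace tokenization and positional assignment are assumed here as a simple stand-in for the best-fit algorithm, which is not specified in this summary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical node of a hierarchical data structure."""
    name: str
    data: Optional[str] = None
    children: list["Element"] = field(default_factory=list)


def apply_leaf_source(source: Element, target: Element) -> None:
    """Apply data from a childless source element to a target element."""
    if not target.children:
        # Neither element has children: copy the data directly.
        target.data = source.data
        return

    # Target has children: separate the source data into tokens and
    # hand them out in order (an assumed stand-in for best fit).
    tokens = (source.data or "").split()
    last = len(target.children) - 1
    for i, child in enumerate(target.children):
        if i == last:
            # The final child absorbs any remaining tokens.
            child.data = " ".join(tokens[i:]) or None
        else:
            child.data = tokens[i] if i < len(tokens) else None


# Usage: one "name" field is split across "first" and "last".
source = Element("name", "Ada Lovelace")
target = Element("person", children=[Element("first"), Element("last")])
apply_leaf_source(source, target)
```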
Where the source element has at least one child element and the target element has no child elements, the data on the at least one child element of the source element is combined into one value and the value is applied to the target element. Where the source element has at least one child element and the target element has at least one child element, it must be determined whether a source child element matches an unfilled target child element. This determination may comprise setting a source child pointer to a first source child element and determining if the first source child element and an unmarked target child element satisfy a first match strategy. Where the first match strategy is satisfied, the target child element is marked and the overall method is reiterated, with the first source child element received as the source element and the marked target child element received as the target element. Where the first strategy is not satisfied, it is determined whether at least one additional source child element exists. Where at least one additional source child element exists, the source child pointer is set to a next source child element and the step of determining whether each child element of the source element matches an unfilled child element of the target element is reiterated.
Where no additional source child elements exist, it is determined whether at least one additional strategy exists. Where at least one additional strategy exists, the step of determining whether each child element of the source element matches an unfilled child element of the target element is reiterated, using a next strategy. Where no additional strategies exist, a message is returned, indicating that no match is available between the first source child element and the at least one child of the target element.
Where such a message is returned, the user may explicitly define at least one element match between at least one source element and at least one target element, via a user-definable mapping services facility.
Where a source child element matches an unfilled target child element, the data of the source child element is applied to the unfilled target child element. The steps of the method are reiterated, until all elements of the second hierarchical data structure have been traversed.
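The overall traversal described above might be sketched as a single recursive function. All names here (`Element`, `transform`, `exact_name`, `case_insensitive_name`) are hypothetical, and the two name-based strategies are placeholders for user-configured match strategies. The sketch assumes one reading of the iteration order: each strategy is tried against all still-unmatched source children before the next strategy is used. Token splitting is simplified so that surplus tokens are dropped.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Element:
    """Hypothetical node of a hierarchical data structure."""
    name: str
    data: Optional[str] = None
    children: list["Element"] = field(default_factory=list)


Strategy = Callable[[Element, Element], bool]


def exact_name(src: Element, tgt: Element) -> bool:
    return src.name == tgt.name


def case_insensitive_name(src: Element, tgt: Element) -> bool:
    return src.name.lower() == tgt.name.lower()


def leaf_values(element: Element) -> list[str]:
    """Collect the data of all leaf nodes under an element, in order."""
    if not element.children:
        return [element.data] if element.data else []
    return [v for child in element.children for v in leaf_values(child)]


def transform(source: Element, target: Element,
              strategies: list[Strategy]) -> list[str]:
    """Apply source data to target; return names of unmatched source children."""
    unmatched: list[str] = []
    if not source.children and not target.children:
        target.data = source.data                      # leaf to leaf: copy
    elif not source.children:
        tokens = (source.data or "").split()           # leaf to parent: split
        for child, token in zip(target.children, tokens):
            child.data = token                         # surplus tokens dropped
    elif not target.children:
        target.data = " ".join(leaf_values(source))    # parent to leaf: combine
    else:
        marked: set[int] = set()                       # ids of filled targets
        pending = list(source.children)
        for strategy in strategies:                    # most accurate first
            still_pending = []
            for src_child in pending:
                match = next((t for t in target.children
                              if id(t) not in marked and strategy(src_child, t)),
                             None)
                if match is None:
                    still_pending.append(src_child)
                else:
                    marked.add(id(match))              # mark, then recurse
                    unmatched += transform(src_child, match, strategies)
            pending = still_pending
        unmatched += [child.name for child in pending]  # "no match" report
    return unmatched
```

The returned list plays the role of the "no match available" message; a caller could pass those names to a user-defined mapping step.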
Strategies may be used in order of decreasing accuracy and may be stored in and retrieved from a Similarity Score Services facility. A user may define the accuracy of a match strategy. A match strategy comprises at least one comparison utility, each comparison utility chosen from a group consisting of a context comparison utility, an element comparison utility, an attribute comparison utility, a lineage datatype comparison utility, and a tree datatype comparison utility.
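As an illustrative sketch, a match strategy could be modeled as a named bundle of comparison utilities with a user-assigned accuracy. Every name below (`Node`, `LINEAGE`, `MatchStrategy`, the two comparators, and the accuracy values) is hypothetical. The sketch further assumes that a strategy is satisfied only when all of its comparison utilities agree, and that two datatypes are related by lineage when they share an ancestor; neither rule is stated in this summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical element with a name and a dictionary datatype."""
    name: str
    datatype: str


Comparator = Callable[[Node, Node], bool]

# Hypothetical lineage dictionary: datatype -> parent datatype.
LINEAGE = {"mobile_phone": "phone", "fax": "phone", "phone": "contact"}


def family(datatype: str) -> set[str]:
    """Return a datatype together with all of its ancestors."""
    chain = [datatype]
    while chain[-1] in LINEAGE:
        chain.append(LINEAGE[chain[-1]])
    return set(chain)


def element_compare(a: Node, b: Node) -> bool:
    """Element comparison utility: names must be identical."""
    return a.name == b.name


def lineage_compare(a: Node, b: Node) -> bool:
    """Lineage datatype comparison utility: datatypes share a family member."""
    return bool(family(a.datatype) & family(b.datatype))


@dataclass(frozen=True)
class MatchStrategy:
    name: str
    accuracy: float                      # user-defined
    comparators: tuple[Comparator, ...]  # at least one comparison utility

    def matches(self, a: Node, b: Node) -> bool:
        return all(compare(a, b) for compare in self.comparators)


# Strategies are kept in order of decreasing accuracy.
STRATEGIES = sorted(
    [
        MatchStrategy("lineage-only", 0.6, (lineage_compare,)),
        MatchStrategy("name+lineage", 0.95, (element_compare, lineage_compare)),
    ],
    key=lambda s: s.accuracy,
    reverse=True,
)


def first_matching_strategy(a: Node, b: Node) -> Optional[str]:
    """Return the name of the most accurate strategy that a and b satisfy."""
    for strategy in STRATEGIES:
        if strategy.matches(a, b):
            return strategy.name
    return None
```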
The current invention is also directed to a software program embodied on a computer-readable medium, incorporating the invented method.
The current invention is also directed to a computer-based system for applying data from a first hierarchical data structure to a second hierarchical data structure. The system comprises a means for receiving at least one source element from the first hierarchical data structure and at least one target element from the second hierarchical data structure, a means for determining whether source elements and target elements have child elements, a means for copying data from a source element to a target element, a means for separating data from a source element and applying the data to at least one child of a target element, a means for comparing a child of a source element to a child of a target element and determining a match, and a means for copying data from a source child element to a target child element, where a match is determined.
The system may further comprise a means for receiving datatypes from a user and for allowing the user to configure and define the datatypes. The system may further comprise a means for receiving explicit mappings that match at least one source element to at least one target element from a user for allowing the user to configure and define the mappings. The system may further comprise a means for storing at least one match strategy for allowing the user to configure and define the at least one match strategy.