The present invention relates generally to database systems. More particularly, the invention is a computer-implemented method that allows data in different databases, which may have different formats and structures, to be shared without requiring the data to be remodeled to fit an existing data convention.
Modern information resources, including data found on global information networks, form large databases that need to be searched to extract useful information. With the wealth of information available today, and the value that companies place on it, it has become essential to manage that information effectively using advances in database technology and database integration. However, existing database technology is often constrained by this problem of very large, disparate and multiple data sources.
As a growing number of companies establish Business-to-Business (B2B), Business-to-Consumer (B2C) and Peer-to-Peer relationships using a global communications network such as the Internet, traditional data sharing of large and multiple data sources has become even more problematic. Since data required by businesses is often stored in multiple databases or supplied by third-party companies, such issues are magnified as companies attempt to integrate the ever-increasing number of internal and external databases. Combining the data from separate sources is usually an expensive and time-consuming systems integration task.
Structured Query Language (SQL), Open Database Connectivity (ODBC) and Extensible Markup Language (XML) tools have been developed to facilitate database integration. As beneficial as these technologies may be, they have failed to address the most difficult element of the equation: every database is often inherently different in its structure and organization as well as in its contents. In these differences lies the richness of the original structure and the value of the underlying data.
Current solutions to this problem of inherently different database structure include agreement on a common format and structure of the data being exchanged. Standards bodies and consortia have been established to standardize data structure for various applications. In order to participate in a consortium, all participants' data have to be modeled to conform to the standard data structure. However, the various consortia and standards bodies often have different standards to handle the same types of data. Even if standards are followed, the standards are generally geared toward a specific industry. In addition, standards adoption is slow because each company within each industry often still modifies the data to fit specific company requirements. Given the number of different consortia, standards and industries, the original problem still exists in that there is still no standard way to exchange data and structure between different data structures and databases both within the same industries and between industries.
Given this difficulty, when a company must exchange data with a “non-conformant” entity, that is, one that uses different data structure standards, the usual approach is to painstakingly map one field of the data to another. This process must be repeated not only for every field but also for every different type of exchange. These solutions to the exchange problem have generally been custom solutions, often “hard-coded”. There remains a lack of a generic, user-configurable method for sharing data between different data structures or for transforming one hierarchical data structure to another.
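The cost of such field-by-field mapping can be illustrated with a minimal sketch. The record layouts, the field names, and the function `map_vendor_to_internal` below are hypothetical and are assumed purely for illustration; they are not drawn from any particular system or standard.

```python
# Hypothetical hard-coded mapping between two customer record layouts.
# Every field pair is written out by hand, so a change to either layout,
# or a new exchange partner, requires new code.

def map_vendor_to_internal(vendor_record: dict) -> dict:
    """Translate an (assumed) vendor record into an (assumed) internal layout."""
    return {
        "customer_name": f'{vendor_record["FirstName"]} {vendor_record["LastName"]}',
        "city": vendor_record["Town"],
        "postal_code": vendor_record["Zip"],
    }

# Illustrative input; all values are made up for the example.
vendor_record = {
    "FirstName": "John",
    "LastName": "Public",
    "Town": "AnyTown",
    "Zip": "02334",
    "Phone": "555-0100",
}

internal = map_vendor_to_internal(vendor_record)

# Source fields that have no hand-written rule are silently lost.
mapped_source_fields = {"FirstName", "LastName", "Town", "Zip"}
dropped = sorted(set(vendor_record) - mapped_source_fields)

print(internal)
print("dropped:", dropped)
```

In this sketch the `Phone` field has no mapping rule and is dropped, and a second exchange partner with a different layout would need its own separately written function.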
For example, when attempting to store the same type of data or object, such as a customer description, database designers may use different field names, formats, and structures. Fields contained in one database may not be in another. If understood and logically integrated, these ambiguities can provide valuable information. Unfortunately, today's database technology often results in valuable information being cleansed out of the data to make it conform to a standard structure. One example of this is databases that are converted from one representation to another representation and expressed in XML with its corresponding hierarchical structure.
One of the key purposes for the development and use of XML was to solve the problems of data exchange from multiple environments and formats into a single interoperable structure. This is especially important for seamless B2B electronic commerce (e-Commerce). The reality of XML has proven to be quite different. XML enables data to look much more alike than any previous format. However, there are still problems with using XML to represent data. These problems fall into two major categories: first, dirty and naturally occurring data perplex XML searching and storage; second, data formats or data schemas in the original databases that offer competitive advantage or better reflect the true model of the business and its data are sacrificed to standards consortia. This means that the database formats or schemas have to be fit into the consortia data standards, which is time consuming and requires a highly skilled technical staff to compare one database schema to another. To overcome these well-known XML and data exchange barriers, standards are constantly being created for schema creation and data types. However, these standards sacrifice competitive advantage for interoperability. Today, companies require both.
Neither of these problems is resolved with the introduction of data standardization, and both continue to plague database integration and prevent true interoperability, especially using XML. Industry has tried to implement the same solution it used for data communication in the 1970s: industry consortia. Standards bodies like RosettaNet, BizTalk, OASIS, ACORD, and a host of others are already being formed to address the problem. Companies are told to configure their data according to a specified model so they can “talk to” any other company within the consortium. However, conforming to industry standards may raise a number of other issues. For example, if data is modeled to a specific consortium standard, it may not be able to communicate with other consortia that use a different model or standard. The handling of legacy data in multiple formats is also an issue.
A problem exists where two hierarchical data structures, such as those shown in Table 1, differ in structure. A hierarchical data structure (which may be contained within a hierarchical database) usually contains root, interior and leaf nodes. Each node in the data structures may contain data, or the data may be contained only in the lower level nodes such as leaf nodes. Problems arise when an attempt is made to take the data associated with one structure and apply it to another structure.
TABLE 1

Structure A (with data)        Structure B
Suspect                        Offender
  Name                           Identification
    First=“John”                   Name
    Middle=“Q”                     Address
    Last=“Public”                    StreetNum
  Address                            StreetName
    Street=“123 Main”                City
    City=“AnyTown”                   State
    State=“TX”                       ZipCode
    Zip=“02334”
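The mismatch between the two structures of Table 1 can be made concrete with a short sketch. The nested-dictionary encoding and the helper `leaf_paths` are assumptions made for illustration only and are not the representation used by the invention; the node names and values are taken from Table 1, with the leaf nodes of Structure B left empty because it carries no data.

```python
# Table 1 encoded as nested dictionaries (an assumed, illustrative encoding).
structure_a = {
    "Suspect": {
        "Name": {"First": "John", "Middle": "Q", "Last": "Public"},
        "Address": {
            "Street": "123 Main",
            "City": "AnyTown",
            "State": "TX",
            "Zip": "02334",
        },
    }
}

structure_b = {
    "Offender": {
        "Identification": {
            "Name": None,  # leaf node, no data yet
            "Address": {
                "StreetNum": None,
                "StreetName": None,
                "City": None,
                "State": None,
                "ZipCode": None,
            },
        }
    }
}

def leaf_paths(node, prefix=()):
    """Yield the path from the root (a tuple of node names) to every leaf node."""
    for name, child in node.items():
        path = prefix + (name,)
        if isinstance(child, dict):
            yield from leaf_paths(child, path)  # interior node: descend
        else:
            yield path  # leaf node

paths_a = set(leaf_paths(structure_a))
paths_b = set(leaf_paths(structure_b))

# Matching on full paths finds nothing, since the roots and nesting differ.
shared_paths = paths_a & paths_b

# Matching on leaf names alone finds only the names that happen to coincide.
shared_leaf_names = sorted({p[-1] for p in paths_a} & {p[-1] for p in paths_b})

print(shared_paths)
print(shared_leaf_names)
```

Under this encoding no full path is common to both structures, and only the `City` and `State` leaf names coincide. The remaining data, such as the three name parts of Structure A versus the single `Name` node of Structure B, cannot be placed by name matching alone.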
Unique computing science disciplines have emerged out of this overload of data and different formats of data. Database Administrators have the sole responsibility to make sure that the data that a company holds is maintained, secured, and available. Chief Information Officers are dedicated to ensuring that the movement of data in and out of a company is fluid and effective. Data Modelers are responsible for arranging and presenting the data in a manner that makes sense for the problem being addressed. Within a company, the Information Technology personnel are able to establish guidelines and standards on how information should be modeled. Generally speaking, they model the data to the business in question. For example, a retail sales company may model its data in terms of “customers”, “orders”, “inventory”, “invoices” and the like. A real-estate company may model its information as “clients”, “properties” and the like. A problem arises when company “A” tries to share information with company “B” or when Dept. “A” tries to share information with Dept. “B”. The structures and hierarchies of data, both within the same company and among companies, are often different since the data is modeled to meet individual needs rather than to map to a common format.
In the past several decades, computerized database management systems have been propelled into the position of being the primary means of data and information storage for small, medium, and large sized organizations. With this fundamental shift from written and printed information storage to computer-based storage, a fundamental shift in the way information is shared between groups has occurred. In the past, information from one organization to another could be shared via printed text, with interpretations of what the text means and how it is structured being embedded in related documents.
With the shift to computer-based information storage, sharing data between two entities has become a much more complex problem to solve. The first attempts to solve the problem focused on the ability to simply share or intercommunicate information between two data sources. Once this problem was solved, and computers could effectively share information between two database sources, a second problem then arose.
Even when information can be shared between database sources, the structure of the data must be the same in order to properly exchange and share information among a plurality of data sources. At first, this seemed a simple enough problem to solve. The groups that wanted to exchange information would simply band together and agree on the specific data formats of the information to be shared, and all groups involved would then standardize on the format, thus facilitating the interchange of same-structured information. At this point, information can be shared from many different data sources, as long as the data structures are the same for each member in the group. Over time, this prerequisite for sharing information has proven to be a technical, competitive, and financial burden for all companies involved.