The invention relates to the cooperative operation of plural databases for extracting, updating, verifying and otherwise exploiting information that is useful to two or more member databases in a collaboration of plural databases.
Databases generally are characterized by tables of stored formatted data values in fields. Each field typically has a label. Each field relates in a planned way to other fields according to a database schema.
In some tables, at least one field value may be unique and functions as a key to which associated field values relate. For example, in a table keyed to individual persons, a unique serial number value such as the person's social security number may be a key for distinguishing a person from all other persons. A unique street address may distinguish a particular building. Other values are associated by one-to-one or one-to-many or many-to-one relationships that are useful with respect to the variables concerned. For example, a building may have plural residences. Plural persons may reside at a given residence, etc. Descriptive field values such as residents' ages or genders are associated with unique persons but the values may be common to many persons. Although these examples concern persons and residences, the same relationships are needed in databases containing various sorts of information such as financial and accounting information, production and sales data, etc.
The data values stored in a field are formatted as appropriate to the database designer's expectations for the nature of the information stored as field values and the uses to which the data is expected to be put. Formatting may include designating values as numeric or logical or as text strings. For numbers, formatting may designate values as positive or as sign and magnitude. Numbers may be integers or floating point (mantissa and exponent), and carried to a predetermined number of bits that determine maximum valid value or resolution. String variables can have a preset or variable byte length, etc. The choices made by database designers for field values and the relationships between and among the fields differ from one database to another because the database designers have different objects in mind. As a result, there are multiple standards and proprietary designs and definitions in database systems.
It is not uncommon for an established database to contain information that can also be found in another database. In the example of names of persons, their residence addresses and various descriptive values, the same persons may be found in commercial databases as customers, government databases as voters or taxpayers, employer databases, medical information databases and so on. When establishing a new database for some purpose, it would be useful if one could extract and use data that has already been collected and is contained in a preexisting database.
In a situation where two or more databases exist and contain some information that is common to both of them, it would be useful when an event occurs to change a value and the value is corrected and updated in one such database, if the correction could be propagated into other databases as well. However, databases that were established for different purposes are not likely to correspond closely. The database field relationship schema, and the formatting of the field values, and even the logical meaning ascribed to a field value, may be very different in different databases.
Considering, for example, a name, field, the fact that the field is a name field may be less important than the identifier of the name by type, as in “resident” or “taxpayer.” In the example of a commercial business, the name could designate a customer, a sales prospect, a marketing prospect, a vendor, or even an employee. These types of names are useful for different purposes. While this straightforward example of complexity illustrates how databases need to be carefully organized, systems increase in complexity in integrated systems of multiple databases with overlapping but not entirely analogous data types and/or purposes. A “fixed unit price” field, for example, as established and used in one database may be known to include direct or indirect costs as well as have promotional discounts or sales tax already included. Such a field can be transformed to include these types of financial transactions, if not originally included, or not to include them. But to roll up into other application processes, this and similar rules need to be taken into account. The field label or title “fixed price unit” is not conclusive. At the level of integrated systems “rolling up” data in this manner, the original “fixed price” charged to a Type of Name can, by purpose, have very different meaning between systems.
Techniques are known including efforts to establish common standards for industry groups to use to facilitate communication between different database systems. A thriving industry of systems integrators exists, who are trained to interpret disparate systems and to create integrations by which data contained in one system is imported into another system, or wherein values of variable fields that ostensibly equate with one another may be compared and contrasted. Systems manufacturers work to build integrated suites of solutions that work together. In the suites, some field values are used by plural members programmed components of such suites. Still needed, however, are methods, systems and programmed products that are useful to create new and unique integrated systems solutions and in particular mitigate the increasing maintenance burden of making any kind of modification to either the integrated system by adding or removing a database from it, or doing the same with data elements within the overall integrated database system.
Complexity increases as a function of n(n−1) where n can be an entire database, or a single data element.                If a database being added to an integrated system already containing five (5) databases for example, 5(5−1)=20 database connections in the existing system. To increase this five (5) system to six (6, 6(6−1)=30), a minimum of 10 (30−20=10) new connections need to be established to accommodate this sixth system, in other words. Another way of reflecting this is that to add a sixth system, 50% more connections need to be established correctly.        If a data field is added into an integrated system containing 1,000 data elements across 5 systems, the same n(n−1) logic applies. There are 5(5−1)=20 connections between databases representing 1,001(1001−1)−(1,000(1,000−1))=2,000 maximum possible impacts.        
One example of a technique to establish common standards to facilitate communication between database systems is Extended Markup Language or XML. Generally, if a set of conventions is established for naming particular data values with labels that are unique or substantially to a recognized comprehensive definition, then programs that might use the data can be made to recognize the nature of data values or segments in a data collection, by their associated labels. Theoretically, the programs can overcome variations that are encountered (such as the order in values that appear in succession) by relying on XML labels, and deal with the data appropriately. In an example, when an online procurement system is designed to deal with data exchanged with a number of vendor systems, it is useful if the different vendor systems attach the same variable labels (namely the XML labels) to data values that equate with one another and can be imported into the same variable fields in the procurement system.
Difficulties may arise if the variable fields in the systems to be integrated are different with respect to the nature and/or formatting of the data values that are contained. The nature and formatting of data is often different, sometimes in subtle ways, in systems that were not designed and maintained to interact with one another. Each proprietary data feed for integration may need to have custom code built to conform its data to the parameters of an application. Human attention is needed to investigate the extent to which the variables actually do equate, and perhaps to predict situations in which the variables supplied from proprietary sources may not comply with the needs of the application, causing a failure of some kind. The level of maintenance and effort for an integration approach like this is substantial. Beyond mapping our workflows and identifying shared data processes, a data dictionary is often created with a new set of standards to accommodate the differences across middleware, or real time management systems. Beyond a substantial effort to manually create and maintain, at the point that more than two (2) data systems connect, relatively small changes in any one connected system can have unpredictable consequences to the integrated system (as in a meta fixed price field suddenly having a field for the fixed price, then a new field for sales tax, and a new field for fixed price+sales tax equaling the former fixed price) or even break the integration all together if alterations fall outside allowed limits for naming, type of data, or amount of bit data stored. It is not just the complexity of an integrated system that increases, but also the maintenance and consequential impact of any single change anywhere in that system.
XML was thought at one time to provide a possible solution and opportunity for business industries to solve brute force database integration problems. XML labels if used consistently according to an industry standard could possibly be uniformly adopted within an industrial sphere. If all systems used matching field labels for field values with definitions that were complete and accepted, then database integration could be automated. It appears, however, that attempts to standardize database standards have had limited success. Competing commercial entities in industry groups tend to pursue a standard that suits their specific needs or preferences. What is still needed is a straightforward integration process instead of multiple competing standards in each industry or in distinct groups within industries. Moreover, most commercial entities have taken to defining their own standards for their own enterprise defying the creation of any valid “industry” standard except for those systems obligated by law to report to state and federal entities in defined formats, often with allowances for manual paper-based submissions.