1. Field of the Invention
The present invention relates to the collection and integration of data from plural information sources managed in different systems.
2. Description of the Related Art
Apparatuses that integrate data managed in different systems in order to coordinate those systems have conventionally been realized. For example, Extract/Transform/Load (ETL) extracts data from a database serving as an information source, transforms the data into a form easily utilized by a utilization-side system, and loads the data into a database of the utilization-side system. ETL is normally developed separately for each purpose and operated as batch processing. A typical application of ETL is the construction of a data warehouse.
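The three ETL stages described above can be illustrated by a minimal sketch in Python using the standard-library sqlite3 module. The table names (sales, sales_fact), the column names, and the particular transformation are hypothetical and were chosen only to show the extract, transform, and load stages in one batch run; they do not belong to any system described here.

```python
import sqlite3


def run_etl_batch(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """One ETL batch run over hypothetical tables; returns the number of rows loaded."""
    # Extract: read raw rows from the information-source table.
    rows = source.execute("SELECT cust_id, name, amount_yen FROM sales").fetchall()

    # Transform: reshape into the form the utilization-side system expects
    # (here: trimmed upper-case names and amounts expressed in thousands of yen).
    transformed = [
        (cust_id, name.strip().upper(), amount_yen / 1000)
        for cust_id, name, amount_yen in rows
    ]

    # Load: insert everything into the warehouse table and commit as one batch.
    target.executemany(
        "INSERT INTO sales_fact (cust_id, name, amount_k) VALUES (?, ?, ?)",
        transformed,
    )
    target.commit()
    return len(transformed)
```

In practice such a function would be scheduled periodically (for example nightly), which is why the utilization-side data are only as fresh as the last batch.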
In Enterprise Application Integration (EAI), plural computer systems are coordinated organically by coordinating data and processes among the coordinated systems in accordance with predetermined criteria.
In a specific example of EAI, a predetermined standard data format is prescribed to coordinate plural business systems designed to use different data formats. When data are coordinated between the business systems, data of the transfer-source business system are first converted into the standard data format and then converted into the data format of the transfer-destination business system (see, for example, Japanese Patent Application Laid-Open Publication No. 2005-293047).
In the disclosed technology, data from one business system are converted into the standard data format, or vice versa, using dictionary databases that store correlation information between the data formats used by the business systems and the standard data format. In this method, the standard data format must be defined and a dedicated conversion dictionary database must be established for each information system, and if the standard data format is changed, all the dictionary databases must be changed. Furthermore, at the time of actual coordination, the data format conversion is performed at least twice, so CPU processing is executed in at least two steps.
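The two-step conversion through a standard data format can be sketched as follows, assuming that each conversion dictionary is a simple mapping from system-specific field names to standard field names. The system names, field names, and the functions to_standard, from_standard, and coordinate are hypothetical; the actual dictionary structure of the cited publication is not reproduced here.

```python
# Hypothetical conversion dictionaries: system-specific field name -> standard field name.
DICTIONARIES: dict[str, dict[str, str]] = {
    "order_system": {"ord_no": "order_id", "cust": "customer_id", "amt": "amount"},
    "billing_system": {"invoice_ref": "order_id", "client_code": "customer_id", "total": "amount"},
}


def to_standard(system: str, record: dict) -> dict:
    """Step 1: convert a source-system record into the standard data format."""
    mapping = DICTIONARIES[system]
    return {mapping[field]: value for field, value in record.items()}


def from_standard(system: str, record: dict) -> dict:
    """Step 2: convert a standard-format record into the destination system's format."""
    reverse = {std: local for local, std in DICTIONARIES[system].items()}
    return {reverse[field]: value for field, value in record.items()}


def coordinate(src: str, dst: str, record: dict) -> dict:
    # Every transfer passes through the standard format, so two conversions always run.
    return from_standard(dst, to_standard(src, record))
```

For instance, coordinate("order_system", "billing_system", {"ord_no": 1, "cust": "C9", "amt": 500}) yields {"invoice_ref": 1, "client_code": "C9", "total": 500}. Renaming a standard field would require editing every dictionary entry that maps to it, which mirrors the maintenance burden noted above.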
Therefore, a form of data integration called Enterprise Information Integration (EII) is desired. EII is a scheme of integrating physically scattered data and utilizing the data through a single view.
On the other hand, Master Data Management (MDM) is a scheme of integrating and managing master data distributed among plural systems. FIGS. 22 and 23 depict the principle of MDM according to a conventional technology. FIG. 22 is a schematic of the states of the subsystems before the MDM is introduced, and FIG. 23 is a schematic of an exemplary implementation of the MDM according to a conventional technology.
In FIG. 22, reference numerals 2201, 2202, and 2203 denote subsystems A, B, and C, respectively, to be integrated. The subsystem A 2201 includes a database A (DB-A) 2210 having a table A 2211 and a table B 2212; the subsystem B 2202 includes a database B (DB-B) 2220 having a table C 2221; and the subsystem C 2203 includes a database C (DB-C) 2230 having a table D 2231. Each table has columns. For example, the table A 2211 has columns A1, A2, A3, A4, and A5.
Reference numerals 2241, 2242, 2243, and 2244 denote data items that are the targets of integration among the data in the tables of the DBs managed in the subsystems. For example, in the case of the table A 2211, columns “A1”, “A3”, and “A4” are the targets of integration. The subsystem A 2201 includes a function-X, which is a representative example of a function included in a data integration target system before the data integration is applied, and the subsystem B 2202 includes a function-Y, which is a representative example of an application function utilizing the data integration.
FIG. 23 depicts an exemplary implementation of the MDM according to a conventional technology, in which the subsystem A 2201 and the subsystem B 2202 of FIG. 22 are integrated. First, a master (integration) DB 2250 is created; the master DB 2250 includes a master table M 2251 and is configured to collect the data sequences 2241, 2242, and 2243 from the tables (original tables) of the DBs managed in the subsystems. Because the data sequences collected into the master table M 2251 serve as the master data, they are deleted from the original tables wherever possible to avoid duplicate management; however, the integration target data cannot always be deleted completely from the original tables.
For example, since a data sequence serving as the primary key of an original table cannot be deleted, some data are managed by both the original table and the master table M 2251. The applications implementing the functions of the subsystems are changed to handle not only the original tables but also the master table M 2251. The columns of each table after integration are shown in FIG. 23; for example, the table A 2211 includes “A1”, “A2”, and “A5” as columns.
Specifically, the shared information (“A1”, “A3”, “A4”, “B2”, “B3”, “B4”, “C2”, and “C3”) is centrally managed as the master table M 2251 by the master DB 2250. The information specific to the systems (“A2”, “A5”, “B1”, “B5”, “C1”, “C4”, and “C5”) is managed by the systems. The DBs of the systems also include information overlapping with the master DB 2250 (e.g., “A1”, “B4”).
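The split between shared and system-specific columns can be expressed as simple set operations. In the following sketch, the master table M and table A use the columns given above; the layouts of table B and table C are assumptions reconstructed from the listed system-specific columns plus the overlapping example “B4”, and overlapping_columns is a hypothetical helper.

```python
# Columns after MDM is introduced (FIG. 23).
MASTER_M = {"A1", "A3", "A4", "B2", "B3", "B4", "C2", "C3"}
LOCAL_TABLES = {
    "table A": {"A1", "A2", "A5"},   # as stated in the text
    "table B": {"B1", "B4", "B5"},   # assumption: specific columns plus overlapping "B4"
    "table C": {"C1", "C4", "C5"},   # assumption: specific columns only
}


def overlapping_columns(master: set[str], local_tables: dict[str, set[str]]) -> dict[str, list[str]]:
    """Columns that are managed both in the master table M and in a subsystem's own table."""
    return {name: sorted(cols & master) for name, cols in local_tables.items()}
```

Under these assumptions the helper reports "A1" for table A, "B4" for table B, and nothing for table C; every column it reports is one whose two copies the applications must keep consistent.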
The operations of the function-X and the function-Y are explained next. First, the function-X of the subsystem A 2201 executes “update columns A1 and A2 of table A, A3 and A4 of table M (1)”. Accordingly, the subsystem A 2201 updates “A1”, “A3”, and “A4” of the master table M 2251 of the master DB 2250 and updates “A1” and “A2” of the table A 2211 managed by the DB-A 2210.
Next, the function-X executes “write sum of A2 and A3 into B4 of table B (2)”. Accordingly, the subsystem A 2201 acquires “A3” from the master table M 2251 of the master DB 2250 and writes the sum of “A2” of the table A 2211 and the acquired “A3” into “B4” of the table B 2212. The update of “B4” is then reflected in the master DB 2250 (3).
The function-Y of the subsystem B 2202 is then executed, which is “refer to B4 of table M and update C4 (4)”. Accordingly, the subsystem B 2202 refers to “B4” of the master DB 2250, which reflects the update at (3) above, and updates “C4” of the table C 2221. Data integration using the master DB 2250 is performed as described above.
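Steps (1) to (4) can be traced with a minimal in-memory sketch in which each table is a Python dict holding a single row. The dicts, the initial values, and the function signatures are hypothetical; in particular, the text does not say how C4 is derived from B4, so copying the value is an assumption.

```python
# Hypothetical one-row stand-ins for the tables in FIG. 23.
master_m = {"A1": 0, "A3": 0, "A4": 0, "B4": 0}
table_a = {"A1": 0, "A2": 0, "A5": 0}
table_b = {"B1": 0, "B4": 0, "B5": 0}
table_c = {"C1": 0, "C4": 0, "C5": 0}


def function_x(a1: int, a2: int, a3: int, a4: int) -> None:
    # (1) Update A1 and A2 of table A, and A1, A3, A4 of master table M.
    #     A1 is written twice because it is kept in both places.
    table_a.update(A1=a1, A2=a2)
    master_m.update(A1=a1, A3=a3, A4=a4)
    # (2) Read A3 from table M, add A2 from table A, write the sum into B4 of table B.
    table_b["B4"] = table_a["A2"] + master_m["A3"]
    # (3) Reflect the update of B4 in the master DB.
    master_m["B4"] = table_b["B4"]


def function_y() -> None:
    # (4) Refer to B4 of table M and update C4 of table C (assumption: C4 = B4).
    table_c["C4"] = master_m["B4"]
```

Even in this toy form, the application code must know which columns live in which store and must write the duplicated columns ("A1" and "B4") twice, which is the burden discussed next.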
However, the conventional system has the following problem. Since the master DB 2250 is required, the application of the utilization-side system must consciously manage the location of information and must reference and update information not only in the tables managed by its own system but also in the master DB 2250. The application must therefore be modified, and its contents become complicated.
When updating information, the application must control synchronization so that the information in its own system and the information in the master DB 2250 are updated without inconsistency. This control is problematic in that the application must manage transactions spanning its own system and the master DB 2250, for example to perform a rollback when a process fails, which increases the burden on the application.
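The following sketch shows the kind of control an application must carry out itself, using two separate sqlite3 connections as hypothetical stand-ins for the subsystem DB and the master DB. The table names, the function update_both, and the fail flag used to simulate an error are all illustrative assumptions. This is naive application-level coordination, not a true two-phase commit: a failure between the two commit calls would still leave the databases inconsistent.

```python
import sqlite3


def update_both(local: sqlite3.Connection, master: sqlite3.Connection,
                a1: int, a2: int, a3: int, fail: bool = False) -> bool:
    """Apply one logical update to the subsystem DB and the master DB, or to neither."""
    try:
        local.execute("UPDATE table_a SET a2 = ? WHERE a1 = ?", (a2, a1))
        master.execute("UPDATE table_m SET a3 = ? WHERE a1 = ?", (a3, a1))
        if fail:
            raise RuntimeError("simulated failure before commit")
        local.commit()
        master.commit()  # a crash between these two commits would still leave them inconsistent
        return True
    except Exception:
        # The application itself has to undo the uncommitted work on both databases.
        local.rollback()
        master.rollback()
        return False
```

Every application that writes shared data would need similar logic, which is why this burden grows with the number of subsystems.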
In some cases, the data of the utilization-side system that are referred to at the time of an update, or the data to be updated, may be in a partially processed state (a state in which values are not yet determined because another application is operating on them), and this must be prevented by some form of lock control on the data subject to update. This is also problematic in that performing lock control across the subsystems reduces the overall performance of the system. For example, while the subsystem A 2201 locks the master table M 2251 (integration DB), another system cannot utilize the integration DB and must therefore wait for the process (transaction) of the subsystem A 2201 to complete.
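The waiting described above can be sketched with a single coarse lock guarding the whole integration DB. The lock object and the function subsystem_b_read are hypothetical, and a real DBMS would use its own locking rather than a Python threading.Lock; the sketch only shows that while one holder keeps the lock, every other user either blocks or gives up after a timeout.

```python
import threading

# Hypothetical coarse lock over the whole master table M (integration DB).
integration_db_lock = threading.Lock()


def subsystem_b_read(timeout: float) -> bool:
    """Try to use the integration DB; return False if it stays locked for `timeout` seconds."""
    if integration_db_lock.acquire(timeout=timeout):
        try:
            return True  # the read of master table M would happen here
        finally:
            integration_db_lock.release()
    return False
```

If the lock is held (for example inside `with integration_db_lock:` while subsystem A's transaction runs), subsystem_b_read returns False after the timeout; once the lock is released, the same call succeeds immediately.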
A change in a subsystem may cause a data item managed by the integration DB to be added or changed, and in such a case the applications of all the subsystems using the affected table of the integration DB must often be changed. Furthermore, since data items are added to the integration DB to meet the needs of individual subsystems, the integration DB tends to become bloated.
As processing is centralized on the integration DB, the integration DB becomes bloated, and access to it increases, problems arise in that the performance of referencing and updating the integration DB deteriorates, the applications become complicated, and the integration DB affects each of the subsystems.