A process of integrating extracted data is called physical integration (Extract/Transform/Load (ETL)). In physical integration, data extracted from an information source by an extracting function (Extract) is transformed and integrated (Transform), and the results of the integration are registered to a user-side by a registering function (Load). Physical integration is suited to collective processing executed as batch processing. In physical integration, the information is only as recent as the time of its extraction. In addition, because the integrated results are held separately from the information sources, overlapping management of the information sources and the integrated results is apt to occur.
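The extract, transform, and register steps described above can be illustrated with a minimal sketch. All names here (`ORDERS`, `CUSTOMERS`, `run_batch`, and so on) are hypothetical and stand in for real information sources and a real user-side store; the sketch is not taken from any particular ETL product.

```python
# Hypothetical in-memory information sources (illustrative data only).
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 30.0},
]
CUSTOMERS = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]


def extract():
    """Extract: copy each source as it exists at batch time."""
    return [dict(r) for r in ORDERS], [dict(r) for r in CUSTOMERS]


def transform(orders, customers):
    """Transform: join the two sources and total the amount per customer."""
    names = {c["customer_id"]: c["name"] for c in customers}
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    return [{"name": names[cid], "total": total}
            for cid, total in sorted(totals.items())]


def load(rows, target):
    """Load: register the integrated rows in the user-side store."""
    target.clear()
    target.extend(rows)


def run_batch(target):
    """Run one batch: extract, transform, then load."""
    orders, customers = extract()
    load(transform(orders, customers), target)


user_side_store = []
run_batch(user_side_store)
```

In this sketch the user-side store holds its own copy of the integrated results, so a record added to `ORDERS` after the batch is not visible until `run_batch` is executed again. This reflects the two properties noted above: recency is fixed at extraction time, and the sources and the integrated results are managed in duplicate.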
In one method of achieving physical integration, integration process logic is written in advance in an integration process logic description language called Transformation Description Language (TDL), the functions to be integrated are configured as integration components and registered with a repository, and the integration components are processed sequentially according to the TDL when physical integration is executed (see, e.g., U.S. Pat. Nos. 6,014,670 and 6,339,775).
A process of collecting and integrating data present in multiple information sources managed by different systems, through real-time processing in response to a request from a user-side, is called virtual integration (Enterprise Information Integration (EII)). In virtual integration, necessary information is retrieved and collected from information sources in response to a request from the user-side, and the collected data is integrated and returned as integrated data to the user-side to realize virtual information integration. Virtual integration allows the user-side to acquire real-time information from an information source at the point that the user-side needs the information, and thus enables use of fresh information. Information is discarded after use, which makes overlapping information management unnecessary.
For example, a data model representing the data structure of an information source as an integration subject is defined as a physical model, while a data model representing a data structure needed by the user-side is defined as a logical model, and the relation between the physical model and the logical model is defined as a mapping definition to provide a meta-definition, thereby realizing efficient virtual integration (see, e.g., International Patent Publication No. 2007-083371). Such virtual integration does not require integration process logic and thus offers flexibility: a change at an information source or the user-side can be handled by changing only the meta-definition corresponding to the change. Likewise, when data models at the user-side vary, virtual integration can be carried out by merely adding a logical model corresponding to each additional data model.
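The relation between a physical model, a logical model, and a mapping definition can be sketched as follows. This is a simplified illustration under stated assumptions: the source name `crm`, the field names, and the `query` function are all hypothetical, and the sketch does not reproduce the method of the cited publication.

```python
# Physical model: the structure of a hypothetical information source "crm".
SOURCES = {
    "crm": [
        {"cust_no": 1, "cust_nm": "Alice"},
        {"cust_no": 2, "cust_nm": "Bob"},
    ],
}

# Mapping definitions: for each logical model, which source it draws on and
# how each logical field maps to a physical field (hypothetical names).
MAPPINGS = {
    "Customer": {
        "source": "crm",
        "fields": {"id": "cust_no", "name": "cust_nm"},
    },
}


def query(logical_model):
    """Read the source at request time and return rows in the logical model."""
    mapping = MAPPINGS[logical_model]
    rows = SOURCES[mapping["source"]]  # fetched on demand, nothing is stored
    return [
        {logical: row[physical]
         for logical, physical in mapping["fields"].items()}
        for row in rows
    ]


# A new user-side data model is supported by adding meta-definition only;
# the query function itself is left unchanged.
MAPPINGS["CustomerName"] = {"source": "crm", "fields": {"name": "cust_nm"}}
```

Because `query` reads the source each time it is called, a record added to the source is visible to the next request, and no copy of the integrated data is kept on the user-side.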
A process of acquiring and integrating data that flows as stream data is called stream integration. In stream integration, data flowing as part of a data stream (stream data) through a network, etc., is acquired when necessary and is subjected to information integration, and the results of the integration are sent to the data stream. Stream integration involves a time axis, and stream data is data sent out actively from an information source. For this reason, stream integration is carried out by a method entirely different from normal methods of information integration, such as physical integration and virtual integration.
In one method of stream integration, stream data acquired from an information source by a stream wrapper is stored temporarily in a cache, is integrated sequentially, and is sent out in response to a query from a user-side (see, e.g., Kitagawa, Hiroyuki and Watanabe, Yousuke, “Stream Data Management Based on Integration of a Stream Processing Engine and Database”, Proc. IFIP International Conference on Network and Parallel Computing Workshops, pp. 18-22, Dalian, China, September 2007). In conventional stream data integration, the procedure of stream data integration is written with attention to the data from an information source and to the data structure, in order to acquire the integrated information to be used.
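The wrapper-and-cache arrangement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system of the cited paper: the class `StreamWrapper`, its methods, and the sample data are hypothetical, and "integration" is reduced to a simple average computed when a query arrives.

```python
from collections import deque


class StreamWrapper:
    """Hypothetical wrapper that caches records pushed by an information source."""

    def __init__(self, cache_size):
        # Bounded cache: once full, the oldest record is dropped on each append.
        self.cache = deque(maxlen=cache_size)

    def on_data(self, timestamp, value):
        """Called when the source actively sends out a record."""
        self.cache.append((timestamp, value))

    def query_average(self, since):
        """Answer a user-side query over cached records with timestamp >= since."""
        values = [v for t, v in self.cache if t >= since]
        return sum(values) / len(values) if values else None


wrapper = StreamWrapper(cache_size=3)
for t, v in [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]:
    wrapper.on_data(t, v)
```

The time axis appears in two places in this sketch: the cache retains only the most recent records, and each query names the time window it covers. Neither has a counterpart in the batch or on-demand styles of integration.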
Since the three types of information integration (physical integration, virtual integration, and stream integration) differ from each other in function, each is executed by a different processing system.
Physical integration, however, requires the user (or developer) to write and register complete integration process logic, thus posing a problem of heavy workload on the user (developer). Physical integration also requires that each relevant piece of integration process logic be rewritten to cope with a change at an information source or the user-side, thus posing a problem of low flexibility. Physical integration further requires that complete integration process logic be written in TDL for each data model required by the user-side, thus posing a problem in that the effort of writing integration process logic increases when the user-side needs various types of data models.
Stream integration requires a user (or developer) to write integration process logic, using relational operations on information sources and extended functions, in the query description language HamQL, an extended version of Structured Query Language (SQL) for stream processing, and further to register the written integration process logic.
Physical integration, virtual integration, and stream integration are executed by different processing systems. As a result, separate development and operation are needed for each integration method, leading to a problem of greater development burden, higher cost, and increased complexity. In addition, information sources and user-sides that call for different integration methods cannot be handled together, because each integration method requires its own processing system.
For example, when three information sources consisting of physically integrated information, data operated by a database management system (DBMS), and stream data are provided, execution of a combination of physical integration, virtual integration, and stream integration on these information sources is difficult. Likewise, when an information source is physically integrated information and a user-side is a data stream, execution of a combination of physical integration and stream integration is difficult. In such a case, an individual integration system must be developed for each integration method.