In recent years, the configuration information of information systems has increasingly been described and managed as structured documents. Specifically, to deploy an information system on a cloud infrastructure, the configuration information for the components that constitute the system, such as VMs (Virtual Machines) and networks, is widely managed in configuration definition files. These configuration definition files are written, for example, in an XML (Extensible Markup Language) or JSON (JavaScript (Registered Trademark) Object Notation) format.
Because these configuration definition files are structured documents, they share the general handling advantages of structured documents (ease of partial extraction, copying, reuse, and the like). By merging (integrating) typified configuration definition files of information systems, it is therefore relatively easy to generate a configuration definition file for a complex information system that meets individual requirements, and to construct that information system.
However, to merge a plurality of configuration definition files, it is necessary to determine which component shared between the files (a common component) is to serve as the coupling point that connects them. Detecting an appropriate coupling point requires understanding the role of each component defined in the configuration definition file of each information system, which increases the workload. Further, in a large-scale information system, a configuration definition file contains a huge number of candidate coupling points, which makes the coupling points difficult to detect. It is therefore desirable to be able to generate a configuration definition of a new information system by combining (reusing) existing configuration definition files with a fixed amount of work, regardless of the scale of the information system.
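The merging problem described above can be illustrated with a minimal Python sketch. The JSON layout (a `components` list whose entries carry `id` and `type`) and the function names are hypothetical and chosen only for illustration; they are not taken from any cited document. The sketch treats a component as a coupling-point candidate simply because its `id` appears in both definitions, which is exactly the kind of naive matching that ignores the role of each component.

```python
import json

# Two hypothetical configuration definition files (illustrative schema).
WEB_TIER = json.loads("""
{"components": [
  {"id": "vm-web",    "type": "vm"},
  {"id": "net-front", "type": "network"}
]}
""")

DB_TIER = json.loads("""
{"components": [
  {"id": "net-front", "type": "network"},
  {"id": "vm-db",     "type": "vm"}
]}
""")


def find_coupling_candidates(def_a: dict, def_b: dict) -> list[str]:
    """Return ids of components that appear in both definitions.

    Assumption: identical ids denote the same component. Real definitions
    rarely guarantee this, so the result is only a list of candidates.
    """
    ids_a = {comp["id"] for comp in def_a["components"]}
    return sorted(comp["id"] for comp in def_b["components"] if comp["id"] in ids_a)


def merge_definitions(def_a: dict, def_b: dict) -> dict:
    """Merge two definitions, keeping each shared component only once."""
    merged = {comp["id"]: comp for comp in def_a["components"]}
    for comp in def_b["components"]:
        # Components already present (the coupling points) are not duplicated.
        merged.setdefault(comp["id"], comp)
    return {"components": list(merged.values())}


if __name__ == "__main__":
    print(find_coupling_candidates(WEB_TIER, DB_TIER))
    print(json.dumps(merge_definitions(WEB_TIER, DB_TIER), indent=2))
```

Every component of one file must be checked against the other, so the number of candidates to inspect grows with the size of the definitions, and a match on an identifier alone says nothing about whether the two components actually play the same role.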
One conceivable analysis technique for finding a coupling point between configuration definition files is a technique for detecting similarities or differences between text documents. A known example of such a difference detection technique for text documents is the diff command of UNIX (Registered Trademark) systems. The diff command compares two text files line by line and presents the changed lines and the added lines. Further, PTL 1 discloses, as a difference detection technique for structured documents, a technique that uses the structure information of the structured documents. The difference extraction apparatus of PTL 1 represents documents as tree structures based on HTML (Hyper Text Markup Language) tags, compares the documents node by node between corresponding nodes of the tree structures in accordance with comparison criteria provided for the tags, and extracts the differences.
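The two styles of comparison mentioned above can be contrasted in a short Python sketch. It uses the standard `difflib` module as a stand-in for a line-by-line diff and a simple recursive walk over `xml.etree.ElementTree` nodes as a stand-in for a tree-based comparison. The function names and the positional pairing of child nodes are assumptions made for illustration; this is not the method of PTL 1, which applies comparison criteria defined per tag.

```python
import difflib
import xml.etree.ElementTree as ET


def line_diff(old_text: str, new_text: str) -> list[str]:
    """Line-by-line comparison: return removed ('- ') and added ('+ ') lines."""
    changes = []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        # ndiff also emits unchanged ('  ') and hint ('? ') lines; skip those.
        if line.startswith(("- ", "+ ")):
            changes.append(line)
    return changes


def tree_diff(a: ET.Element, b: ET.Element, path: str = "") -> list[str]:
    """Tree-based comparison: walk two element trees in parallel.

    Assumption: children correspond by position. A real tool would need a
    smarter way to decide which nodes correspond to each other.
    """
    if a.tag != b.tag:
        return [f"{path}: tag {a.tag} != {b.tag}"]
    here = f"{path}/{a.tag}"
    diffs = []
    if (a.text or "").strip() != (b.text or "").strip():
        diffs.append(f"{here}: text differs")
    if a.attrib != b.attrib:
        diffs.append(f"{here}: attributes differ")
    for child_a, child_b in zip(a, b):
        diffs.extend(tree_diff(child_a, child_b, here))
    if len(a) != len(b):
        diffs.append(f"{here}: child count {len(a)} != {len(b)}")
    return diffs


if __name__ == "__main__":
    print(line_diff("vm: web\nnet: lan1\n", "vm: web\nnet: lan2\ndb: pg\n"))
    old = ET.fromstring('<system><vm name="web"/><network>lan1</network></system>')
    new = ET.fromstring('<system><vm name="web"/><network>lan2</network></system>')
    print(tree_diff(old, new))
```

The line-based variant reports where the text changed but knows nothing about components, whereas the tree-based variant reports the path of the differing node. Neither identifies which component should act as a coupling point.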
As a related technique, PTL 2 discloses a technique for integrating structured documents that have the same logical structure. Further, PTL 3 discloses a technique for assigning reference relations between components in a document and using those reference relations when a component is corrected.