This invention relates to the field of data processing, and in particular to a method and system for comparing the content of multiple files. Such a method and system are particularly well suited for comparing configuration files that are associated with devices on a communications network, to verify data consistency among the devices.
In many systems, common data is expected to be associated with multiple elements of the system. In a conventional database system, for example, individual data sets would include references to the particular items of common data, so that when any common data item is changed, the change is automatically reflected in each of the data sets that reference this common data item. In a distributed system, common data can be similarly referenced by each remote element of the system, but such an approach would be extremely vulnerable to a single point of failure that affects access to the common data by the remote elements.
To assure reliability in a distributed system, a copy of the common data is generally maintained at each remote element of the system. Such distribution, however, introduces the possibility of different versions of the common data being present at different remote elements. Additionally, in many cases, the remote elements of a distributed system are not homogeneous, per se, and the form of the common data at different elements of a distributed system may often differ, increasing the likelihood of differences appearing at each element. In like manner, not all remote elements will necessarily share the same items of common data, and some elements may purposely be designed to use locally defined items in lieu of some of the items of common data.
A communication system comprising a network of devices is a particular example of a distributed system of non-homogeneous elements that access data items that are expected to be common among at least a subset of the elements. For example, if TCP services are to be provided on a given network, all of the files that are used to configure the routers of the network would be expected to include a “TCP Services” entry. This particular entry may differ in format among different router vendors, and may appear at different locations within each particular configuration file.
For ease of reference and understanding, the collection of data at a remote element of a distributed system is herein defined to be located in a ‘file’, although one of skill in the art will recognize that this term refers to the logical arrangement of data, and such ‘files’ may be maintained in a variety of physical forms, including, for example, a data collection on multiple devices of the remote element. In like manner, the aforementioned term ‘distributed system’ refers to a logical distribution of elements, independent of the physical arrangement of such elements. Using this terminology, in a distributed system comprising multiple elements, each element possesses one or more files that contain data items, some of which data items are assumed to be common among all or some of the elements.
Conventional file comparators are generally unsuitable for comparing a large number of files. A typical file comparator compares two files and highlights the differences between the files based on a comparison of the text. Some file comparators are able to compare three files, using different methods of highlighting for each of the types of differences. For example, with three files, A, B, C, there are six different types of differences among the files: in A, but not in B or C; in A and B, but not in C; in B, but not in A or C; and so on. Because the number of such types grows exponentially with the number of files, comparing four or more files quickly becomes infeasible using conventional text-based comparators.
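The count of six can be checked by treating each type of difference as a non-empty, proper subset of the files in which a given item appears (an item present in all files, or in none, is not a difference). The following is a minimal illustrative sketch in Python, not part of the described system; the function names are hypothetical. Under this reading, N files yield 2^N - 2 types of differences.

```python
from itertools import combinations

def difference_types(files):
    """List every non-empty proper subset of `files`.

    Each subset stands for one type of difference: an item that is
    present in exactly those files and absent from all the others.
    """
    names = list(files)
    return [
        subset
        for size in range(1, len(names))        # sizes 1 .. N-1
        for subset in combinations(names, size)
    ]

def count_difference_types(n):
    """Closed form for the same count: 2**n - 2 (zero when n < 2)."""
    return max(2 ** n - 2, 0)

if __name__ == "__main__":
    for subset in difference_types("ABC"):
        print("present only in:", ", ".join(subset))
    for n in (3, 4, 10):
        print(n, "files ->", count_difference_types(n), "types of differences")
```

With four files the count is already fourteen, and with ten files it exceeds one thousand, which is why per-type highlighting does not scale.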
In like manner, conventional file comparators are generally unsuitable for comparing files that have many non-common data items, because differences among the items that are expected to be common are not easily distinguishable from the different non-common data items. And, if the common data items are different only in form, such differences are also not distinguishable from the substantive differences among the common data items.
It would be advantageous to provide a means for comparing particular data items or sets of data items in multiple files to identify differences among the files. It would also be advantageous to provide a user interface that allows a user to formulate the comparison task easily and efficiently. It would also be advantageous to provide an output scheme that presents the detected differences in a substantively meaningful and understandable form.
These advantages, and others, can be realized by providing a scalable comparison structure and methodology that is suitable for comparing select data content in hundreds or thousands of files in an efficient manner. Section delimiters are defined to identify the sections of the files within which the select data content is located, and sets of unique sections are identified based on the select data content within each section. Thereafter, comparisons and reports are based on these unique content sections. If multiple files include a common set of data, a single unique content section is used to represent these multiple files. File groups are optionally defined, and different sets of select data content can be compared based on these file groups. The result of the comparison is presented in multiple hierarchical forms, including an identification of which files are different from each other, and an identification of the differences among the unique content sections.
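The grouping of files by unique content section can be illustrated with a short Python sketch. It is only one possible reading of the summary above, under stated assumptions: the delimiters are given as regular expressions, the section ends at the first line that matches the end delimiter, and a simple whitespace normalization stands in for whatever handling of form differences an actual embodiment uses. The names `extract_section`, `normalize`, and `unique_sections` are hypothetical.

```python
import re
from collections import defaultdict

def extract_section(text, start_pattern, end_pattern):
    """Return the lines from the first line matching `start_pattern`
    up to, but not including, the next line matching `end_pattern`."""
    lines = []
    inside = False
    for line in text.splitlines():
        if not inside:
            if re.search(start_pattern, line):
                inside = True
                lines.append(line)
        elif re.search(end_pattern, line):
            break
        else:
            lines.append(line)
    return lines

def normalize(lines):
    """Assumed normalization: drop blank lines and collapse runs of
    whitespace, so that purely formal differences are ignored."""
    return tuple(" ".join(line.split()) for line in lines if line.strip())

def unique_sections(files, start_pattern, end_pattern):
    """Map each unique section content to the list of files that contain it.

    `files` maps a file name to its text. Files with identical section
    content share one key, so later comparisons run once per unique
    section rather than once per file.
    """
    groups = defaultdict(list)
    for name, text in files.items():
        key = normalize(extract_section(text, start_pattern, end_pattern))
        groups[key].append(name)
    return dict(groups)

if __name__ == "__main__":
    # Hypothetical configuration files; the syntax is illustrative only.
    configs = {
        "r1.cfg": "hostname r1\ninterface eth0\n  mtu 1500\n!\n",
        "r2.cfg": "hostname r2\ninterface eth0\n mtu   1500\n!\n",
        "r3.cfg": "hostname r3\ninterface eth0\n mtu 9000\n!\n",
    }
    for content, names in unique_sections(configs, r"^interface eth0", r"^!").items():
        print(names, "->", content)
```

In this example, three files reduce to two unique content sections, so only one comparison between sections is needed, regardless of how many additional files share either content.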
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions. The drawings are included for illustrative purposes and are not intended to limit the scope of the invention.