1. Field of the Invention
The invention relates generally to the manipulation of representations of graphs in computer systems and more specifically to automated techniques for conceptually factoring and/or unifying graphs.
2. Description of Related Art
Information is useful only if it is accessible. There are two senses in which it must be accessible: those who need it must have physical access to it, and it must be indexed or cataloged so that those who need a particular item of information can easily find what they want. The data processing and communications revolutions of the second half of the twentieth century made it possible both to store much more information and to provide much more physical access to the stored information than ever before. The database technology component of the data processing revolution also made data cataloging and indexing easier than ever before, but the users of the information needed far more flexibility in finding, viewing, and analyzing the information than the relatively rigid database systems could provide.
The development of electronic commerce, or E-commerce, made flexible access to information more important than ever before. If E-commerce is to succeed, a Web merchant has to offer the E-shopper easier access to the goods or services being sold than the shopper can get by ordering from a mail-order catalog or by going to the local shopping mall. To the shopper, access is only easy if it is access the way the shopper wants to have it. In the E-commerce context, that means that the Web merchant must offer the shopper as many different ways to access the goods or services as there are kinds of shoppers.
A particularly effective way of providing flexible access to information is described in PCT International Application PCT/US00/01042, J. Anthony, A system for composing applications based on explicit semantic models, event driven autonomous agents, and resource proxies, filed Jan. 14, 2000 and published Jul. 20, 2000 as International Publication Number WO 00/42529. FIGS. 1-12 of PCT/US00/01042 are included in the present patent application, along with the parts of the Detailed Description that describe them. The system that is the subject matter of PCT/US00/01042 is termed the Ariadne system in the following. In the Ariadne system, representations of graphs are used to organize information. Vertices in the graphs represent items of information and the concepts that organize those items, and edges in the graphs represent relationships between the vertices. In E-commerce, the items of information are typically product descriptions, while the concepts organize the product descriptions so that the Web shopper can access them in various ways. For example, a description of a given kind of shoe may be accessible not only via the concept “shoes”, but also via concepts such as “leather”, “men's wear”, “formal wear”, “color”, and so forth. The concepts themselves are organized into models. Each model belongs to a particular model type, which specifies the properties of the edges that connect the vertices representing the concepts. An overview of the Ariadne system's graphs, models, and model types may be found in the Detailed Description of the present patent application, in the sections from “Using graphs to specify multiple aspects of a collection of data” through “Relating concepts to the world”.
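The organization just described, in which items and concepts are vertices, relationships are edges, and a model type constrains the edges, can be sketched as a small labeled graph. This is a minimal illustration under stated assumptions, not the Ariadne implementation: the class names Model and ModelType, the method items_under, and the edge labels "narrower" and "classifies" are all hypothetical, and a model type is reduced here to the set of edge labels its models may use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelType:
    """Hypothetical: a model type names the edge labels its models may use."""
    name: str
    edge_labels: frozenset


class Model:
    """Hypothetical graph model: vertices are items or concepts, edges are labeled."""

    def __init__(self, model_type):
        self.model_type = model_type
        self.kinds = {}                # vertex name -> "item" or "concept"
        self.edges = defaultdict(set)  # source vertex -> {(label, target), ...}

    def add_vertex(self, name, kind):
        if kind not in ("item", "concept"):
            raise ValueError(f"unknown vertex kind: {kind}")
        self.kinds[name] = kind

    def relate(self, source, label, target):
        # The model type constrains which kinds of edges may be created.
        if label not in self.model_type.edge_labels:
            raise ValueError(f"{label!r} not allowed by model type {self.model_type.name!r}")
        if source not in self.kinds or target not in self.kinds:
            raise KeyError("add both endpoints as vertices before relating them")
        self.edges[source].add((label, target))

    def items_under(self, concept):
        """Return every item vertex reachable from `concept` along outgoing edges."""
        seen, stack, items = set(), [concept], set()
        while stack:
            vertex = stack.pop()
            if vertex in seen:
                continue
            seen.add(vertex)
            if self.kinds.get(vertex) == "item":
                items.add(vertex)
            stack.extend(target for _label, target in self.edges.get(vertex, ()))
        return items


# Hypothetical example: one shoe description reachable through several concepts.
taxonomy = ModelType("taxonomy", frozenset({"narrower", "classifies"}))
m = Model(taxonomy)
for concept in ["apparel", "shoes", "leather", "men's wear", "formal wear"]:
    m.add_vertex(concept, "concept")
m.add_vertex("oxford-123", "item")
m.relate("apparel", "narrower", "men's wear")
for concept in ["shoes", "leather", "men's wear", "formal wear"]:
    m.relate(concept, "classifies", "oxford-123")

print(m.items_under("leather"))   # {'oxford-123'}
print(m.items_under("apparel"))   # {'oxford-123'}, reached via the "narrower" edge
```

Because the product description is attached to each concept by its own edge, a shopper starting from any of those concepts reaches the same description; offering a new way of access means adding a concept and an edge rather than copying the description.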
While Ariadne models make providing flexible access to information easier than ever before, the models must be made and maintained. When Ariadne is used for E-commerce, for example, the models that describe the products must be made. There is information and to spare in catalogs and databases about the products to be accessed using Ariadne models, but the models must still be made from that information. A solution to that problem is described in PCT International Application PCT/US01/02688, J. S. Anthony and Dean T. Allemang, Software composition using graph types, graphs, and agents, filed Jan. 26, 2001. As described in PCT/US01/02688, the Ariadne system uses graphs and agents (programs that are executed in response to events in the context provided by one or more models) to automatically convert catalogs represented in XML into Ariadne models. The same techniques can be used to convert other legacy representations of information into Ariadne models.
The maintenance problem, however, remains. It has two aspects: eliminating redundant information in a single model and integrating information from different sources.
Eliminating Redundant Information
Because catalogs are linear, they contain much redundant information, and this information remains in the Ariadne model made from the catalog. The redundant information creates many problems:
Catalog size may increase, in some cases at a nonlinear rate relative to the number of truly new categories being added to the catalog.
It is difficult to maintain catalogs consistently: an update may need to be made at numerous points within a structure.
Any given catalog structure will favor only certain styles of navigating and querying the catalog, with inadequate user support for other styles and other scenarios. Awkward “climb-around” navigation may be required to move to a conceptually close topic that is distant within the actual hierarchy. Mitigating this problem with ancillary links specified by human catalogers does not scale or persist well; such links are effort- and knowledge-intensive to create, maintain, and change over time.
Integrating Information from Different Sources
Reconciling information from different sources requires that the person doing the reconciling understand the differences between the sources and make tradeoffs between standardization and inclusiveness. Consider first the differences between the sources. When there are discrepancies between subtrees of a vendor's product catalog, the discrepancies may represent different language choices made by different catalogers on different days, temporary gaps in the product line, or logical differences between the two contexts (say, men's vs. women's clothing). When we are integrating models from independent sources (for example, from two different vendors' catalogs), there are likely to be even more discrepancies of this kind. We need a technique that makes it possible to deal with such discrepancies quickly and in a uniform manner.
In making the tradeoff between standardization and inclusiveness, current technologies allow only two approaches: either a “one size fits all” approach, which requires that each source of metadata conform to a single set of categories, or a “kitchen sink” approach, which takes the union of all the categories represented by all the sources. Hybrid approaches, such as a fixed standard or “generic” model that defers to local models for any non-common sources, do not escape the problems of the two basic approaches.
There are numerous drawbacks to each approach.
For the standard model approach:
Standard sets of categories are often strongly resisted by different stakeholders in the business context. This is the case both within the enterprise, as in efforts at knowledge dissemination and centralized knowledge sharing, and in cross-enterprise contexts such as business-to-business (B2B) E-commerce.
Where standard sets of categories can be adopted, there must be a design process for creating the standards; unless the categories are simply created by fiat, this process must involve some systematic study of candidate sources in order to synthesize a standard.
Once the standard set of categories is designed, each metadata source must do an initial conversion of its material to fit that standard. This too is an effort- and knowledge-intensive process.
Unless the independent information sources convert their own catalogs to the single standard, additional work will need to be done every time new inventory is made accessible via the standardized categories. The problem becomes more acute when the separate categories need to evolve, say, by adding new lower-level categories. These must somehow be reconciled with the standard.
For the “kitchen sink” union of all local metadata approach:
The solution winds up with many spurious duplicate categories in the main model. Some might be true duplicates; others might be homonyms, that is, categories from different sources with a common name but different interpretations.
Even where the categories have distinct names, the converse problem exists. Sometimes the different names represent significant differences in the categorized content; at other times, they are merely names used in different contexts for similar items.
Interpreting these connections between categories is certainly difficult. But if it is not done when the main catalog is made, the work simply falls on the user of the main catalog. The user will need to make these interpretations every time a search for a specific item is performed, will have to make them without the catalogers' knowledge, and will have to do so even though the correct interpretations change slowly, if at all. So the “union” solution is in effect a non-solution that leaves the user to deal with problems that should have been solved by the catalogers.
The union approach does not create categories that provide access, via a single query, to content from multiple original sources. For example, if clothing from different catalogs were integrated in this way, “Women's Garments” would be listed from Merchandiser A and “Women's Clothes” from Merchandiser B, and no category would show both in a common query. Even if the system supported queries on multiple categories simultaneously (effectively, union rather than intersection operations) to return the content classified in multiple categories within a single list, the user would still need to know that these two categories were the ones to select. The more sources there are, the greater the burden on the user to select the proper categories.
There are other obvious operational drawbacks as well, such as: the large relative size of the common “kitchen sink” catalog; the sensitivity of that catalog's look and feel to local changes made by separate catalog sources (either new categories get migrated in, changing the main catalog, or they don't and they have no connectivity to the main catalog); and the fact that, to disambiguate the categories, the typical trend will be towards explicit inclusion of the information source as part of the category (e.g., “Merchandiser A Women's Clothes”, “Merchandiser B Women's Garments”). At this point the union catalog is providing little value other than a single point of access for multiple collections. No real semantic integration has been performed.
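The name-only merge that underlies the “kitchen sink” approach can be made concrete with a small sketch. The function names kitchen_sink_union and categories_containing and both sample catalogs are hypothetical illustrations, not part of any described system; the sketch only assumes that categories are merged purely by their names.

```python
def kitchen_sink_union(*catalogs):
    """Merge categories purely by name, as the 'kitchen sink' approach does."""
    merged = {}
    for catalog in catalogs:
        for category, items in catalog.items():
            # Same name -> same merged category, regardless of meaning.
            merged.setdefault(category, set()).update(items)
    return merged


def categories_containing(merged, *items):
    """Names of merged categories that contain every one of the given items."""
    wanted = set(items)
    return sorted(name for name, members in merged.items() if wanted <= members)


# Hypothetical sample catalogs for two merchandisers.
merchandiser_a = {
    "Women's Garments": {"A:wrap-dress"},
    "Trunks": {"A:swim-trunks"},      # swimwear
}
merchandiser_b = {
    "Women's Clothes": {"B:silk-blouse"},
    "Trunks": {"B:steamer-trunk"},    # luggage
}

merged = kitchen_sink_union(merchandiser_a, merchandiser_b)

# Near-synonyms stay apart: no single category holds both women's items.
print(categories_containing(merged, "A:wrap-dress", "B:silk-blouse"))  # []
# A homonym is silently conflated: swimwear and luggage share one category.
print(sorted(merged["Trunks"]))  # ['A:swim-trunks', 'B:steamer-trunk']
```

Running the sketch shows both drawbacks at once: the query for a category holding both women's items comes back empty, while the single “Trunks” category mixes swimwear with luggage.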
For both the approaches listed above, there is another serious drawback. So far we have discussed the integration problem from the point of view of the information source, e.g., a merchandiser's catalog. But, particularly in a B2B context, the consumer, procurement or demand side will have the same needs for customized categories to streamline repeated buying decisions. Neither the “one size fits all” nor the “kitchen sink” approach provides any means to support customer-specific views into the category system of this kind.
The reason that present approaches fail to address so many of these problems is that they all attempt to coordinate multiple information sources using an information representation that is no more powerful than the one used in the sources. For instance, we cannot reconcile discrepancies among several taxonomies with a single taxonomy without resorting to either “one size fits all” or “kitchen sink”. The solution is to migrate to a richer semantic framework such as the one provided by the Ariadne system. Making that migration requires techniques for transforming existing information sources into richer semantic frameworks, and providing such techniques is an object of the present invention.