1. Field of the Invention
This invention relates generally to construction and/or description of polyhierarchical classifications, and, in particular, to construction and/or description of computer-stored polyhierarchical multi-criteria classifications with intrinsic recognition of domains of classification criteria applicability and simultaneous (random) access to applicable classification criteria.
2. Description of the Related Art
Classification of sets of arbitrary entities such as objects, relations, processes, concepts, subjects, etc., is a basic paradigm used by both the human mind and present-day information technologies for storage, retrieval, analysis and systematization of knowledge. The kernel principle of classification is decomposition of a classified set into a number of classes (categories) in accordance with a system of rules (criteria). If categories are ordered by a directed relationship, such as “abstract-concrete”, “general-specific”, or “parent-child”, they form a polyhierarchical structure. The term “polyhierarchical structure” is intended to include both single and multiple inheritance relationships between categories. In other words, a category in a polyhierarchical structure may have one or more than one parent.
Polyhierarchical classifications provide a dramatic increase of functionality as compared with classifications constructed without ordering categories by their abstraction level. In fact, the latter can be used only to store, search for, and retrieve information. In contrast, the former creates a well-developed formalism for manipulating systems of interrelated abstract entities, thus providing the ability to process information across different abstraction levels, create new languages, formalisms, concepts, and theories.
Persistent polyhierarchical classifications include structures that are relatively stable. Persistence of a classification denotes that a set of categories and a system of relationships between them (for example, “general-specific” relationships) must be pre-designed and stored in a permanent descriptive repository. Further extensions and refinements of a persistent classification may include the introduction of new criteria, categories, and relationships. Previously developed parts of a persistent classification ordinarily remain unchanged when extending a classified set, adding new selection options to existing criteria, and introducing new criteria. Moreover, a run-time modification of a persistent classification is generally not permitted. This means, in particular, that the accessible search options, including keywords and ranges of parameters, are permanently stored in the descriptive repository.
Persistent classifications are a foundation for collaborative development of general, reusable, and standardized systems. For example, hierarchies of classes, subjects, and aspects in object-oriented (‘OO’), subject-oriented (‘SO’), and aspect-oriented (‘AO’) programming, respectively, are persistent classifications. The classifications used in natural sciences, such as taxonomies of species, classifications of minerals, chemicals, astronomical objects, natural languages, fundamental particles, mathematical abstractions, and countless others are persistent as well.
Classification schemes are used in the vast majority of modern computer-aided information systems such as electronic data repositories, computer modeling environments, expert systems, and many others. In particular, electronic data repositories are increasingly being used to store, search for, and retrieve data. These repositories are capable of storing and providing access to large volumes of information.
The Internet is one factor that has contributed to the demand for electronic data repositories and to their proliferation. A large number of websites on the Internet, for example, allow users to search through data repositories and retrieve information free of charge. Several well-known examples include websites advertising vehicles available for purchase. These websites typically allow the user to search through the repository by entering search criteria, such as the make of the vehicle, the model, price range, color, and the like. Internet search engines are another example of an application that searches for and retrieves information from an electronic repository. Other applications include catalogues and directories, online documentation, and components of operating systems, as well as countless others. In short, the ability to electronically search for and retrieve information has become essential in a number of different software and commercial environments. Data repositories are often very large in size. Managing, organizing, and classifying the data is essential in maximizing the usefulness of the repository. The usual approach is to organize and manage the repository using a multi-criteria classification scheme, which can be hierarchical and/or persistent depending on the desired functionality.
A number of advanced applications work with sets of abstract entities rather than plain data. These applications may include OO, SO, and AO programming environments, as well as component-based software engineering (CBSE) systems, intelligent databases, content management systems, and expert systems. Such applications explicitly use persistent hierarchies of classes, aspects, etc. as formal schemes for defining entities of different abstraction levels, describing relations between them, and manipulating abstract entities rather than specific objects.
The use of hierarchical classifications provides a mechanism for logical operations, such as generalization, specialization, and composition of information. For example, the OO programming paradigm is based on class hierarchies formed by inheritance relationships. Under this approach, a child class includes the data (instance variables) and functions (class methods) of its parents, along with some additional ones. In other words, the child class is similar to its parents except for some additional features. This creates a so-called abstraction mechanism (i.e., a way of accessing a class object by reference to its abstract parent class with automatic data mapping and class method dispatch). Object-oriented hierarchies can be treated as multi-criteria classifications whose criteria are represented by sets of inheritance relationships sharing common parent classes.
Modern approaches to multi-criteria classification schemes generally use representations in terms of trees, directed acyclic graphs (‘DAGs’), compositions of trees, or set based formulas. These approaches, however, do not provide efficient support for development, maintenance, and use of general persistent polyhierarchical classifications. Several disadvantages of present-day multi-criteria classification schemes are discussed below for the case of a simplified classification of automobiles.
In FIG. 1, an illustrative tree-structured hierarchical classification scheme 100 is presented, where boxes (nodes of the tree) denote categories. The tree structure 100 graphically presents one illustrative example of a system of parent-child relationships, described above. For example, node 104 is the parent to nodes 108 and 112. Likewise, node 112 is the parent to nodes 116, 120, and 124.
The criteria in this example include manufacturer name, model year, engine type, internal combustion (IC) engine family, electric power source, fuel type, gasoline grade, and battery type. Some criteria are applicable to only specific kinds of cars, but not to other types of cars. For example, the “gasoline grade” criterion is applicable only for cars with IC engines requiring gasoline fuel. Likewise, the “battery type” criterion, in this illustrative example, is applicable only for electric cars with battery power sources. Such criteria can be called conditional criteria because their applicability depends on specific selections made under more general criteria.
Information on available cars in a hypothetical electronic data repository may be organized and searched based on the criteria shown. For example, data entries related to Toyota cars manufactured in 2003 with internal combustion piston engines fueled with regular gasoline should be classified under node 128, while data on electric Toyota cars manufactured in 2003 with Lithium Ion batteries should be classified under node 132. To retrieve information on these cars, the corresponding attribute values (i.e., Toyota, 2003, IC engine, etc.) may be entered in succession.
Unfortunately, the tree-structured hierarchical classification scheme 100 forces the developer to decide early on which criterion is most important. For example, in FIG. 1, the most preferable (i.e., most significant) criterion in the classification scheme 100 is “manufacturer name”. The second and third most preferable criteria are “model year” and “engine type”, respectively. The developer is forced to rank the importance of the different criteria because tree hierarchies require a strictly predefined sequence of selections. The applicable but lower-ranking criteria are not searchable until the higher-ranking (i.e., more preferable) criteria are satisfied. For example, the classification 100 does not provide the capability to search for electric cars directly. Instead, the search begins with the most preferable criterion, the make of the car. After this selection, the search progresses with the next most preferable criterion, the model year, and so on. If information on all electric cars had to be retrieved using this classification scheme, a variety of combinations of makes and model years would have to be browsed by moving logically up and down the tree 100. This limitation is commonly referred to as the “predefined path” problem.
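The predefined path problem can be illustrated with a minimal sketch. The nested dictionary below is a hypothetical miniature of a tree ordered as manufacturer name, then model year, then engine type; the makes, years, and model names are invented for illustration and are not taken from FIG. 1. Because engine type sits at the third level, collecting all electric cars requires visiting every make and year branch.

```python
# Hypothetical miniature of a tree classification:
# manufacturer name -> model year -> engine type -> list of car models.
# All entries are invented for illustration only.
tree = {
    "Toyota": {
        2003: {"internal combustion": ["model A"], "electric": ["model B"]},
        2004: {"electric": ["model C"]},
    },
    "Mazda": {
        2003: {"internal combustion": ["model D"]},
    },
}


def find_by_engine(tree, engine_type):
    """Collect models of one engine type by walking every make/year branch."""
    found = []
    branches_visited = 0
    for make, years in tree.items():
        for year, engines in years.items():
            branches_visited += 1  # each make/year pair must be opened separately
            found.extend(engines.get(engine_type, []))
    return found, branches_visited


electric, visited = find_by_engine(tree, "electric")
print(electric, visited)
```

In this sketch the number of branches visited equals the number of make and year combinations present in the tree, regardless of how few electric cars exist.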
Another disadvantage of tree-type hierarchies is the mutual exclusivity of subcategories corresponding to different selection options of a criterion. When a category of objects is specialized by a criterion, only one of the available options is selectable (i.e., different options are considered to be mutually exclusive). This may be confusing, for example, if a feature defined by a lower-ranking criterion is equally applicable for several options of higher-ranking criteria. For example, cars with internal combustion engines in the classification 100 are supplied with engine specifications like IC engine family, fuel type, etc. A practical classification scheme should include the same specifications for hybrid engine cars, since they are also equipped with IC engines. In other words, the sub-tree rooted at node 104 has to be duplicated starting from node 136. If, for example, information were needed on all cars having a rotary internal combustion engine, it could not be retrieved in one step. Instead, the selection of engine type (e.g., internal combustion, hybrid, etc.) is made first, thus requiring separate searches of hybrid cars and regular cars with IC engines, and the results are then manually combined. This problem becomes even more confusing if access to a feature of interest requires multiple selections for every combination of appropriate higher-ranking options.
These disadvantages arise, at least in part, due to the conjunctive logical structure of tree hierarchies. Elementary specializations performed by selecting options by different criteria describe a set of traits connected by the logical operator ‘AND’. For example, node 124, in FIG. 1, describes a subcategory of cars “manufactured by Toyota” AND “made in 2003” AND “having internal combustion engines” AND “having piston IC engine” AND “fueled with gasoline”. A one-step search for cars with rotary engines would conceivably be possible by using the disjunctive formula “internal combustion” OR “hybrid” engine. However, tree hierarchical structures do not support disjunctive superposition of properties (i.e., they do not allow the developer to describe sets of traits combined by logical OR).
Another disadvantage of tree-structured classifications relates to fast multiplication of sub-trees with increases in simultaneously applicable criteria. Continuing with the example of FIG. 1, if the simplified classification 100 includes twenty manufacturer names and five model years, then the sub-tree starting from the criterion “engine type” would have to be repeated for all meaningful combinations of these options (about 100 times). If the classification includes three additional criteria: “brand” (10 options on average), “exterior category” (10 options), and “price range” (10 options), the total number of sub-trees duplicated increases up to about 100,000.
Furthermore, a more comprehensive specialization of technical characteristics of piston engines (ICP) may require introduction of at least three more criteria: “ICP family”, “number of cylinders” and “cylinders volume range” with approximately 6 to 8 options each. In this case, the sub-tree starting from the criterion “fuel type” would be repeated 20,000,000 to 50,000,000 times. Finally, a full-scale commercial version of the car classification would implement about 70 criteria in total, and the respective tree structure would contain an astronomical number of nodes. A vast majority of these corresponding categories are intermediate abstract categories and empty leaf categories because there are only a limited number of different car models in the world. However, to support the appropriate sequences of transitions between categories and retrievals of respective criteria, in most cases, a large percentage of the intermediate nodes must be enumerated and stored. Therefore, such a structure would become unmanageable due to the amount of data stored in a repository or incorporated in a computer program to support the tree hierarchy.
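The duplication counts quoted in the two preceding paragraphs follow from multiplying the numbers of options of the criteria that rank above the repeated sub-tree. The following minimal sketch reproduces that arithmetic; the function name `subtree_copies` is hypothetical, and the option counts are the ones stated in the text.

```python
from math import prod


def subtree_copies(option_counts):
    """Number of times a sub-tree repeats below criteria with these option counts."""
    return prod(option_counts)


# Twenty manufacturer names and five model years.
base = [20, 5]
# Plus "brand", "exterior category", and "price range" with 10 options each.
extended = base + [10, 10, 10]

copies_base = subtree_copies(base)
copies_extended = subtree_copies(extended)

# Three further piston-engine criteria with approximately 6 to 8 options each.
copies_low = subtree_copies(extended + [6, 6, 6])
copies_high = subtree_copies(extended + [8, 8, 8])

print(copies_base, copies_extended, copies_low, copies_high)
# prints: 100 100000 21600000 51200000
```

The last two values bracket the range of roughly 20,000,000 to 50,000,000 repetitions given above.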
Directed acyclic graphs (‘DAGs’) that can be viewed as generalization of trees are one approach used to reduce the aforementioned predefined path problem. Similar to trees, DAGs represent hierarchical classifications as category sets strictly ordered by directed relationships, such as “abstract-concrete”, “general-specific”, “parent-child”, etc. However, in contrast to trees, DAGs allow each category to have more than one parent (i.e., DAGs utilize the so-called multiple inheritance concept).
FIG. 2 illustrates a relatively small topmost fragment of a DAG representing the same sample classification of automobiles shown in FIG. 1. Vertices of the graph 200 (boxes) and its edges (arrows) denote, respectively, classification categories and inheritance relationships between them. Due to simultaneous applicability of some criteria, the shown polyhierarchical classification uses multiple inheritance. For example, the vertex 216 of the graph 200 has two parent vertices: 204 and 208. Likewise, the vertex 228 is a common child of the vertices 216, 220, and 224. When performing a search, the multiple inheritance mechanism provides an opportunity to use any criterion applicable at the current level of specialization.
A search may be started with any of the criteria, “manufacturer name”, “model year”, or “engine type” applicable to all cars. After a selection, the search progresses with the remaining originally applicable criteria (if any), as well as with other criteria that may become applicable due to the selection just made, and so on. For example, if “internal combustion” of the criterion “engine type” is selected, the next selection available includes one of the remaining criteria “model year”, “manufacturer name”, or the new criterion “IC engine family” applicable to all the cars with IC engines. In contrast to trees, DAGs provide simultaneous (random) access to all currently applicable criteria, and a sequence of selections corresponds to a particular path on the graph. For example, the vertex 228 can be reached from the root “ALL CARS” by any of six paths: (→204→216→228), (→204→220→228), (→208→216→228), (→208→224→228), (→212→224→228), or (→212→220→228) corresponding to six respective criteria transpositions.
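The six paths listed above can be checked with a short sketch. The adjacency list below is reconstructed only from the paths and parent-child statements given in the text, with "ROOT" standing in for the root “ALL CARS”; it is an assumption that the fragment contains no other edges relevant to vertex 228.

```python
# Child lists for the fragment of graph 200, reconstructed from the six
# paths listed in the text. "ROOT" stands for the root "ALL CARS".
children = {
    "ROOT": [204, 208, 212],
    204: [216, 220],
    208: [216, 224],
    212: [220, 224],
    216: [228],
    220: [228],
    224: [228],
    228: [],
}


def paths(node, target):
    """Enumerate every directed path from node to target."""
    if node == target:
        return [[node]]
    return [
        [node] + rest
        for child in children[node]
        for rest in paths(child, target)
    ]


all_paths = paths("ROOT", 228)
for p in all_paths:
    print(" -> ".join(str(v) for v in p))
print(len(all_paths))  # prints: 6
```

Each path corresponds to one ordering of the three selections, which is why three globally applicable criteria yield 3! = 6 paths.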
Directed acyclic graph structured polyhierarchical classifications resolve the predefined path problem at the expense of an even more dramatic increase in the amount of descriptive data. To provide a full variety of possible selection sequences, all meaningful combinations of options from different criteria, and all possible transitions between them, must be represented by graph vertices and edges. To illustrate by example, a topmost sub-graph reflecting only five globally applicable criteria of the car classification: “manufacturer name”, “model year”, “brand”, “exterior category”, and “price range”, would contain 167,706 vertices and 768,355 edges. Due to the large amount of mandatory stored data, DAG representations are impractical for a vast majority of applications.
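The quoted vertex count can be reproduced under one simple counting model, sketched below. The model is an assumption: each vertex is taken to be a combination in which every criterion is either not yet selected or fixed to exactly one of its options, using the option counts from the running example (20, 5, 10, 10, 10). The edge count depends on details of the graph that are not reconstructed here.

```python
from math import prod

# Option counts per globally applicable criterion, from the running example.
option_counts = {
    "manufacturer name": 20,
    "model year": 5,
    "brand": 10,
    "exterior category": 10,
    "price range": 10,
}


def count_vertices(counts):
    """Assumed model: each criterion is either unselected or fixed to one option."""
    return prod(n + 1 for n in counts)


vertex_total = count_vertices(option_counts.values())
print(vertex_total)  # prints: 167706
```

Under this assumption the product (20+1)(5+1)(10+1)(10+1)(10+1) gives 167,706, matching the vertex figure above.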
Like tree-type hierarchies, DAGs suffer from the mutual exclusivity of different selection options of a criterion, discussed above. Thus, logical disjunctions of traits are not allowed when developing and using DAG-structured polyhierarchical classifications. Directed acyclic graphs introduce an additional limitation in relation to testing for the “parent-child” relationships between mutually distant categories. In FIG. 2, for example, this problem is illustrated when testing whether vertex 228 is a distant child of vertex 232.
A DAG is usually stored in a computer as an array of vertices, where each vertex is supplied with lists of its immediate parents and children. Continuing with the example shown in FIG. 2, to check whether vertex 228 is a distant child of vertex 232, a first step is to determine whether the list of immediate parents of vertex 228 includes vertex 232. If it does, then the latter is a parent of vertex 228. If not, the next step is to check the immediate parents of vertices 216, 220, and 224 for the presence of vertex 232. If vertex 232 is found in one of these lists, then it is a grandparent of vertex 228. Otherwise the test is continued with lists of immediate parents of the grandparent vertices, and so on. If vertex 232 is not found, the algorithm finally reaches the root vertex “ALL CARS”. In this case, it is concluded that vertex 232 is not a distant parent of vertex 228. From this example, it is clear that the test requires a combinatorial search over all levels of intermediate parents; hence its cost grows exponentially with the number of levels. Therefore, a test for distant inheritance may consume an unacceptably large amount of computer resources when processing relatively large DAGs.
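The level-by-level test described above can be sketched as follows. The parent lists cover only the fragment of FIG. 2 whose edges are stated in the text; since the position of vertex 232 is not given here, the sketch tests against vertex 204 and against a hypothetical label "X" that is absent from the fragment. Repeated visits are deliberately not pruned, mirroring the procedure as described, and the counter shows how many parent lists are inspected.

```python
# Immediate-parent lists for the fragment of graph 200 described in the text.
# "ROOT" stands for the root "ALL CARS".
parents = {
    228: [216, 220, 224],
    216: [204, 208],
    220: [204, 212],
    224: [208, 212],
    204: ["ROOT"],
    208: ["ROOT"],
    212: ["ROOT"],
    "ROOT": [],
}


def is_distant_child(vertex, candidate):
    """Check parent lists level by level; return (result, lists inspected)."""
    level = [vertex]
    lists_checked = 0
    while level:
        next_level = []
        for v in level:
            lists_checked += 1
            if candidate in parents[v]:
                return True, lists_checked
            # Duplicates are kept on purpose, as in the naive procedure.
            next_level.extend(parents[v])
        level = next_level
    return False, lists_checked


print(is_distant_child(228, 204))  # prints: (True, 2)
print(is_distant_child(228, "X"))  # prints: (False, 16)
```

In the negative case the sketch inspects 16 parent lists although the fragment has only 8 vertices, because vertices reachable by several paths are examined once per path.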
To reduce the described problems with trees and DAGs, modern “synthetic” classification methods use compositions of multiple trees, changing the most preferable criteria for each tree. In particular, this approach may be implemented via the concept of “faceted classification”. FIG. 3 illustrates one application of facets to the sample classification of automobiles shown in FIGS. 1 and 2. In this example, instead of arranging classification categories into a single polyhierarchy, the method uses a number of facet hierarchies, each reflecting an independent and persistent classification aspect.
The classification aspects represented by different facets are mutually exclusive and collectively form a description of object properties identifying classification categories. Mutual exclusivity of aspects means that a characteristic represented by a facet does not appear in another one. In this example, the sample classification 300 includes five facets: “manufacturer name”, “model year”, “engine type”, “fuel type”, and “battery type”. In contrast to trees and DAGs, a faceted classification does not define categories in advance. Instead, it combines the properties described by different facets using a number of loose but persistent relationships. For example, the category 124 of the tree classification 100 corresponds to a composition of the four categories 304, 308, 312, and 316, pertaining to different facets. These categories are called facet headings.
When performing a search, a selection may be made from the facets in arbitrary order. For example, a selection may specify internal combustion engine (node 320 of the facet “engine type”), Toyota (node 304 of the facet “manufacturer name”), gasoline fuel (node 316 of the facet “fuel type”), year 2003 (node 308 of the facet “model year”), piston engine (node 312 of the facet “engine type”), and so on. Each facet functions like an independent hierarchical classification (i.e., after each selection the process moves to the next applicable criterion, if any). At each step of specialization, a computer program supporting faceted classification retrieves the list of car models having the set of properties collectively defined by different facets.
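A minimal sketch of order-independent facet selection is given below. It flattens each facet to a single level of headings, which is a simplification of the facet hierarchies in FIG. 3, and the car names and the `retrieve` function are hypothetical. Each heading maps to the set of cars filed under it, and a search intersects the sets for the chosen headings.

```python
from itertools import permutations

# Hypothetical flattened facets: facet name -> heading -> set of car models.
facets = {
    "manufacturer name": {"Toyota": {"car A", "car B"}, "Mazda": {"car C"}},
    "model year": {2003: {"car A", "car C"}, 2004: {"car B"}},
    "engine type": {
        "internal combustion": {"car A", "car C"},
        "electric": {"car B"},
    },
}


def retrieve(selections):
    """Intersect the car sets of the selected (facet, heading) pairs."""
    result = None  # None means no selection has been made yet
    for facet, heading in selections:
        members = facets[facet][heading]
        result = set(members) if result is None else result & members
    return result


selection = [
    ("engine type", "internal combustion"),
    ("manufacturer name", "Toyota"),
    ("model year", 2003),
]

# Every ordering of the same selections retrieves the same cars.
results = {frozenset(retrieve(order)) for order in permutations(selection)}
print(results)
```

Because set intersection is commutative and associative, the order of selections does not affect the retrieved set in this sketch.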
Unfortunately, faceted classifications include a number of limitations. For example, faceted classification methods require splitting a classification into a set of independent hierarchies, which hides domains of criteria applicability. In the illustrative example of FIG. 3, the facet “fuel type” is applicable only to cars with internal combustion engines, while the facet “battery type” is applicable only to electric cars. The logical structure of the classification 300 itself does not include rules defining applicability of the facets in different contexts. To provide the classification with automatic recognition of domains of facet applicability, the developer is forced to supply the classification with additional descriptive data structures and/or managing programs. When developing a full-scale practical classification containing dozens or even hundreds of facets, these auxiliary descriptions and/or programs may become extraordinarily sophisticated. For example, to describe appropriate systems of facet interactions, modern Faceted Knowledge Representation (FKR) approaches involve cumbersome mathematical constructions such as association and production rules, hierarchical relationships, roles and purposes of facets, meta-facets, and the like.
These techniques are used to describe multi-level systems of relationships between finite sets of units characterized by their relations to other units but not by their internal properties, and, in particular, to establish domains of facet applicability. Advanced FKR methods are capable of representing sophisticated systems of relationships, but when implemented for constructing complex polyhierarchical classifications based solely on “general-specific” relations, they become inconvenient for practical implementations due to the large number of auxiliary data structures. Such an approach becomes exasperating for the developer because it requires manipulating highly abstract concepts, but does not offer a clear logical approach to building a classification.
In addition, faceted classifications do not automatically provide a persistent polyhierarchical structure of a classification. In fact, faceted classifications implement persistent inheritance relationships only within separate facets. The final classification categories are formed dynamically at run time and are described by combinations of independently specified properties. If some facets are not globally applicable, a global polyhierarchical structure is not defined unless supplementary rules for defining compatibility and priority of headings from different facets are introduced. For example, it is not possible to check directly whether the category “Toyota cars fueled with gasoline”, defined by a composition of the headings 304 and 316 in FIG. 3, is included in the category “Toyota cars having internal combustion engines”, defined by a composition of the headings 304 and 320. Generally, extra rules for defining cross-facet inheritance relationships can be described using the auxiliary data structures or program code mentioned above, but this would only move the problem from one part of a project to another. Because of the lack of global polyhierarchical structure, faceted classifications are ordinarily only implemented in plain data repositories supporting approximate interactive search and retrieval operations, which are usually supplemented with additional specialization techniques, such as search by keywords. They are not suitable for more advanced applications, such as supporting fully automatic classification of objects, search and retrieval of information, run-time logical operations on abstract categories, etc. without human control.
Moreover, in practical cases, it can be difficult to appropriately separate classification aspects for representation by a set of independent hierarchies. One approach is to build a relatively small number of large multi-criteria facets. If, for example, the facets “fuel type” and “battery type” shown in FIG. 3, were included as sub-hierarchies in the facet “engine type”, the classification 300 would automatically resolve domains of criteria applicability. However, in this case, the developer would encounter the same problems of predefined path and/or category multiplication typical for large trees and DAGs.
Smaller facets generally improve flexibility of the classification. If, for example, the criteria “IC engine family” and “electric power sources” are extracted and represented as independent facets, they may then be suitable for use in wider contexts. This classification design, however, would result in further encumbering supplementary data structures or program codes defining applicability and consistency of facets in terms of roles or purposes of facets, meta-facets, etc. Therefore, a classification developer has to find an optimal design that reduces the complexity of both individual facets and rules of their interactions (i.e., satisfy two contradictory requirements). In practice, the solution to this problem may be difficult or nonexistent. As a result, many faceted classification tools do not include mechanisms for the control of applicability and consistency of facets, thus creating an opportunity for errors when developing and using the classification tool.
Other techniques of tree or DAG compositions are unified by the concepts of “separation of concerns” (‘SOC’) and “multi-dimensional separation of concerns” (‘MDSOC’). These approaches are currently used for building software engineering environments for subject and aspect oriented programming (‘SOP’ and ‘AOP’, respectively) and subject oriented databases (‘SOD’). SOC, for example, has been developed as a supplementary tool for existing OO programming languages, such as C++, Java, and Smalltalk.
In an attempt to solve the predefined path problem, these approaches introduce one or more additional tree-structured hierarchies, similar to the unified modeling language (‘UML’) class diagrams that provide crosscutting access to categories of the dominant class hierarchy. In other words, different trees representing areas of concern are built and associated with the dominant tree of classes. In one example, SOC allows a developer to build any number of overlapped trees associated with the same set of classes. A set of user-defined composition rules describes application-specific scenarios of the class method dispatch and data mapping. MDSOC supports composing concerns into tree-structured hyperslices considered hyperplanes in the hyperspace of concerns, thus allowing so-called “multiple classifications” based on compositions of concerns.
SOC and MDSOC are specialized approaches intended solely for efficient non-invasive extension of object-oriented computer code while keeping the advantages of the object-oriented inheritance mechanism. They cannot realistically be considered as general principles for constructing complicated polyhierarchical classifications with dynamic retrieval of particular sub-hierarchies at run time. For instance, both concerns and hyperslices are typically tree-structured hierarchies. Generation of a new hyperslice is a static procedure since it requires additional programming, recompiling, and re-debugging the code.
In addition, the composition rules used for defining hyperslices depend on specific features of the basic object-oriented environment and descriptions of particular software system units. Structure of the dominant object-oriented class hierarchy imposes restrictions on construction of auxiliary hierarchies since the latter must refer to the same classes, instance variables, and class methods. This problem is commonly referred to as “tyranny of dominant concern”. If a classification scheme uses some heuristic criteria that cannot be formally derived from the existing source code, module configurations, and the like, then a comprehensive description of additional composition rules has to be manually developed. In general cases, it is expected to be an arduous job that should require a great deal of professional expertise.
Moreover, due to their narrow specialization, SOC and MDSOC use comprehensive descriptive structures, such as sets of sub-trees describing concerns and hyperslices, rules of class method dispatch, and the like, which are unnecessary for the classification purpose itself. Even after removing the object-oriented specific components and leaving only descriptions of inheritance relationships, dependencies would not allow SOC or MDSOC to be implemented for real-world polyhierarchical classifications due to the amount of programming work and computer resources required for development, storage, and maintenance.
Another classification approach is based on using set-theoretic operations and logical formulae for building a classification in run-time. These approaches generally use the concept of “set based classification”. They are typically implemented in the so-called dynamic classification tools, as well as in the rough sets theory and granular computing methods intended for machine learning and data mining.
A set based classification typically uses an information table containing attributive descriptions of properties of classified objects. FIG. 4 illustrates an information table 400 corresponding to the illustrative classification of automobiles shown in FIGS. 1, 2, and 3. A first field 404 of the table 400 lists classified car models, while the remaining eight fields specify car characteristics. Each of these eight fields corresponds to a criterion from the tree classification 100 shown in FIG. 1.
Table cells contain the attributes defining respective car characteristics, where each relevant attribute corresponds to one of the available selection options. The set of attributes from a table row exhaustively specifies a composition of characteristics definable by the eight-criteria classification. The attributes can be represented not only by enumerated identifiers but also by loose keywords or numerical parameters taking values from a continuous range. A search may be conducted that includes the selection of discrete attributes and ranges of attributive numerical parameters in arbitrary order. At each stage of selection, the repository management system retrieves a set of all objects having the specified subset of attributes. For example, using the table in FIG. 4, a search can be narrowed step-by-step by successively selecting options, such as “fuel type=gasoline”, “model year=2003”, “IC engine family=rotary”, “manufacturer name=Mazda”, and so on. The search proceeds until the retrieved set of cars is reduced to an acceptable size. In this manner, set based classifications support random access to all the classification criteria, thus resolving the predefined path problem.
Moreover, set based classifications permit retrieval of specific subsets defined by arbitrary compositions of set-theoretic operations, such as intersection, union, and difference of subsets. When performing a search, compositions may be represented in terms of logical combinations of constraints imposed on the attributes. For example, the following illustrative formula may be used when searching the table 400: ((“fuel type=gasoline” AND “manufacturer name=Mazda”) OR (“fuel type=diesel” AND “manufacturer name=Toyota”)) AND (“model year>2000” OR NOT “IC engine family=rotary”).
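The illustrative formula can be written directly as a predicate over table rows, as in the minimal sketch below. The four rows are hypothetical and are not the contents of table 400 in FIG. 4; only the attribute names and the formula itself come from the text.

```python
# Hypothetical rows of an information table; not the actual contents of FIG. 4.
cars = [
    {"model": "car A", "manufacturer name": "Mazda", "model year": 2003,
     "fuel type": "gasoline", "IC engine family": "rotary"},
    {"model": "car B", "manufacturer name": "Toyota", "model year": 1999,
     "fuel type": "diesel", "IC engine family": "piston"},
    {"model": "car C", "manufacturer name": "Mazda", "model year": 1998,
     "fuel type": "gasoline", "IC engine family": "rotary"},
    {"model": "car D", "manufacturer name": "Toyota", "model year": 2003,
     "fuel type": "gasoline", "IC engine family": "piston"},
]


def matches(car):
    """The illustrative formula from the text, term by term."""
    make_and_fuel = (
        (car["fuel type"] == "gasoline" and car["manufacturer name"] == "Mazda")
        or (car["fuel type"] == "diesel" and car["manufacturer name"] == "Toyota")
    )
    year_or_not_rotary = (
        car["model year"] > 2000 or not car["IC engine family"] == "rotary"
    )
    return make_and_fuel and year_or_not_rotary


selected = [car["model"] for car in cars if matches(car)]
print(selected)  # prints: ['car A', 'car B']
```

Here car C fails the second conjunct (it is a pre-2001 rotary-engine car) and car D fails the first (a gasoline Toyota matches neither make and fuel pair).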
Unfortunately, set based classifications are a specialized approach not generally applicable for development of real-world polyhierarchical classifications. The approach does not imply the existence of a global persistent polyhierarchy. For example, when performing a search with a dynamic classification tool, each category is described by a user-specified logical formula without any relation to other categories. Rough sets and granular computing based systems automatically build hierarchies of so-called decision rules expressed in terms of logical formulae. However, these hierarchies are intended solely for making particular conclusions based on statistical correlations between properties of available objects, rather than for building pre-designed multi-criteria categorizations. They are not persistent because their structure depends on the sets of objects currently listed in the information table. Moreover, because they are structured as trees, the decision rule hierarchies reintroduce both the predefined path and the category multiplication problems.
Information tables do not use domains of criteria applicability. In a typical case, many criteria will only be applicable to a few of the objects, thus resulting in numerous empty or “N/A” cells. The more conditional (i.e., locally applicable) criteria are used, the greater the percentage of empty cells. As a result, when storing information on qualitatively diverse objects, information tables become very inefficient. Moreover, the lack of automatic control of criteria applicability creates an opportunity for errors during data input into the information table. In fact, when describing a new object with conventional classifications, a data entry person manually selects all the criteria applicable to the object and enters attributes for those criteria. In a real-world application, a classification can use dozens or even hundreds of criteria, while only a few of the criteria may be applicable to a particular object. Without the advantage of automatic recognition of criteria applicability, correct data input becomes unmanageable. For example, if a classification does not provide automatic recognition of criteria applicability, some applicable criteria may be missed, or attributes for non-applicable criteria may be mistakenly entered.
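The two effects described above, empty cells and unchecked data entry, can be illustrated with a small sketch. Everything in it is hypothetical: the "engine type" and "battery type" criteria, the applicability rules in APPLICABLE, and the function names are assumptions chosen for the example, and a plain information table stores no such rules, which is exactly the gap the paragraph points out.

```python
NA = None  # an empty or "N/A" cell

# Hypothetical applicability rules: each conditional criterion applies only
# to objects for which its rule returns True.
APPLICABLE = {
    "fuel type": lambda row: row.get("engine type") == "IC engine",
    "battery type": lambda row: row.get("engine type") == "electric",
}


def empty_fraction(rows, criteria):
    """Fraction of cells under `criteria` that are empty."""
    cells = [row.get(c) for row in rows for c in criteria]
    return sum(cell is NA for cell in cells) / len(cells)


def entry_errors(row):
    """Return (missed, misplaced): applicable criteria left empty, and
    non-applicable criteria that were filled in by mistake."""
    missed, misplaced = [], []
    for criterion, applies in APPLICABLE.items():
        filled = row.get(criterion) is not NA
        if applies(row) and not filled:
            missed.append(criterion)
        elif not applies(row) and filled:
            misplaced.append(criterion)
    return missed, misplaced


good_rows = [
    {"model": "car-1", "engine type": "IC engine",
     "fuel type": "gasoline", "battery type": NA},
    {"model": "car-2", "engine type": "electric",
     "fuel type": NA, "battery type": "Li-ion"},
]
bad_row = {"model": "car-3", "engine type": "electric",
           "fuel type": "diesel", "battery type": NA}

sparsity = empty_fraction(good_rows, list(APPLICABLE))
errors = entry_errors(bad_row)
```

Even in this correctly filled two-row table, half of the conditional cells are empty, and the mistakes in the third row can only be detected because the applicability rules were written down separately.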
Recently developed advanced search systems, such as the Universal Knowledge Processor (‘UKP’), which uses the ‘dynamic taxonomies’ technique (described in Italian Patent No. 01303603), combine faceted and set based classification approaches. When interactively searching for information, the dynamic taxonomies provide a graphic user interface that allows for specializations to occur using different facets while concurrently performing set-theoretic operations between them. However, this approach inherits disadvantages of both set based classifications, such as lack of a pre-designed global polyhierarchy and dependence on the amount of available data, and faceted classifications, such as predefined path and sub-tree multiplication problems. Its range of applicability is therefore limited. It cannot be used, for example, for non-interactive retrieval of information, for manipulating abstract categories without reference to available objects, or for describing diverse sets of objects.
What is needed, therefore, is a more general approach to the construction of hierarchical classifications that may provide, for example, the following set of features:
    1. Global polyhierarchical system of classification categories supporting intrinsic recognition of domains of criteria applicability and simultaneous (random) access to all the applicable criteria;
    2. Persistence of the polyhierarchy and, in particular, invariance of its previously developed part with respect to extension of the classified set, addition of new selection options to existing criteria, and introduction of new classification criteria;
    3. Compactness of descriptive data structures that provide the ability to avoid cumulative multiplication of explicitly enumerated and mandatory stored classification categories, as well as interrelations between them, or other descriptions;
    4. Support for set-theoretic operations, including intersections, unifications, complements and differences of sub-categories;
    5. Efficient realization of the algorithm of testing categories for distant inheritance relationships; and/or
    6. Conceptual simplicity of the design process, as well as further unplanned extensions and refinements.
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.