Generally, ontology is the study of how things in the world can be divided into categories and how these categories are related to one another. In computer science, “ontology” is used with a related meaning to describe a structured, automated representation of the knowledge within a certain domain. Such domains include science, government, industry, and healthcare. Essentially, an ontology provides a classification of the entities within a domain. Each entity is said to make up a term of the ontology. An ontology must also specify the semantic relationships between the entities. Accordingly, an ontology can be used to define a standard, controlled vocabulary for a given industry or body of knowledge. Ontologies are sometimes represented as directed graphs having a set of nodes (or vertices) and edges (or links) between the nodes. In directed graphs, the edges are one-way and go from one node to another. The nodes of the directed graph for an ontology correspond to the terms of the ontology, which are assigned to entities in the domain, and the edges between the nodes represent semantic relationships.
One commonly used semantic relationship is the subsumption relationship, which encodes that the concept behind one term subsumes the concept of another term, i.e., it encodes generalization-specialization relations. The subsumption hierarchy of an ontology may be represented by a directed acyclic graph (DAG). A cycle in a directed graph is a path along a series of two or more edges that leads back to the initial node in the path; a DAG contains no such cycles. In the subsumption hierarchy of an ontology, terms appearing closer to the root are more general than their descendant terms.
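The following is a minimal sketch of how a subsumption hierarchy might be stored and checked for cycles. The term hierarchy shown is a hypothetical fragment chosen for illustration and is not taken from FIG. 1; the names PARENTS, is_acyclic, and ancestors are likewise assumptions.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical subsumption edges: each term maps to the more general
# terms that directly subsume it.
PARENTS = {
    "wine": set(),
    "white wine": {"wine"},
    "Riesling": {"white wine"},
    "Sauvignon Blanc": {"white wine"},
    "Grauer Burgunder": {"white wine"},
}

def is_acyclic(parents):
    """Return True if the directed graph contains no cycle (i.e., is a DAG)."""
    try:
        TopologicalSorter(parents).prepare()  # raises CycleError on a cycle
        return True
    except CycleError:
        return False

def ancestors(term, parents):
    """Return every term that is more general than `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(is_acyclic(PARENTS))
print(sorted(ancestors("Riesling", PARENTS)))
```

Storing the edges from specific term to general term makes the walk toward the root a direct lookup; a real ontology would typically be loaded from a file rather than written inline.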
Ontologies may be used in artificial intelligence, the Semantic Web, biomedical research, and other fields. With ontologies, one may search semantically related descendant terms, which can increase the number of documents found as compared to methods that merely take the search term as a text string. Further, ontologies can be used to classify and describe the items (entities) of a domain. For instance, an ontology may be used to classify European wines (see, e.g., FIG. 1).
Domain ontologies may be used to improve searches by linking ontology terms that are semantically related. For example, if a user is searching an online wine catalog for “white wine”, the wine ontology can be used to return not only catalog items listed as “white wine”, but also items listed as “Riesling”, “Sauvignon Blanc”, or “Grauer Burgunder”.
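The search behavior described above can be sketched as follows. This is an illustrative sketch only: the catalog entries, the CHILDREN mapping, and the functions expand and search are hypothetical and assume that each catalog item is listed under exactly one ontology term.

```python
# Hypothetical subsumption edges: general term -> more specific terms.
CHILDREN = {
    "wine": ["white wine"],
    "white wine": ["Riesling", "Sauvignon Blanc", "Grauer Burgunder"],
}

# Hypothetical catalog: item name -> ontology term the item is listed under.
CATALOG = {
    "House White": "white wine",
    "Mosel Riesling 2019": "Riesling",
    "Loire Sauvignon Blanc": "Sauvignon Blanc",
    "Baden Grauer Burgunder": "Grauer Burgunder",
    "Table Wine": "wine",
}

def expand(term):
    """Return the query term together with all of its descendant terms."""
    found = {term}
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def search(term):
    """Return catalog items listed under the term or any descendant term."""
    terms = expand(term)
    return sorted(item for item, listed in CATALOG.items() if listed in terms)

print(search("white wine"))
```

A plain text-string match on “white wine” would return only the item listed under that exact term, whereas the expanded query also returns the items listed under the more specific terms. The item listed under the more general term “wine” is not returned.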
Another type of ontology is used to describe the attributes (or characteristics) of the items in a domain. For example, an ontology that describes the attributes of wine, including its color, smell, taste, acidity, and so on (see, e.g., FIG. 2), can be used to annotate various kinds of wine in a catalog. For example, a Riesling might be annotated with the terms white, dry, and lemon, and a Rioja might be annotated with the terms red, dry, and almond scent. In this example, annotations follow the direction of the edges in the graph, so that “Riesling” is said to directly annotate “white”. Further, for example, a user of an online wine catalog can enter the characteristics preferred in a wine, such as red, nutty, and medium dry. The Bayesian inference methods presented in embodiments of the present invention can thus be used to identify the catalog items (e.g., wines) that best match the user's query. Or, for example, clinical features may be used as attributes of medical conditions (e.g., diseases). Physicians can enter the clinical features they have observed in their patients and use the inference algorithms to identify the best matches among a database of diseases that have been annotated with a set of clinical features.
When items are annotated with terms of an attribute ontology, the so-called annotation propagation rule is often implied. This rule holds that if an item is annotated to a term, then it is also annotated to all terms connected to that term along specific relations. This rule is useful, e.g., for terms connected by subsumption relationships, because it means that annotations are propagated to more general terms, towards the roots of the subsumption hierarchy. In other words, besides asserted annotations (i.e., direct connections between nodes), annotations can exist between nodes that are not directly connected.
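The annotation propagation rule can be illustrated with the following minimal sketch. The attribute hierarchy shown is a hypothetical fragment and is not taken from FIG. 2; the names PARENTS, ASSERTED, and propagate are assumptions made for illustration.

```python
# Hypothetical subsumption edges in an attribute ontology:
# specific term -> more general terms.
PARENTS = {
    "white": ["color"],
    "lemon": ["citrus"],
    "citrus": ["smell"],
    "dry": ["taste"],
}

# Asserted (direct) annotations: item -> terms it is annotated with.
ASSERTED = {"Riesling": {"white", "dry", "lemon"}}

def propagate(terms):
    """Return the asserted terms plus every more general term they imply."""
    result = set(terms)
    stack = list(terms)
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

print(sorted(propagate(ASSERTED["Riesling"])))
```

In this sketch, the item “Riesling” is directly annotated with “lemon” only, yet after propagation it is also annotated with “citrus” and “smell”, which are not directly connected to it.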