An ontology may be broadly interpreted as a dynamic, controlled vocabulary for describing a plurality of objects or concepts contained in a process or domain, and their interrelationships. An ontology may be further thought of as a formal knowledge-representation system which comprises rules for encoding the manner in which knowledge is represented, along with rules that enable automated reasoning with regard to the objects or concepts represented.
An ontology term may be a single named concept describing an object or entity. A concept may, for example, comprise a collection of words together with associated relevance weights, co-occurrence statistics, and word-localization statistics that, taken together, describe a topic.
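The concept representation described above can be pictured as a small data structure. The Python sketch below is illustrative only. The `Concept` class, its fields, the `observe` and `top_words` methods, and the use of fixed, non-overlapping token windows to count co-occurrence are all hypothetical choices made for this example, not features of any particular system.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Concept:
    """Hypothetical concept: a weighted word collection plus usage statistics."""
    name: str
    # word -> relevance weight of that word for the topic
    weights: dict[str, float] = field(default_factory=dict)
    # (word_a, word_b), sorted alphabetically -> number of windows containing both
    cooccurrence: Counter = field(default_factory=Counter)
    # word -> token positions where the word was observed (localization)
    positions: dict[str, list[int]] = field(default_factory=dict)

    def observe(self, tokens: list[str], window: int = 5) -> None:
        """Update localization and co-occurrence statistics from a token sequence."""
        if window < 1:
            raise ValueError("window must be at least 1")
        # Localization: record where each of the concept's words occurs.
        for i, tok in enumerate(tokens):
            if tok in self.weights:
                self.positions.setdefault(tok, []).append(i)
        # Co-occurrence: split tokens into non-overlapping windows and count
        # every pair of distinct concept words that shares a window.
        for start in range(0, len(tokens), window):
            present = sorted({t for t in tokens[start:start + window] if t in self.weights})
            for a, b in combinations(present, 2):
                self.cooccurrence[(a, b)] += 1

    def top_words(self, k: int = 3) -> list[str]:
        """Return the k words with the highest relevance weights."""
        return sorted(self.weights, key=self.weights.get, reverse=True)[:k]


if __name__ == "__main__":
    c = Concept("apoptosis", weights={"apoptosis": 1.0, "caspase": 0.8, "cell": 0.3})
    c.observe("caspase activation triggers apoptosis in the cell".split(), window=5)
    print(c.top_words(2), dict(c.cooccurrence), c.positions)
```

Non-overlapping windows keep the counting simple; a sliding window or sentence boundaries would be equally plausible ways to define co-occurrence.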
In various disciplines, scientific or otherwise, a number of resources (e.g., data management systems) may exist for representing cumulative knowledge gathered for different specialty areas within each discipline. Some existing systems, for instance, may use separate ontologies for each area of specialty within a particular discipline. One drawback associated with such ontology systems, however, is the inability to link different areas of specialty within a discipline.
Another drawback associated with existing ontology systems is the inability to map ontology terms onto wider data sources so as to infer linkages to related or referenced entities.
Moreover, while existing ontology systems generally describe a particular discipline syntactically (in terms of nouns or entity names), they often lack a well-developed set of semantic relationships (in terms of verbs or relationships between entities) built into the ontology.
Yet another drawback associated with existing ontology systems is that an ontology generally lacks names corresponding to all of the information or concepts in a document set. In existing systems, ontologies are often specific to a few sub-domains and to the data sources relevant to those sub-domains. In a life science system, for instance, a gene ontology (GO) may deal with objects related to cellular locations, biological processes, and molecular functions, while potentially missing a number of other aspects or objects relevant to a particular life science domain. This may create "completeness" issues if the gene ontology alone were used to represent all of the information in a document set pertaining to the life science domain.
In existing systems, the ontology for each sub-domain may be populated unevenly. Some sub-domains (representing areas of active research) may be relatively well populated compared to other sub-domains. Additionally, even within a sub-domain, differing levels of detail are often provided: some areas may be relatively well defined and comprise easily discriminated concepts, while other areas may not. Existing systems do not evaluate how specific, or how discriminating, the terms in an ontology are.
In life science disciplines, for example, many ontology terms have not been tested against any current database, particularly with respect to information content and retrieval. In addition, many of the terms do not reference any external databases. As a result, a user may not know beforehand whether it is possible to discriminate between a given pair of terms based on the available evidence for those terms and on the availability and capability of information retrieval (IR) tools. If a user cannot discriminate between a pair of terms using existing IR tools, the terms may effectively be treated as one and the same. Another drawback of existing ontology systems, therefore, is the apparent inability to generate the maximally diverse set of discriminable terms that may be derived from a starting ontology.
Yet another drawback in existing ontology systems is the apparent inability to discover a new set of terms that describe identifiable, discriminable concepts in the rest of an information space.
Still another drawback associated with existing ontology systems is the apparent inability to evaluate the distance between a pair of ontology terms.
Further, existing systems deal only with information that is inherently represented by terms in an ontology. This raises the question: what happens when information exists that has not been codified in an ontology? A great deal of the information space may not be fully described by terms in existing ontologies.
These and other drawbacks exist.