Embodiments of the present invention relate to ontologies, and more specifically to techniques for creating ontologies from other data sources.
Businesses often have internal business policies intended to address a wide range of issues such as security, privacy, trade secrets, criminal activity of employees or others with access to the business, and many others. These business policies address various aspects of a business, such as purchasing, selling, marketing, and internal administration. Because of the large number of activities occurring during the course of running a business, which may have various entities located in a variety of geographical locations, it is often impractical to manually monitor all activities in which improper behavior or mistakes may occur.
One approach to implementing business policies has been to monitor and control computer systems used to facilitate a business' activities. For example, information regarding various activities, such as sales and payroll, are often stored in one or more data stores. This information may be analyzed to find activity that might be in violation of a business policy, such as an item on an invoice or paycheck to an employee being outside of a specified range, or a particular employee attempting to access information to which he or she is not entitled access.
One approach to monitoring and controlling a business' computer systems involves storing business data in one or more ontologies. With the advent of semantic technologies, the importance of ontologies and semantic query languages has grown manifold. An ontology is a formal representation of knowledge, specifically a formal representation of a set of concepts within a domain and relationships between the concepts. Ontologies are used in several different areas including business analytics, enterprise systems, artificial intelligence and the like, and have the potential to be used in several other fields and applications.
An ontology is typically encoded using an ontology language. Several ontology languages are available. The OWL Web Ontology Language has become the industry standard for representing web ontologies. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between the terms. OWL thus provides facilities for expressing meaning and semantics that goes far beyond the capabilities of languages such as XML. OWL is being developed by the Web Ontology Working Group as part of the W3C Semantic Web Activity. Further information related to OWL may be found at the W3C website.
An ontology may be persisted in memory such as in a database or other data store. There are several standard ways of querying and manipulating the data in an ontology using query languages such as RDQL (Resource Description Format Query Language), OWL-QL, SPARQL (SPARQL Protocol and RDF Query Language), and others. Among the ontology query languages, SPARQL is considered by many to be the de facto industry standard.
In the context of a business, data from one or more data stores may be periodically compiled in an ontology, such as in a batch process occurring at times when there is less use of the business' computer systems, such as at night when most employees are not present. Once compiled in an ontology, an ontology reasoner can be used to make logical inferences from the ontology. Generally, an ontology reasoner is a module implemented in hardware or software executed by a processor, configured to infer logical consequences from a set of facts or axioms stored in an ontology. Ontology reasoners are also known as reasoning engines, rules engines, or simply reasoners. Examples include RETE-based rules engines such as JESS, a rules engine for the Java platform, available for download at http://www.jessrules.com, and probabilistic description logic reasoners such as Pronto, available from Clark & Parsia LLC, located at 926 N St. NW Rear, Studio #1, Washington, D.C. 20001. Ontology reasoners typically incorporate algorithms that, if possible, avoid analyzing complete data sets, and instead analyze only relevant data.
Often businesses have large data sets stored in their data stores and, as a result, ontologies formed from the large data sets are themselves complex, often too complex for a single processor implementing a reasoner to handle efficiently. Attempts to address this issue often suffer a disadvantage in that any change in the ontology's data causes the data to become stale or triggers reasoners to go through the entire data set, thereby negating any advantage gained.