The Semantic Web is considered an extension of the World Wide Web in which information is linked in such a way that machines can easily process it. The Semantic Web is generally built on syntaxes that use URIs (Uniform Resource Identifiers) to refer to data, usually in triple-based structures such as RDF (Resource Description Framework). Information about the data, called metadata, benefits especially from reasoning, because new metadata can be inferred from what is already stated. Inferred metadata often reduces the need for human interaction and can also help with the interoperability issues that arise when different data sources are combined or used at the same time.
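To make the triple-based structure concrete, the sketch below holds RDF-style statements as plain (subject, predicate, object) tuples of URI strings and answers a simple pattern query. It is an illustration only, not an RDF library: the example.org URIs and the `match` helper are hypothetical, and only the `rdf:type` URI is the real one from the RDF vocabulary. It assumes Python 3.9 or later.

```python
# Minimal sketch: RDF-style triples as (subject, predicate, object) tuples of
# URI strings. The example.org URIs and match() are hypothetical.
from typing import Optional

Triple = tuple[str, str, str]

EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

store: set[Triple] = {
    (EX + "alice", RDF_TYPE, EX + "Employee"),
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "bob", RDF_TYPE, EX + "Employee"),
}


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> list[Triple]:
    """Return the stored triples that fit the pattern; None is a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who is an Employee?
for subject, _, _ in match(None, RDF_TYPE, EX + "Employee"):
    print(subject)
```

Because every element is a URI, statements from different sources that use the same URI automatically refer to the same resource.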
In the Semantic Web, rules about the data are often expressed as ontologies, which describe the characteristics of the data by providing a shared vocabulary for expressing knowledge. Examples of such ontology languages are the Web Ontology Language (OWL) and RDF Schema (RDFS).
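As an illustration of how an ontology drives inference, the sketch below applies two RDFS entailment rules from the RDF 1.1 Semantics specification: rdfs9 (an instance of a subclass is an instance of the superclass) and rdfs11 (`rdfs:subClassOf` is transitive). It is a naive fixpoint loop written for readability, not an efficient reasoner; prefixed names stand in for full URIs, and the `ex:` terms and `rdfs_closure` function are hypothetical.

```python
# Sketch: applying RDFS rules rdfs9 and rdfs11 to a small triple set.
# Prefixed names replace full URIs; ex: terms are hypothetical.
Triple = tuple[str, str, str]

TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply rdfs9 and rdfs11 repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new: set[Triple] = set()
        for c, d in sub:
            # rdfs11: (c subClassOf d), (d subClassOf e) -> (c subClassOf e)
            for c2, e in sub:
                if c2 == d:
                    new.add((c, SUBCLASS, e))
            # rdfs9: (c subClassOf d), (x type c) -> (x type d)
            for s, p, o in closure:
                if p == TYPE and o == c:
                    new.add((s, TYPE, d))
        if new <= closure:
            return closure
        closure |= new


ontology = {("ex:Manager", SUBCLASS, "ex:Employee"),
            ("ex:Employee", SUBCLASS, "ex:Person")}
data = {("ex:alice", TYPE, "ex:Manager")}
result = rdfs_closure(ontology | data)
print(("ex:alice", TYPE, "ex:Person") in result)  # True
```

The data never states that `ex:alice` is a person; that fact follows from the shared vocabulary defined in the ontology.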
The delete operation is often seen as not being part of Semantic Web systems, mostly because of the open world assumption, under which a statement that is not known to be true is not assumed to be false. Under this assumption, deleting a fact is pointless, as removing it does not make the fact false. In real applications in most domains, however, the delete operation can be seen as “undoing the insertion of the fact”. With this definition, the delete operation must also “undo” the inference triggered by the inserted fact. This kind of behavior is needed, for example, when correcting erroneous data or when the data has changed over time.
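The difference between removing only the explicit fact and “undoing the insertion” can be shown with a single hypothetical rule, (x, worksFor, y) implies (x, type, Employee). In the sketch below, the expected state after an undo is taken to be the state the store would have had if the fact had never been inserted; this is only used to define the target state and says nothing about how a real system should compute it efficiently. The rule and the `infer` helper are assumptions for illustration.

```python
# Sketch: deleting only the explicit fact leaves a stale inferred triple.
# Hypothetical rule: (x, worksFor, y) -> (x, type, Employee)
Triple = tuple[str, str, str]


def infer(explicit: set[Triple]) -> set[Triple]:
    """Derive implicit triples from explicit ones using the single rule."""
    return {(s, "type", "Employee") for s, p, o in explicit if p == "worksFor"}


explicit = {("alice", "worksFor", "acme")}
materialized = explicit | infer(explicit)

# Naive delete: remove the explicit triple from the materialized store only.
naive = materialized - {("alice", "worksFor", "acme")}

# Delete as "undoing the insertion": the inference it triggered disappears too.
explicit_after = explicit - {("alice", "worksFor", "acme")}
undone = explicit_after | infer(explicit_after)

print(sorted(naive))   # [('alice', 'type', 'Employee')]
print(sorted(undone))  # []
```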
An Expert System
An expert system is considered to be a computer program that, when executed, uses symbolic knowledge and inference to reach conclusions (Mercadal, 1990, pp. 96-97). It derives most of its power from its knowledge. The key components of an expert system are an inference engine and a knowledge base, and the separation of control (the inference engine) from knowledge (the knowledge base) is a hallmark of an expert system. Other components include a user interface, a knowledge-acquisition module, and an explanatory interface. Known expert systems use data tuples and Horn clause rules.
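The separation of knowledge from control can be sketched with propositional Horn clauses: the knowledge base below is plain data, and the inference engine is a separate function that could be replaced without touching it. The engine here happens to use backward chaining (proving a goal from known facts), which is one possible choice, not necessarily what any particular expert system uses. The facts, rules, `KnowledgeBase`, and `prove` names are all hypothetical. Requires Python 3.9 or later.

```python
# Sketch of the knowledge-base / inference-engine split using propositional
# Horn clauses. All facts, rules, and names are hypothetical.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    head: str               # conclusion
    body: tuple[str, ...]   # conditions that must all hold (a Horn clause)


@dataclass
class KnowledgeBase:
    facts: set[str] = field(default_factory=set)
    rules: list[Rule] = field(default_factory=list)


def prove(kb: KnowledgeBase, goal: str, seen: frozenset[str] = frozenset()) -> bool:
    """Inference engine: backward chaining from a goal towards known facts."""
    if goal in kb.facts:
        return True
    if goal in seen:  # guard against infinite recursion on cyclic rules
        return False
    return any(
        all(prove(kb, g, seen | {goal}) for g in rule.body)
        for rule in kb.rules
        if rule.head == goal
    )


kb = KnowledgeBase(
    facts={"has_fever", "has_rash"},
    rules=[
        Rule("suspect_measles", ("has_fever", "has_rash")),
        Rule("refer_to_doctor", ("suspect_measles",)),
    ],
)
print(prove(kb, "refer_to_doctor"))  # True
print(prove(kb, "has_cough"))        # False
```

Adding a rule changes only the knowledge base; the engine code stays the same.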
Forward Chaining
Forward chaining is one of the two main methods of reasoning with inference rules; the other is backward chaining. Forward chaining starts from the available data tuples, applies the rules whose antecedents those tuples satisfy, and adds the rules' consequents as new data tuples. Because all inference takes place together with data modification operations, forward chaining suits reasoning systems in which a large number of queries are made concurrently and query performance is important.
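A naive forward chaining loop over propositional rules shows the data-driven direction: any rule whose antecedents are all present fires and adds its consequent, and the loop repeats until nothing new appears. This is a minimal sketch; the atoms and the `forward_chain` function are hypothetical, and real engines avoid re-scanning every rule on every pass.

```python
# Sketch: naive forward chaining over propositional Horn rules.
# A rule is (antecedents, consequent); the atoms are hypothetical.
Rule = tuple[frozenset[str], str]


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Fire every rule whose antecedents all hold until a fixpoint is reached."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in derived and antecedents <= derived:
                derived.add(consequent)
                changed = True
    return derived


rules: list[Rule] = [
    (frozenset({"a", "b"}), "c"),
    (frozenset({"c"}), "d"),
    (frozenset({"e"}), "f"),
]
print(sorted(forward_chain({"a", "b"}, rules)))  # ['a', 'b', 'c', 'd']
```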
In forward chaining, the most common strategy is total materialization, in which all implicit tuples are inferred from the tuple space and stored back into it during data modification operations.
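Total materialization can be sketched as a store whose insert operation immediately derives and stores every newly implied tuple, so that a query is a plain lookup with no reasoning at read time. The sketch assumes two hypothetical rules (type inheritance through subClassOf, and transitivity of subClassOf) and joins each new tuple against everything already stored, using a worklist. It is illustrative only; the class and method names are not from any real system.

```python
# Sketch of total materialization: inserts derive and store all implied
# triples; queries are lookups. Hypothetical rules:
#   (x type C), (C subClassOf D)          -> (x type D)
#   (C subClassOf D), (D subClassOf E)    -> (C subClassOf E)
Triple = tuple[str, str, str]
TYPE, SUB = "type", "subClassOf"


class MaterializedStore:
    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def _consequences(self, t: Triple) -> set[Triple]:
        """Triples derivable in one step from t joined with stored triples."""
        s, p, o = t
        out: set[Triple] = set()
        for s2, p2, o2 in self.triples:
            if p == TYPE and p2 == SUB and s2 == o:   # t is the type premise
                out.add((s, TYPE, o2))
            if p == SUB and p2 == TYPE and o2 == s:   # t is the subClassOf premise
                out.add((s2, TYPE, o))
            if p == SUB and p2 == SUB:
                if s2 == o:                           # t first in the chain
                    out.add((s, SUB, o2))
                if o2 == s:                           # t second in the chain
                    out.add((s2, SUB, o))
        return out

    def insert(self, t: Triple) -> None:
        """Add t and, via a worklist, everything it newly implies."""
        pending = [t]
        while pending:
            current = pending.pop()
            if current in self.triples:
                continue
            self.triples.add(current)
            pending.extend(self._consequences(current) - self.triples)

    def contains(self, t: Triple) -> bool:
        """Queries need no reasoning: the answer is already stored."""
        return t in self.triples


store = MaterializedStore()
store.insert(("Manager", SUB, "Employee"))
store.insert(("Employee", SUB, "Person"))
store.insert(("alice", TYPE, "Manager"))
print(store.contains(("alice", TYPE, "Person")))  # True
```

All reasoning cost is paid at write time, which is what makes concurrent reads cheap.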
One commonly used inference algorithm is the Rete algorithm (Forgy, 1979), which provides a generalized logical description of how to implement the matching of data tuples against rules by creating and maintaining a network of nodes built from the rules and facts. Although it is suitable for relatively small datasets with computationally intensive rules, it scales poorly as the data size increases. Its working memory requirements are also high, as the whole Rete network must be kept in memory for maximum performance.
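The core Rete idea of keeping partial matches in memory can be sketched for one hypothetical two-condition rule, (?x type ?c) and (?c subClassOf ?d) implies (?x type ?d). Each condition has an alpha memory of the facts that passed it, and a new fact is joined only against the opposite memory, so earlier matches are never recomputed. This is a heavily simplified illustration, not Forgy's full algorithm: there are no beta memory chains, no node sharing between rules, and no support for more than two conditions. The class name is hypothetical.

```python
# Highly simplified Rete-style sketch for ONE hypothetical rule:
#   (?x type ?c) AND (?c subClassOf ?d) -> (?x type ?d)
# Alpha memories store facts passing each condition; a new fact is joined
# only with the opposite memory. Beta memories and node sharing are omitted.
from collections import defaultdict

Triple = tuple[str, str, str]


class TwoConditionRete:
    def __init__(self) -> None:
        # alpha memory for (?x type ?c), indexed by ?c
        self.type_by_class: dict[str, set[str]] = defaultdict(set)
        # alpha memory for (?c subClassOf ?d), indexed by ?c
        self.super_by_class: dict[str, set[str]] = defaultdict(set)
        self.facts: set[Triple] = set()

    def add(self, fact: Triple) -> None:
        agenda = [fact]
        while agenda:
            f = agenda.pop()
            if f in self.facts:
                continue
            self.facts.add(f)
            s, p, o = f
            if p == "type":
                self.type_by_class[o].add(s)
                # join node: match against subClassOf facts with ?c == o
                agenda.extend((s, "type", d) for d in self.super_by_class[o])
            elif p == "subClassOf":
                self.super_by_class[s].add(o)
                # join node: match against type facts with ?c == s
                agenda.extend((x, "type", o) for x in self.type_by_class[s])


net = TwoConditionRete()
net.add(("alice", "type", "Manager"))
net.add(("Manager", "subClassOf", "Employee"))
net.add(("Employee", "subClassOf", "Person"))
print(("alice", "type", "Person") in net.facts)  # True
```

Even in this toy version, every fact that passes a condition stays in an alpha memory for as long as the network exists, which illustrates why memory use grows with the data.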
Decision tree learning, used in statistics, data mining, and machine learning, uses a decision tree as a predictive model that maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Similarly, in reasoning, the branches of the decision tree are combinations of implicit or explicit tuples and a rule, and the leaf nodes represent the implicit tuples that the rule produces. The use of decision trees leads to computational complexity that can cause performance and scalability issues with larger datasets and complex rules. Memory requirements are also high, as the decision tree must be stored and kept accessible at all times.
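One way to read the reasoning variant is as a derivation record: for every implicit tuple, the reasoner stores the rule and the premise tuples (explicit or implicit) that produced it. The sketch below builds such a record for a single hypothetical rule named `type-inheritance`; it is an interpretation offered for illustration, not a description of any specific system. The record has to be kept next to the data, which is the memory cost noted above.

```python
# Sketch: recording, for each implicit tuple, the rule and premises that
# produced it. The rule name and tuples are hypothetical.
Triple = tuple[str, str, str]
Derivation = tuple[str, tuple[Triple, ...]]  # (rule name, premise tuples)


def derive_with_provenance(explicit: set[Triple]) -> dict[Triple, list[Derivation]]:
    """Apply one rule to a fixpoint, recording every derivation of each tuple."""
    facts = set(explicit)
    why: dict[Triple, list[Derivation]] = {}
    changed = True
    while changed:
        changed = False
        types = [t for t in facts if t[1] == "type"]
        subs = [t for t in facts if t[1] == "subClassOf"]
        for t1 in types:
            for t2 in subs:
                if t1[2] == t2[0]:
                    head = (t1[0], "type", t2[2])
                    d = ("type-inheritance", (t1, t2))
                    if d not in why.setdefault(head, []):
                        why[head].append(d)  # new branch in the derivation tree
                        changed = True
                    facts.add(head)
    return why


explicit = {("alice", "type", "Manager"),
            ("Manager", "subClassOf", "Employee"),
            ("Employee", "subClassOf", "Person")}
why = derive_with_provenance(explicit)
for rule, premises in why[("alice", "type", "Person")]:
    print(rule, premises)
```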
Known Methods for Delete
The most common approach to implementing a delete operation in the Semantic Web is to remove all inferred tuples and then re-infer them from the remaining data. This approach may be valid in systems where data changes rarely and modifications are mainly made in batch operations, for example once a day. The time spent on re-inference can be reduced by partitioning the data and re-inferring only the affected partitions, but it may still take a long time, during which read operations cannot run simultaneously. For operational data that changes constantly, this approach is intolerable: it requires exclusive write access to the tuple space, which blocks concurrent queries, and it may take a long time to finish.
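The delete-and-re-infer approach can be sketched as a store that keeps the explicit tuples separately and, on every delete, throws away the materialized closure and recomputes it from the remaining explicit tuples. The single rule, the `closure` function, and the `RematerializingStore` class are hypothetical and kept deliberately simple; the point is that the cost of a delete depends on the total data size rather than on the size of the change.

```python
# Sketch of delete-and-re-infer with one hypothetical rule:
#   (x type C), (C subClassOf D) -> (x type D)
Triple = tuple[str, str, str]


def closure(explicit: set[Triple]) -> set[Triple]:
    """Recompute all inferred triples from scratch (naive fixpoint)."""
    facts = set(explicit)
    while True:
        new = {
            (x, "type", d)
            for x, p1, c in facts if p1 == "type"
            for c2, p2, d in facts if p2 == "subClassOf" and c2 == c
        } - facts
        if not new:
            return facts
        facts |= new


class RematerializingStore:
    def __init__(self) -> None:
        self.explicit: set[Triple] = set()
        self.all: set[Triple] = set()  # explicit plus inferred triples

    def insert(self, t: Triple) -> None:
        self.explicit.add(t)
        self.all = closure(self.all | {t})

    def delete(self, t: Triple) -> None:
        # Drop every inferred triple and re-infer from the explicit ones.
        # In a shared store this needs exclusive access for the whole run.
        self.explicit.discard(t)
        self.all = closure(self.explicit)


store = RematerializingStore()
store.insert(("alice", "type", "Manager"))
store.insert(("Manager", "subClassOf", "Employee"))
print(("alice", "type", "Employee") in store.all)  # True
store.delete(("Manager", "subClassOf", "Employee"))
print(("alice", "type", "Employee") in store.all)  # False
```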