The exemplary embodiment relates to data management, particularly to a knowledge base system that manages temporal aspects of knowledge more efficiently. Moreover, the exemplary embodiment may be incorporated into any other existing or future knowledge base systems.
Databases are large repositories that store data and use languages such as SQL to search for and retrieve stored data. Similarly, a knowledge base stores knowledge and provides languages such as SPARQL to retrieve the stored knowledge, along with OWL to infer implied knowledge from the stored knowledge. Relational, object-relational, or NoSQL DBMSs, or native data stores, are used to store the knowledge.
Two key aspects of temporal knowledge are valid time and transaction time. Valid time indicates when a fact (knowledge) is valid (true). In the fact “Tom was 6f tall at 1/1/1990”, the valid time is 1/1/1990. Transaction time, in contrast, denotes the time at which a fact (knowledge) is recorded in the knowledge base. If the same fact, “Tom was 6f tall at 1/1/1990”, is recorded at 2/1/1990, the transaction time is 2/1/1990. Temporal knowledge may include either time or both. A knowledge base that includes only valid time is called a Valid Time Knowledge Base (VTKB), while a knowledge base that includes only transaction time is called a Transaction Time Knowledge Base (TTKB). A Bitemporal Knowledge Base (BiKB) includes both valid and transaction times. Naturally, other aspects of time can be defined.
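The distinction between valid time and transaction time can be illustrated with a small data structure. The following is a minimal sketch, assuming a hypothetical `TemporalFact` record with optional `valid_time` and `transaction_time` fields and a hypothetical `kb_kind` helper; it is not the structure of any particular knowledge base. As a simplification, a collection of facts is classified by whether any fact carries each kind of time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class TemporalFact:
    """Hypothetical record: a triple plus optional valid and transaction times."""
    subject: str
    predicate: str
    obj: str
    valid_time: Optional[date] = None        # when the fact is true
    transaction_time: Optional[date] = None  # when the fact was recorded


def kb_kind(facts: Iterable[TemporalFact]) -> str:
    """Classify facts as BiKB, VTKB, TTKB, or static (simplified rule)."""
    facts = list(facts)
    has_vt = any(f.valid_time is not None for f in facts)
    has_tt = any(f.transaction_time is not None for f in facts)
    if has_vt and has_tt:
        return "BiKB"
    if has_vt:
        return "VTKB"
    if has_tt:
        return "TTKB"
    return "static"


# "Tom was 6f tall at 1/1/1990", recorded at 2/1/1990 (month/day/year).
fact = TemporalFact("Tom", "isTall", "6f",
                    valid_time=date(1990, 1, 1),
                    transaction_time=date(1990, 2, 1))
print(kb_kind([fact]))  # BiKB
```

Dropping `transaction_time` from the same fact would make the collection a VTKB under this sketch's rule.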
Resource Description Framework (RDF) and RDF Schema (RDFS) are the basis of current knowledge bases, and OWL is also based on RDF. In these models the basic unit of knowledge is a triple (subject, predicate, object). The subject may be a concept (entity), the object may be another concept (entity) or a literal, and the predicate represents a relationship between the subject and the object. A triple is therefore an assertion. Two sample assertions are a1: <Tom isTall 6f> and a2: <Tom enrolledIn Database>, where Tom and Database are entities, 6f is a literal, and isTall and enrolledIn are predicates.
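To make the triple model concrete, the two sample assertions can be held as plain (subject, predicate, object) tuples. This is a minimal sketch that does not use an RDF library: the `Triple` type and the `objects_of` lookup (a tiny analogue of a single SPARQL triple pattern) are hypothetical names, and the entity/literal distinction is not modeled.

```python
from typing import NamedTuple, List, Set


class Triple(NamedTuple):
    """A (subject, predicate, object) assertion, as in RDF."""
    subject: str
    predicate: str
    obj: str


# The sample assertions a1 and a2 from the text.
a1 = Triple("Tom", "isTall", "6f")
a2 = Triple("Tom", "enrolledIn", "Database")
kb: Set[Triple] = {a1, a2}


def objects_of(kb: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Return the objects of all triples matching the pattern (subject, predicate, ?o)."""
    return sorted(t.obj for t in kb if t.subject == subject and t.predicate == predicate)


print(objects_of(kb, "Tom", "enrolledIn"))  # ['Database']
```

Note that neither triple carries any time information; that limitation motivates the discussion that follows.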
There are large knowledge bases, such as Yago and DBpedia, that contain millions of entities and billions of assertions. They store knowledge that has been automatically extracted from web-based resources such as WordNet and Wikipedia. Freebase, on the other hand, relies on manually supplied knowledge. Such resources can help in many knowledge-related tasks, such as answering questions by retrieving existing facts or inferring implied facts.
A knowledge base (KB) contains static knowledge, temporal knowledge, or both. Static knowledge does not change over time, while temporal knowledge changes over time and has multiple versions. Note that data and information are also primitive forms of knowledge.
Assertions have a time reference: facts are valid at some point in time, possibly the present. Assertions a1 and a2 are typically assumed to be true currently, so their time reference is implicit. However, facts are true at a time instant or over a period of time. A single RDF triple is not sufficient to represent facts such as ‘Tom was 6f tall at 1/1/1980’ or ‘Tom enrolled in database class at 2/1/2000’, since these sentences contain more than one predicate. RDF uses a method called reification to represent assertions that involve additional relationships. In reification, an RDF triple such as (Tom wasTall 6f) is assigned a name, a1, and this name is used in other RDF triples to specify the time. Five triples are thus formed: (a1 type sentence), (a1 subject Tom), (a1 predicate wasTall), (a1 object 6f), and (a1 hasTime 1/1/1980). The same applies to the temporal version of assertion a2. FIG. 3A depicts the RDF graph for the reified a1, and FIG. 3B lists the RDF triples required to represent that a1 was valid at 1/1/1980.
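The reification of a fact valid at a single instant can be sketched as a function that expands one triple into the five triples listed above. This is an illustrative sketch only: `reify_at_instant` is a hypothetical helper, and the predicate names follow the text (type, subject, predicate, object, hasTime) rather than the standard RDF vocabulary (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object) that an actual RDF store would use.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify_at_instant(name: str, s: str, p: str, o: str, instant: str) -> List[Triple]:
    """Expand (s, p, o) into reification triples under statement id `name`,
    then attach a single valid-time instant with hasTime."""
    return [
        (name, "type", "sentence"),
        (name, "subject", s),
        (name, "predicate", p),
        (name, "object", o),
        (name, "hasTime", instant),
    ]


triples = reify_at_instant("a1", "Tom", "wasTall", "6f", "1/1/1980")
for t in triples:
    print(t)
print(len(triples))  # 5
```

One timestamped fact thus costs five stored triples, which is the overhead discussed below.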
Facts that hold over a period, such as ‘Tom was 6f tall from 1/1/1980 to 1/1/1990’ or ‘Tom enrolled in a database class from 2/1/2000 to 6/1/2000’, likewise cannot be represented by a single RDF triple, since these sentences contain more than one predicate, so reification is again needed. Because a period has a beginning instant and an ending instant, nine triples are defined for the first assertion, as seen in FIGS. 4A and 4B. The same is true for the second assertion.
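One way to arrive at nine triples for a period is to make the period and its two endpoint instants separate nodes. The sketch below assumes such a layout, loosely modeled on OWL-Time; the node names (`a1_period`, `a1_begin`, `a1_end`) and the predicates `hasBeginning`, `hasEnd`, and `inDate` are hypothetical, and the exact triples in FIGS. 4A and 4B may differ.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify_over_period(name: str, s: str, p: str, o: str,
                      begin: str, end: str) -> List[Triple]:
    """Reify (s, p, o) and attach a period whose ends are separate instant nodes.
    Node and predicate names here are illustrative assumptions."""
    period, b, e = f"{name}_period", f"{name}_begin", f"{name}_end"
    return [
        # four triples describing the reified statement itself
        (name, "type", "sentence"),
        (name, "subject", s),
        (name, "predicate", p),
        (name, "object", o),
        # link the statement to its period
        (name, "hasTime", period),
        # the beginning instant and its date
        (period, "hasBeginning", b),
        (b, "inDate", begin),
        # the ending instant and its date
        (period, "hasEnd", e),
        (e, "inDate", end),
    ]


triples = reify_over_period("a1", "Tom", "wasTall", "6f", "1/1/1980", "1/1/1990")
print(len(triples))  # 9
```

Under this layout, four triples describe the statement, one links it to the period, and two per endpoint describe the instants.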
Handling temporal knowledge with reification is therefore very expensive: every temporal fact incurs a fivefold or ninefold triple overhead. For this reason, systems such as Yago and DBpedia use shortcuts. Yago assigns an identifier to each triple and then adds two more triples with time predicates. In addition to a1, it includes (a1 occursSince 1/1/1980) and (a1 occursUntil 1/1/1990). Naturally, this type of reification is incomplete and cannot fully capture temporal reality.
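The Yago-style shortcut can be sketched as a fact stored under an identifier plus two time triples keyed by that identifier. This is a minimal sketch: the dictionary-based `facts` store and the helpers `annotate_with_period` and `holds_at` are hypothetical, dates are parsed as month/day/year as written in the text, and the period bounds are assumed to be inclusive.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def parse_date(text: str) -> datetime:
    # Dates in the text are written month/day/year, e.g. 1/1/1980.
    return datetime.strptime(text, "%m/%d/%Y")


def annotate_with_period(fact_id: str, since: str, until: str) -> List[Triple]:
    """Yago-style shortcut: two time triples keyed by the fact's identifier."""
    return [(fact_id, "occursSince", since), (fact_id, "occursUntil", until)]


def holds_at(fact_id: str, time_triples: List[Triple], when: str) -> bool:
    """True if `when` lies within [occursSince, occursUntil] (bounds assumed inclusive)."""
    bounds: Dict[str, Optional[datetime]] = {"occursSince": None, "occursUntil": None}
    for s, p, o in time_triples:
        if s == fact_id and p in bounds:
            bounds[p] = parse_date(o)
    since, until = bounds["occursSince"], bounds["occursUntil"]
    if since is None or until is None:
        return False
    return since <= parse_date(when) <= until


# The base fact is stored once under its identifier a1.
facts: Dict[str, Triple] = {"a1": ("Tom", "wasTall", "6f")}
time_triples = annotate_with_period("a1", "1/1/1980", "1/1/1990")

print(len(facts) + len(time_triples))             # 3
print(holds_at("a1", time_triples, "6/15/1985"))  # True
print(holds_at("a1", time_triples, "6/15/1995"))  # False
```

The same period fact costs three triples here instead of nine, at the price of relying on triple identifiers outside the plain triple model.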
A system and a method are provided that can significantly improve the performance of knowledge bases in managing temporal facts in a VTKB or TTKB without using reification.