The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C). The Semantic Web provides a common framework that enables data to be shared and reused across application, enterprise, and community boundaries. The term “Semantic Web” was coined by Tim Berners-Lee, for a web of data that can be processed by machines. Publishing data into the Semantic Web involves using languages specifically designed for the data. One example is the Resource Description Framework, or RDF. Another language, which builds on RDF, is the Web Ontology Language (OWL).
At the core of the concept of publishing data onto the Semantic Web is the ability to “assert” knowledge, without having to build rules or structure around how the knowledge is used, or will be used. Accordingly, unlike a traditional relational database, it is possible to collect data into the Semantic Web, without knowing exactly what the structure or use of the data might eventually be. With a relational database, it is necessary to predefine the schema, or way in which the data connects to itself, prior to inserting any information into the system. Additionally, modifying the structure of a relational database is difficult at best, and if modified, the modification is a global change to the schema. For this reason, it is not possible to have multiple root schemas. Furthermore, it is nearly impossible to query across relational databases where there is an overlap of information, since a relational database query assumes that it can only query against the data available in a single database system.
A Semantic Web system does not constrain queries or data to exist in a single system, and there are standard ways of mapping data from one system to another. Additionally, RDF or OWL can be used to define “how things work” or “the relational structure” of the data. These definitions can be changed at any time, because they are not required while collecting or storing the data. As a result, it is possible to have many versions of the definitions, where each definition is optimized for a specific task. A simple definition version might be used for a simple query, thus providing a sparse schema, while a complex OWL version can be used to provide much more complex inferences and semantic language connections.
However, with the many benefits of the Semantic Web come many challenges. In 2006, Tim Berners-Lee and colleagues stated that, “This simple idea . . . remains largely unrealized.” Other types of database systems, such as document data stores, are growing at a faster rate than the Semantic Web. For example, MongoDB is a document data store, and its popularity is growing rapidly. These alternative data systems have “loose” schemas, meaning that they can change over time. Yet, the documents themselves tend to have requirements that end up not changing over time. These data systems have their benefits over a traditional relational database, but they do not address the same requirements as the Semantic Web.
The word “semantics” is derived from Greek and refers to the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, or denote. The Semantic Web enables RDF and OWL, for example, to help define the meaning of data that is represented within the system. However, as with all things that contain fuzzy logic, definitions and other inferences, the ontology describing even a small system can become quite complex. In fact, the more logic the system contains, the more complex it becomes, possibly infinitely complex, since the connections with one element might span multiple ontological domains. A simple example will help to illustrate this point. Suppose that the data in question pertains to a pizza. A relational database might include a table containing pizza names, and another table containing toppings. Perhaps each topping would have columns indicating whether the topping is a meat product or a vegetable product. A third table might include a unique pizza identification (ID), e.g., an ID for Pepperoni, and one row for each topping on that type of pizza. However, in the Semantic Web world, the OWL description looks like a node graph and the relationships are defined a bit differently. FIG. 1 illustrates a visual graph 100 of an exemplary “pizza” ontology.
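For comparison, the three-table relational layout described above can be sketched as follows. This is a minimal illustration only, using Python's built-in sqlite3 module; the table names, column names, and sample rows (including the “Mozzarella” topping) are assumptions chosen for the example and do not come from any particular system.

```python
import sqlite3

# In-memory database; the schema must be declared before any row is inserted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizza (
        pizza_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE topping (
        topping_id   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        is_meat      INTEGER NOT NULL,  -- 1 if the topping is a meat product
        is_vegetable INTEGER NOT NULL   -- 1 if the topping is a vegetable product
    );
    -- One row for each topping on each type of pizza.
    CREATE TABLE pizza_topping (
        pizza_id   INTEGER NOT NULL REFERENCES pizza(pizza_id),
        topping_id INTEGER NOT NULL REFERENCES topping(topping_id)
    );
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO pizza VALUES (?, ?)",
                 [(1, "Pepperoni"), (2, "Cheese")])
conn.executemany("INSERT INTO topping VALUES (?, ?, ?, ?)",
                 [(1, "Pepperoni", 1, 0), (2, "Mozzarella", 0, 0)])
conn.executemany("INSERT INTO pizza_topping VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def pizzas_with_meat(connection):
    """Return the names of pizzas that have at least one meat topping."""
    rows = connection.execute("""
        SELECT DISTINCT p.name
        FROM pizza AS p
        JOIN pizza_topping AS pt ON pt.pizza_id = p.pizza_id
        JOIN topping AS t ON t.topping_id = pt.topping_id
        WHERE t.is_meat = 1
        ORDER BY p.name
    """).fetchall()
    return [name for (name,) in rows]
```

In this sketch, the question “which pizzas have meat” can only be answered through joins that were anticipated when the schema was designed, which is the constraint the relational approach imposes.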
It is important to note that the ontology shown in FIG. 1 is completely separate from the actual data about the various types of pizza that might be considered. Data is “asserted” into the ontology database, but does not need the entire ontology. The asserted data requires only the types of the nodes and their unique identifiers. Data comes in the form of N-Triples, each of which consists of the same three elements, as shown below:
“Pepperoni Pizza” “is-a” “Pizza”
The first component (i.e., “Pepperoni Pizza”) is the subject, the second (i.e., “is-a”) is the predicate, and the third component (i.e., “Pizza”) is the object. Although some semantic systems extend this concept, all semantic systems have Triples that are the basis of all data input. There are many challenges and benefits with regard to how data is represented in this manner. One of the more important benefits is that the data can be separated from the ontology that defines the “meaning” of the data. To query whether a “Pepperoni Pizza” has “Meat,” a user might employ a query language such as SPARQL, which can be loaded with the query and ontology at the same time. With this approach, it is possible to modify the ontology for every query, or globally, to make the ontology more intelligent about, for example, what makes a “Vegetarian Pizza.” Although there may not be an “assert” stating that “Pepperoni Pizza” “is-a” “Vegetarian Pizza,” the ontology can be used to define that “all pizzas without meat are vegetarian,” thus returning results to a query seeking to identify all vegetarian pizzas.
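The separation between asserted triples and the rule that gives them meaning can be illustrated with a short sketch. This is plain Python, not SPARQL or an OWL reasoner; the “has-topping” predicate, the sample triples, and the function names are assumptions made for the example. The sketch also treats the absence of a meat topping as “without meat,” which is a simplification of how an ontology-based reasoner would behave.

```python
# Asserted data: (subject, predicate, object) triples. Nothing here states
# that any pizza "is-a" "Vegetarian Pizza".
TRIPLES = {
    ("Pepperoni Pizza", "is-a", "Pizza"),
    ("Cheese Pizza", "is-a", "Pizza"),
    ("Pepperoni Pizza", "has-topping", "Pepperoni"),   # hypothetical predicate
    ("Cheese Pizza", "has-topping", "Mozzarella"),
    ("Pepperoni", "is-a", "Meat"),
}


def objects(triples, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def has_meat(triples, pizza):
    """True if any topping of the pizza is asserted to be a kind of Meat."""
    return any(
        "Meat" in objects(triples, topping, "is-a")
        for topping in objects(triples, pizza, "has-topping")
    )


def vegetarian_pizzas(triples):
    """Apply the rule 'all pizzas without meat are vegetarian'.

    The rule lives in this function, separate from the data, so it could be
    replaced or refined without touching the asserted triples.
    """
    pizzas = {s for (s, p, o) in triples if p == "is-a" and o == "Pizza"}
    return sorted(p for p in pizzas if not has_meat(triples, p))
```

Here, calling vegetarian_pizzas(TRIPLES) identifies “Cheese Pizza” even though no triple asserts that classification directly; the result follows from the rule alone.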
The challenge, however, is how to enable a more logical way of using the ontology. As ontologies grow more complex, they become less human readable, and less effective as a classification system. People and programs tend to deal with things that are classified into various groupings, since this is an easier way of breaking down problems. One term used for this type of concept is “Taxonomy.” The general term, Taxonomy, is the practice and science (study) of the classification of things or concepts, including the principles that underlie such a classification. An exemplary pizza taxonomy, for example, might look like the following:
Food Group
    Pizza
        With Meat
            Pepperoni
            Hawaiian
        Without Meat
            Cheese
Such a taxonomy is useful because it essentially walks through the process of selecting the final context. A person does not need to understand all of the nodes on the Taxonomy, since it is a directed graph. The person can simply start at the top and work their way down the graphical hierarchy. However, this approach makes it difficult to include properties, such as “Extra Cheese,” since it would be necessary to add that property as an option under every pizza, or alternatively, to create another Taxonomy such as “Topping Selections.”
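The top-down walk through a taxonomy, and the difficulty of attaching a property such as “Extra Cheese,” can be sketched as follows. The nested-dictionary representation and the function names are assumptions made for illustration; only the node names are taken from the exemplary pizza taxonomy.

```python
# The exemplary pizza taxonomy as a nested dictionary; an empty dictionary
# marks a leaf node.
TAXONOMY = {
    "Food Group": {
        "Pizza": {
            "With Meat": {"Pepperoni": {}, "Hawaiian": {}},
            "Without Meat": {"Cheese": {}},
        }
    }
}


def path_to(tree, target, trail=()):
    """Walk down from the top and return the path of nodes leading to target."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found:
            return found
    return None


def leaves(tree):
    """Return every leaf node, i.e., every final selection in the taxonomy."""
    result = []
    for name, children in tree.items():
        if children:
            result.extend(leaves(children))
        else:
            result.append(name)
    return result
```

In this sketch, path_to(TAXONOMY, "Hawaiian") yields the chain of selections from “Food Group” down to “Hawaiian.” Adding “Extra Cheese” as an option would require one new node under each entry returned by leaves(TAXONOMY), which illustrates why properties fit poorly into a pure hierarchy.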
It would be desirable to address the shortcomings of using an ontology or taxonomy by using the best of both of these systems. Such an approach should provide a way to present data in an organized way, either to a human or to a computing device software algorithm, yet should be able to produce a set of N-Triples describing the work performed semantically.