There is currently tremendous growth in highly connected data, including social network data (e.g., from Twitter and Facebook), biological networks, scientific data, and sensor network data. Such connected data is often stored in traditional relational database management systems (RDBMSs).
Relational data may be organized into many parent-child, or hierarchical, relationships. For example, a row in an employee table may map to a row in a department table. A row for an order may map to an order line, which in turn maps to a part, which maps to a supplier, which maps to a country. In traditional systems, visualizing such data requires writing a query that joins the data to be reported and displayed.
Writing such a query therefore typically requires knowledge of the data schema. For example, a user who wants to see which employees are in which departments can write a database query to retrieve that information, but cannot write the query without knowing the underlying schema (e.g., how the employee table relates to the department table). In many instances, however, a user may be unfamiliar with the underlying schema and may wish to explore connections between the data without any prior knowledge of it.
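To make the schema dependence concrete, the following Python sketch uses the standard-library `sqlite3` module with a hypothetical `employee`/`department` schema (the table names, columns, and rows are illustrative assumptions, not taken from the text). The join condition `e.dept_id = d.dept_id` can only be written by someone who already knows how the two tables are related.

```python
import sqlite3

# Hypothetical schema for illustration only: table and column names
# (employee, department, dept_id, ...) are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO employee VALUES (1, 'John', 10), (2, 'Mary', 20), (3, 'Ann', 10);
""")

# Writing this join requires knowing that employee.dept_id refers to
# department.dept_id, i.e., it requires knowledge of the schema.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e
    JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()

for employee, department in rows:
    print(f"{employee} works in {department}")
```

A user without this schema knowledge would not know which columns to equate, which is the gap that schema-free graph exploration is meant to close.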
The Resource Description Framework (RDF) is a widely used language that was originally developed for representing information (metadata) about resources on the World Wide Web. It may, however, be used to represent information about virtually anything. Once information has been specified in the generic RDF format, it can be consumed automatically by a diverse set of applications.
There are two standard vocabularies defined on RDF: RDF Schema (RDFS) and the Web Ontology Language (OWL). These vocabularies introduce RDF terms that have special semantics in those vocabularies. For simplicity, in the rest of the document, our use of the term RDF will also implicitly include RDFS and OWL. For more information and for a specification of RDF, see RDF Vocabulary Description Language 1.0: RDF Schema, available at www.w3.org/TR/rdf-schema/, OWL Web Ontology Language Overview, available at www.w3.org/TR/owl-features/, and Frank Manola and Eric Miller, RDF Primer, published by W3C and available in September, 2004 at www.w3.org/TR/rdf-primer/. The RDF Vocabulary Description Language 1.0: RDF Schema, OWL Web Ontology Language Overview, and RDF Primer are hereby incorporated by reference into the present patent application.
Facts in RDF are represented by RDF triples. Each RDF triple represents a fact and is made up of three parts: a subject, a predicate (sometimes termed a property), and an object. For example, the fact represented by the English sentence “John is 24 years old” can be represented in RDF by the subject, predicate, object triple <‘John’, ‘age’, ‘24’>, with ‘John’ being the subject, ‘age’ being the predicate, and ‘24’ being the object. In the following discussion, the values in RDF triples are termed lexical values. For the purposes of this specification, an RDF triple may be expressed in the form subject→#property→object (e.g., the triple <‘John’, ‘age’, ‘24’> may be expressed as John→#age→24).
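A triple of lexical values can be modeled as a small immutable record. The Python sketch below is illustrative only: the `Triple` class and its `arrow_form` method are hypothetical names, and the field `obj` holds the object. It renders a triple in the subject→#property→object notation used in this specification.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical record for an RDF triple of lexical values."""
    subject: str
    predicate: str  # also termed the property
    obj: str        # the object of the triple

    def arrow_form(self) -> str:
        # Render as subject→#property→object.
        return f"{self.subject}→#{self.predicate}→{self.obj}"


fact = Triple("John", "age", "24")
print(fact.arrow_form())  # John→#age→24
```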
A key challenge when visualizing relational data as RDF is making efficient use of screen space. Relational data generally grows by about an order of magnitude when transformed into RDF, because each cell of a table becomes an RDF triple. For example, assuming no null values, a single table with 20 rows and 10 columns is translated into 200 RDF triples, ten times the number of rows.
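The one-triple-per-cell growth can be illustrated with a simplified, direct-mapping-style conversion. In this Python sketch, the function `table_to_triples` and its subject/predicate naming scheme (`table/key` and `table#column`) are hypothetical; unlike the W3C direct mapping, it emits no per-row `rdf:type` triple, so the count matches the one-triple-per-cell arithmetic described here.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(table: str, key: str, rows: List[Dict[str, Any]]) -> List[Triple]:
    """Turn each non-null cell into one (subject, predicate, object) triple.

    Simplified sketch: the subject is '<table>/<key value>' and the
    predicate is '<table>#<column>' (a hypothetical naming scheme).
    The per-row rdf:type triple of the W3C direct mapping is omitted.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = f"{table}/{row[key]}"
        for column, value in row.items():
            if value is None:  # null cells produce no triple
                continue
            triples.append((subject, f"{table}#{column}", str(value)))
    return triples


# A 20-row, 10-column table with no nulls; column c0 acts as the key.
rows = [{f"c{j}": i * 10 + j for j in range(10)} for i in range(20)]
triples = table_to_triples("t", "c0", rows)
print(len(triples))  # 200
```

Setting any single cell to `None` removes exactly one triple, which is why the 200-triple figure assumes no null values.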
In addition, many large-scale RDF visualization tools require a materialized table mapping lexical values to IDs and a materialized ID-based RDF graph, and may rely on precomputed ID-based summaries to speed up visualization of RDF views over relational data. An RDF view produced by standard RDF mapping methods (e.g., W3C RDB2RDF direct mapping), however, contains only lexical values and no IDs, because its data is generated directly at query time by mapping from the underlying relational store.
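A lexical-value-to-ID mapping is essentially dictionary encoding: each distinct lexical value is assigned an integer, and triples are rewritten in terms of those integers. The Python sketch below is a simplification under stated assumptions (the function `encode` is a hypothetical name, and an actual tool would typically persist the mapping and the ID graph as tables rather than keep them in memory). It shows why such a mapping implies materialization: every triple must be scanned to build it, whereas an RDF view produces its triples only at query time.

```python
from typing import Dict, List, Tuple

LexTriple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def encode(triples: List[LexTriple]) -> Tuple[Dict[str, int], List[IdTriple]]:
    """Build a lexical-value-to-ID table and the ID-based graph it induces."""
    value_to_id: Dict[str, int] = {}

    def id_of(value: str) -> int:
        # Assign the next integer ID the first time a lexical value is seen.
        if value not in value_to_id:
            value_to_id[value] = len(value_to_id)
        return value_to_id[value]

    # Subject, predicate and object are encoded left to right.
    id_graph = [(id_of(s), id_of(p), id_of(o)) for s, p, o in triples]
    return value_to_id, id_graph


lexical = [
    ("John", "age", "24"),
    ("John", "dept", "Sales"),
    ("Mary", "dept", "Sales"),
]
mapping, graph = encode(lexical)
print(mapping)  # {'John': 0, 'age': 1, '24': 2, 'dept': 3, 'Sales': 4, 'Mary': 5}
print(graph)    # [(0, 1, 2), (0, 3, 4), (5, 3, 4)]
```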
Another problem with visualizing RDF views over relational data produced by standard direct mapping is that such methods are partitioning-agnostic. When large-scale relational data has been partitioned by a database designer, it would be desirable to take advantage of that partitioning scheme when visualizing the data; the current W3C RDB2RDF direct mapping specification does not make this possible.
Therefore, there is a need for techniques for visualizing relational data as a graph that facilitate exploration and discovery while still providing interactive response times.