The Resource Description Framework (RDF) is the de facto standard for data representation on the World Wide Web. The amount of RDF data from disparate domains is growing rapidly. For instance, the Linked Open Data (LOD) initiative integrates billions of entities from hundreds of sources. Just one of these sources, the DBpedia dataset, describes more than 3.64 million things using more than 1 billion RDF triples, of which 385 million are extracted from the English edition of Wikipedia.
Keyword search is used to explore large data corpora whose structure is either unknown or constantly changing. It has already been studied in the context of World Wide Web data, graphs, relational databases, and XML documents. More recent efforts have applied keyword search to RDF data; however, the proposed solutions have serious limitations. Most notably, they suffer from either false positives, i.e., the keyword search returns answers that do not correspond to real subgraphs of the underlying RDF data, or false negatives, i.e., the search misses valid matches in the RDF data. Another severe limitation of existing techniques is their inability to scale to typical RDF datasets with tens of millions of triples. When presented with such workloads, existing techniques often return empty results for meaningful keyword queries that do have matches in the underlying RDF data.