The Semantic Web is a vision of the next-generation World Wide Web in which data is described with rich semantics, thereby enabling software agents to ‘understand’ the data and perform complex tasks on behalf of humans. To achieve this vision, languages have been developed for specifying the meaning of concepts, relating them with custom ontologies for different domains, and reasoning about the concepts. The best-known languages are the Resource Description Framework (RDF) and RDF Schema (RDFS), which together provide a unique format for the description and exchange of the semantics of Web content. To realize the full potential of the Semantic Web, effective techniques for information retrieval need to be developed.
RDF provides a simple data model for describing relationships between resources in terms of named properties and their values. It describes a Semantic Web using RDF Statements which are triples of the form (Subject, Property, Object). Subjects are resources which are uniquely identified by a Uniform Resource Identifier (URI). Objects can be resources or literals. Properties are first class objects in the model that define binary relations between two resources or between a resource and a literal.
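The triple model described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Triple` type, the `match` helper, and the `http://example.org/` URIs are hypothetical names made up for this example, and literals are simplified to plain strings rather than typed values.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str  # URI of the resource being described
    prop: str     # URI of the property (itself a resource)
    obj: str      # URI of another resource, or a literal (simplified to str)

# Hypothetical example data; these URIs do not refer to real resources.
EX = "http://example.org/"
graph = {
    Triple(EX + "alice", EX + "worksFor", EX + "acme"),
    Triple(EX + "alice", EX + "name", "Alice"),  # object is a literal
    Triple(EX + "acme", EX + "locatedIn", EX + "boston"),
}

def match(graph, subject=None, prop=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (prop is None or t.prop == prop)
        and (obj is None or t.obj == obj)
    )

# All statements whose subject is the (hypothetical) resource "alice".
alice_facts = match(graph, subject=EX + "alice")
```

Storing statements as a set of triples reflects the idea that a Semantic Web is simply a collection of (Subject, Property, Object) statements, and a pattern with wildcards is about the simplest form of lookup over such a collection.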
RDF Schema (RDFS) makes the model more powerful by allowing new resources to be specializations of already defined resources. An RDFS class is a resource that denotes a set of resources; membership in the set is expressed by the property RDF:type (each instance has the property RDF:type valued by the class). All resources have, by definition, the property RDF:type valued by RDFS:Resource. Moreover, all properties have RDF:type valued by RDF:Property, and all classes have RDF:type valued by RDFS:Class.
Two important properties defined in RDFS are subClassOf and subPropertyOf. Two other important concepts are domain and range, which apply to properties and must be valued by classes. They restrict the set of resources that may have a given property (the property's domain) and the set of valid values for a property (its range). A property may have as many values for domain as needed, but no more than one value for range. For a triple to be valid, the type of the object must be the range class and the type of the subject must be one of the domain classes. RDFS allows inference of new triples based on several simple rules.
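To make the idea of rule-based inference concrete, the following is a minimal sketch that repeatedly applies a small subset of RDFS-style rules until no new triples appear. It is an illustration under stated assumptions, not a complete or standard-conformant reasoner: the function name `rdfs_closure`, the `ex:` names, and the choice of rules are assumptions for this example, and domain and range are treated here as sources of inferred types rather than as validity checks.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"

def rdfs_closure(triples):
    """Apply a few RDFS-style rules until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # subClassOf is transitive.
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # An instance of a class is an instance of its superclasses.
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # A statement with a property also holds for its superproperty.
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # Domain: the subject is typed by the domain class.
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                # Range: the object is typed by the range class
                # (this sketch assumes the object is a resource, not a literal).
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= inferred:  # nothing new was derived
            return inferred
        inferred |= new

# Hypothetical schema and data.
facts = {
    ("ex:Painter", SUBCLASS, "ex:Artist"),
    ("ex:Artist", SUBCLASS, "ex:Person"),
    ("ex:paints", DOMAIN, "ex:Painter"),
    ("ex:paints", RANGE, "ex:Painting"),
    ("ex:picasso", "ex:paints", "ex:guernica"),
}
closure = rdfs_closure(facts)
```

In this example the single data statement about `ex:picasso` is enough to derive that `ex:picasso` is a `ex:Painter` (from the domain), and then an `ex:Artist` and an `ex:Person` (from the class hierarchy). The nested loop is quadratic per pass and is chosen for readability only.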
The development of effective information retrieval techniques for the Semantic Web has become an important research problem. One approach is query languages that use a SQL-like declarative syntax to query a Semantic Web as a set of RDF triples. Inference is incorporated as part of query answering. However, these languages are not able to determine complex relationships between two resources.
To address this, Anyanwu and Sheth proposed rho-queries for determining the Semantic Association among the Semantic Web resources [K. Anyanwu and A. Sheth, “rho-Queries: Enabling Querying for Semantic Associations on the Semantic Web”; Proceedings of the Twelfth International World-Wide Web Conference, May 2003]. However, no effective implementation has been proposed.
A technique for finding the important pages in a WWW collection was developed by Kleinberg [J. M. Kleinberg, “Authoritative Sources in a Hyperlinked Environment”, Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, January 1998], who defined two types of scores for Web pages that pertain to a certain topic: authority scores and hub scores. Documents with high authority scores are authorities on a topic and therefore have many links pointing to them. Documents with high hub scores, on the other hand, are resource lists: they do not directly contain information about the topic, but rather point to many authoritative sites. Transitively, a document that points to many good authorities is an even better hub, and similarly a document pointed to by many good hubs is an even better authority.
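The mutually reinforcing relationship between hubs and authorities can be sketched as a simple iteration. The code below is a minimal illustration, not Kleinberg's exact formulation: the function name `hits`, the fixed iteration count, the unit-length normalization step, and the page names are all assumptions made for this example.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages pointing to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Rescale both score vectors to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical link structure: three resource lists pointing at three sites.
links = {
    "list1": ["siteA"],
    "list2": ["siteA", "siteB"],
    "list3": ["siteA", "siteB", "siteC"],
}
hub, auth = hits(links)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

In this toy graph the page linked from every list ends up with the highest authority score, and the list that links to the most authorities ends up with the highest hub score, which matches the intuition given in the text.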