The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by the W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming. Information about RDF can be found in the “Resource Description Framework (RDF) Model and Syntax Specification” at (www.w3.org/TR/1999/REC-rdf-syntax-19990222); the “Resource Description Framework (RDF) Schema Specification” at (www.w3.org/TR/1999/PR-rdf-schema-19990303); and the “RDF/XML Syntax Specification (Revised)” at (www.w3.org/TR/rdf-syntax-grammar), all of which are incorporated herein by reference.
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” (Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001.) More information about the Semantic Web can be found on the World Wide Web in the W3C Technology and Society Domain document “Semantic Web” at (www.w3.org/2001/sw), incorporated herein by reference.
Preferably, a central RDF metadata store is employed if the metadata generated by agents 312 must be persistently stored. In an example embodiment, HEWLETT-PACKARD DEVELOPMENT COMPANY's JENA framework is used as such a store. JENA is available for download on the World Wide Web at (www.hpl.hp.com/semweb/jena.htm).
Jena is a Java framework for writing Semantic Web applications. As of version 2.0, it has its own web site with all of the details and documentation online:
Jena Overview:
Jena is a Java framework for writing Semantic Web applications. It features:
An RDF API:
    - statement-centric methods for manipulating an RDF model as a set of RDF triples
    - resource-centric methods for manipulating an RDF model as a set of resources with properties
    - cascading method calls for more convenient programming
    - built-in support for RDF containers: bag, alt and seq
    - enhanced resources: the application can extend the behavior of resources
    - integrated parsers and writers for RDF/XML (ARP), N3 and N-TRIPLES
    - support for typed literals
ARP (Jena's RDF/XML Parser):
ARP aims to be fully compliant with the latest decisions of the RDF Core Work Group. The Jena 2.0 version is compliant with the Editor's Working Drafts at time of release. ARP is typically invoked using Jena's read operations, but can also be used standalone.
Persistence:
The Jena2 persistence subsystem implements an extension to the Jena Model class that provides persistence for models through use of a back-end database engine. Jena2 is largely backwards-compatible for Jena1 applications with the exception of some database configuration options. The default Jena2 database layout uses a denormalized schema in which literals and resource URIs are stored directly in statement tables. This differs from Jena1 in which all literals and resources were stored in common tables that were referenced by statements. Thus, the Jena2 layout enables faster insertion and retrieval but uses more storage than Jena1. Configuration options are available that give Jena2 users some control over the degree of denormalization in order to reduce storage consumption.
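The trade-off between the two layouts can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than Jena or any of its supported engines, and the table names (node, stmt_norm, stmt_denorm) and the urn:ex: identifiers are assumptions made for illustration only; they are not Jena's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables, not Jena's real schema.
# node + stmt_norm: each URI or literal is stored once and referenced by id
#                   (in the spirit of the Jena1 layout described above).
# stmt_denorm:      URIs and literals are stored directly in the statement row
#                   (in the spirit of the default Jena2 layout).
cur.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE stmt_norm (subj INTEGER, pred INTEGER, obj INTEGER);
CREATE TABLE stmt_denorm (subj TEXT, pred TEXT, obj TEXT);
""")


def node_id(value):
    """Return the id of a stored value, inserting it on first use."""
    cur.execute("INSERT OR IGNORE INTO node (value) VALUES (?)", (value,))
    cur.execute("SELECT id FROM node WHERE value = ?", (value,))
    return cur.fetchone()[0]


def add_normalized(s, p, o):
    # Up to three lookups or inserts in the shared table, then one statement row.
    cur.execute("INSERT INTO stmt_norm VALUES (?, ?, ?)",
                (node_id(s), node_id(p), node_id(o)))


def add_denormalized(s, p, o):
    # A single insert, at the cost of repeating every string.
    cur.execute("INSERT INTO stmt_denorm VALUES (?, ?, ?)", (s, p, o))


triples = [
    ("urn:ex:doc1", "urn:ex:author", "Alice"),
    ("urn:ex:doc2", "urn:ex:author", "Alice"),
]
for t in triples:
    add_normalized(*t)
    add_denormalized(*t)

# Denormalized retrieval reads one table.
denorm_rows = cur.execute(
    "SELECT subj, pred, obj FROM stmt_denorm ORDER BY rowid").fetchall()

# Normalized retrieval needs three joins to turn ids back into values.
norm_rows = cur.execute("""
    SELECT s.value, p.value, o.value
    FROM stmt_norm AS t
    JOIN node AS s ON s.id = t.subj
    JOIN node AS p ON p.id = t.pred
    JOIN node AS o ON o.id = t.obj
    ORDER BY t.rowid
""").fetchall()

stored_values = cur.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

Both layouts return the same statements. The normalized one stores each distinct string once but pays for extra lookups on insert and joins on retrieval, which is the storage-versus-speed trade-off the paragraph above describes.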
The persistence subsystem supports a Fastpath capability for RDQL queries that dynamically generates SQL queries to perform as much of the RDQL query as possible within an SQL database engine. Currently, Jena2 can use three SQL database engines: MySQL, Oracle and PostgreSQL. These are supported on Linux and Windows XP. As with Jena1, the persistence subsystem is designed to be portable to other SQL database engines.
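The general idea of pushing a graph query down into the database can be sketched as follows. This is not Jena's Fastpath implementation and it does not parse RDQL syntax; it assumes the query has already been reduced to a list of triple patterns in which terms beginning with "?" are variables. The function patterns_to_sql, the stmt table, and the urn:ex: identifiers are hypothetical names chosen for the example, which runs against Python's built-in sqlite3 module.

```python
import sqlite3


def patterns_to_sql(patterns, select_vars):
    """Translate triple patterns into one SQL self-join (illustrative helper)."""
    columns = ("subj", "pred", "obj")
    first_seen = {}  # variable name -> first column reference that binds it
    tables, conditions, params = [], [], []
    for i, pattern in enumerate(patterns):
        alias = f"t{i}"
        tables.append(f"stmt AS {alias}")
        for col, term in zip(columns, pattern):
            ref = f"{alias}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a bound parameter.
                conditions.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(first_seen[v] for v in select_vars)
    sql = f"SELECT {select} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stmt (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany("INSERT INTO stmt VALUES (?, ?, ?)", [
    ("urn:ex:alice", "urn:ex:authorOf", "urn:ex:paper1"),
    ("urn:ex:bob", "urn:ex:authorOf", "urn:ex:paper2"),
    ("urn:ex:paper1", "urn:ex:title", "Semantic Email"),
    ("urn:ex:paper2", "urn:ex:title", "Other Work"),
])

# "Who is the author of the paper titled 'Semantic Email'?"
sql, params = patterns_to_sql(
    [("?who", "urn:ex:authorOf", "?paper"),
     ("?paper", "urn:ex:title", "Semantic Email")],
    select_vars=["?who"],
)
authors = conn.execute(sql, params).fetchall()
```

Because both patterns are answered by a single SQL statement, the join is evaluated inside the database engine instead of by fetching candidate statements and matching them in application code.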
Reasoning Subsystem:
The Jena2 reasoner subsystem includes a generic rule based inference engine together with configured rule sets for RDFS and for the OWL/Lite subset of OWL Full. These reasoners can be used to construct inference models which show the RDF statements entailed by the data being reasoned over. The subsystem is designed to be extensible so that it should be possible to plug a range of external reasoners into Jena, though worked examples of doing so are left to a future release.
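What it means for statements to be entailed by a rule set can be shown with a small forward-chaining sketch. It is not Jena's rule engine or its RDFS configuration; it applies only two RDFS entailment rules (subclass transitivity and type inheritance) to an in-memory set of triples, and the class and instance names are invented for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply two RDFS rules repeatedly until no new statements appear."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            if p != SUBCLASS_OF:
                continue
            for (s2, p2, o2) in inferred:
                # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                if p2 == SUBCLASS_OF and s2 == o:
                    new.add((s, SUBCLASS_OF, o2))
                # rdfs9: (A subClassOf B), (x type A) => (x type B)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= inferred:
            return inferred  # fixed point reached
        inferred |= new


# Hypothetical data: class and instance names are assumptions.
asserted = {
    ("ex:Email", SUBCLASS_OF, "ex:Message"),
    ("ex:Message", SUBCLASS_OF, "ex:Document"),
    ("ex:msg42", RDF_TYPE, "ex:Email"),
}

# The "inference model" view: statements entailed but never asserted.
entailed = rdfs_closure(asserted) - asserted
```

Here entailed contains the three statements that follow from the data but were not stated directly: that ex:Email is a subclass of ex:Document, and that ex:msg42 is both an ex:Message and an ex:Document.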
Of these components, the underlying rule engine and the RDFS configuration should be reasonably stable. The OWL configuration is preliminary and still under development.
Ontology Subsystem:
The Jena2 ontology API is intended to support programmers who are working with ontology data based on RDF. Specifically, this means support for OWL, DAML+OIL and RDFS. A set of Java abstractions extend the generic RDF Resource and Property classes to model more directly the class and property expressions found in ontologies using the above languages, and the relationships between these classes and properties. The ontology API works closely with the reasoning subsystem to derive additional information that can be inferred from a particular ontology source. Given that ontologists typically modularise ontologies into individual, re-usable components, and publish these on the web, the Jena2 ontology subsystem also includes a document manager that assists with the process of managing imported ontology documents.
RDQL Query Language:
RDQL is a query language for RDF data. The implementation in Jena is coupled to relational database storage so that optimized queries are performed over data held in a Jena relational persistent store.
The above definition provides a basic foundation for inventions relating to the Semantic Web, but further technical refinement and additional definitions are needed to describe this invention. In the context of the Semantic Web, a page is any document or data item which contains links to other documents or data. Specifically, pages are not restricted to HTML documents, which are the typical pages in the World Wide Web. The links between pages are usually, but not always, defined in RDF. Furthermore, these links are semantic relationships in that they have a specific meaning or type. For example, “Author of” is such a relationship that may be used to link the page of an author to the page containing some publication. The Semantic Web also supports additional semantic metadata about pages. For example, a field in a page such as “Copyright Date” might itself be a standard, machine-recognizable way of indicating a copyright date rather than just a field that happens to be labeled “Copyright Date.”
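As an illustration of pages connected by typed links, the following sketch stores each link as a (subject, relationship, object) triple and offers both a statement-oriented and a page-oriented view. The LinkGraph class, its methods, and every urn:example: identifier are hypothetical and chosen only for this example; they are not part of RDF, Jena, or the invention as claimed.

```python
class LinkGraph:
    """A tiny in-memory set of typed links between pages (illustrative only)."""

    def __init__(self):
        self._statements = set()

    def add(self, subject, relationship, obj):
        self._statements.add((subject, relationship, obj))
        return self  # returning self allows calls to be chained

    def statements(self, subject=None, relationship=None, obj=None):
        """Statement-oriented view: all links matching the given pattern."""
        return sorted(
            t for t in self._statements
            if (subject is None or t[0] == subject)
            and (relationship is None or t[1] == relationship)
            and (obj is None or t[2] == obj)
        )

    def properties_of(self, subject):
        """Page-oriented view: one page's relationships and their values."""
        props = {}
        for s, rel, o in sorted(self._statements):
            if s == subject:
                props.setdefault(rel, []).append(o)
        return props


# Hypothetical identifiers for two pages and two relationship types.
AUTHOR_PAGE = "urn:example:people/author"
PAPER_PAGE = "urn:example:papers/publication"
AUTHOR_OF = "urn:example:terms/authorOf"
COPYRIGHT_DATE = "urn:example:terms/copyrightDate"

graph = (
    LinkGraph()
    .add(AUTHOR_PAGE, AUTHOR_OF, PAPER_PAGE)   # a typed link between pages
    .add(PAPER_PAGE, COPYRIGHT_DATE, "2003")   # a field with a known meaning
)

author_links = graph.statements(relationship=AUTHOR_OF)
paper_fields = graph.properties_of(PAPER_PAGE)
```

Because the copyright date is keyed by an agreed identifier rather than by a display label, software can recognize the field as a copyright date regardless of how the page labels it.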
This invention solves two problems. The first is how to extract knowledge from email messages with the purpose of streamlining workflow. Today, email is heavily used in the everyday workflow of organizations. Several special kinds of email have been used to speed up workflow. For example, special calendaring email clients will automatically negotiate for free time between meeting participants because the mails are in a particular format. This invention generalizes this idea to allow knowledge extraction from any email based on known terms and relationships.
The second problem is how to populate the Semantic Web with valuable pages and links. Semantic information, such as fields with actual known meaning, must be filled out. This is an extra step in the process of authoring web content that must be undertaken in order for the Semantic Web to succeed. By offering an automated approach to supplying this metadata through email, this invention helps break down a large barrier to Semantic Web adoption.