1. Field of the Invention
This invention relates to data and metadata management over the internet for navigating, querying and manipulating any kind of information by using and executing high-level specifications in the Resource Description Framework and by supporting multiple object relational database resources over the web. With the advancement of the World Wide Web, a large number of different types of objects (text, file, audio, video, image as well as relational data) are being created every day. The internet can be visualized as a single large database. Querying and manipulating such a large database from many different perspectives is a nontrivial task. Additionally, transactions over the web, electronic commerce with complex buyer/seller relationships and distributed multi-tier application architectures are creating demand for new technology solutions. This invention addresses those specific technology needs (a) by incorporating advanced metadata specifications in Extensible Markup Language for implicit generation of object SQL queries in conjunction with navigational capabilities and (b) by incorporating need-based persistent connectivity through object brokers to support transactions over database web entities.
2. Description of the Prior Art
The internet is becoming an important channel for retail commerce as well as business-to-business transactions. The number of web buyers, sellers and transactions is growing at a rapid pace. But the potential of the internet to truly transform commerce and business remains to be fully realized. Electronic purchases are still largely non-automated. Software techniques are required to automate several of the most time-consuming stages of web surfing and of the buying and selling processes. Additionally, business-to-business web transactions demand seamless query facilities over all kinds of information, both at the front-end web portal sites and at the back-end relational databases of a connected enterprise. Uniform querying, decision support and transactional characteristics need to be present over any kind of web data, despite the fact that the data may or may not be immediately present in a single relational database. So far, transactional and query capabilities are limited to data residing inside a relational database, whereas text and multimedia data residing at a web site can only be viewed by the use of Hypertext Markup Language (HTML). The World Wide Web was originally built for human consumption, and although everything on it is machine-readable, not everything is machine-understandable. It is hard to automate anything on the web, and because of the volume of information the web contains, it is not possible to manage it manually.
W3C (web address http://www.w3c.org) is an international industry consortium formed to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability. The solutions so far proposed by W3C in Extensible Markup Language (XML) and Resource Description Framework (RDF) incorporate metadata to describe the data contained on the web. Metadata is “data about data” or, specifically in the context of the World Wide Web, “data describing Web resources”. The distinction between “data” and “metadata” is not an absolute one; it is a distinction created primarily by an application. Programs and autonomous agents can gain knowledge about data from metadata specifications.
The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. The basic model consists of three object types:
(1) Resources: All things being described by RDF expressions are called resources. A resource may be an entire Web page; for example the HTML document http://www.w3.org/Overview.html. A resource may be a part of a Web page; e.g. a specific HTML or XML element within a document source. A resource may also be a whole collection of pages; e.g. an entire Web site. Resources are identified by Uniform Resource Identifiers or URIs. Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any imaginable entity.
(2) Properties: A property is a specific aspect, characteristic, attribute or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties.
(3) Statements: A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate and the object. The object of a statement (i.e. the property value) can be another resource, specified by a URI, or it can be a literal, that is, a simple string or other primitive data type defined by XML.
A simple example statement, “John Doe is the creator of the resource http://www.w3.org/home/John”, has the subject (resource) http://www.w3.org/home/John, the predicate (property) “Creator” and the object (literal) “John Doe”. Meaning in RDF is expressed through reference to a schema. A schema is a place where definitions and restrictions of usage for properties are documented. In order to avoid conflicts in definitions of the same term, RDF uses the XML namespace facility, where a specific use of a word is tied to the dictionary (schema) where the definition exists. Each predicate used in a statement must be identified with exactly one namespace, or schema. The RDF model also allows qualified property values, where the object of the original statement is the structured value and the qualifiers are further properties of a common resource. To represent a collection of resources, RDF uses an additional resource that identifies the specific collection. This resource should be declared to be an instance of one of the container object types, namely,
(1) Bag (an unordered list of resources or literals),
(2) Sequence (an ordered list of resources or literals) and
(3) Alternative (a list of resources or literals that represent alternatives for the single value of a property).
A common use of containers is as the value of a property. When used in this way, the statement still has a single statement object regardless of the number of members in the container; the container resource itself is the object of the statement.
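By way of illustration only, the statement model and the use of a container as a property value described above may be sketched as follows. The class names, the Bag URI and the second author are hypothetical and are introduced solely for this sketch; they are not part of RDF or of the present specification.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Resource:
    """Anything identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Bag:
    """An unordered container; it is itself a resource with its own URI."""
    uri: str
    members: Tuple[Union[Resource, str], ...]


@dataclass(frozen=True)
class Statement:
    """A (subject, predicate, object) triple."""
    subject: Resource
    predicate: str
    obj: Union[Resource, Bag, str]  # another resource, a container, or a literal


page = Resource("http://www.w3.org/home/John")

# The example statement from the text: the object is the literal "John Doe".
simple = Statement(page, "Creator", "John Doe")

# Hypothetical variant: the property value is a Bag of two literals.
# The statement still has exactly one object, namely the container itself.
authors = Bag("http://example.org/bags/authors", ("John Doe", "Jane Roe"))
with_container = Statement(page, "Creator", authors)
```

In the second statement the number of members in the Bag does not change the number of statement objects; only the Bag resource is referenced by the statement.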
Metadata has so far been used mainly in relational databases, where it describes attributes, the number and types of columns in tables, foreign-key/primary-key relationships, views and so on in a relational schema. SQL (Structured Query Language) queries made against a relational schema are resolved by fetching metadata from data dictionaries (repositories for metadata definitions) in order to interpret the data fetched from data files during execution of a relational operation. Query executions are independent of any application-domain-specific features. In a similar manner, the Resource Description Framework (RDF) is a foundation for representing and processing metadata and data for the World Wide Web; it provides interoperability between applications that exchange machine-understandable information on the web. The broad goal of RDF is to define a mechanism for describing resources that makes no assumptions about a particular application domain and does not define the semantics of any application domain. RDF relies on the support of XML (Extensible Markup Language), and its model resembles an entity-relationship diagram. In object-oriented design terminology, resources correspond to objects and properties correspond to instance variables. To facilitate the definition of metadata, RDF provides a class system much like those of object-oriented programming and modeling systems. A collection of classes is called a schema. Schemas may themselves be written in RDF.
The representation of “data about data” (metadata) to achieve application-independent, interoperable solutions is the basic similarity between relational databases and RDF. However, RDF has no facilities for specifying queries that make use of metadata, as has so far been possible in a relational database. Query capabilities enable users to construct arbitrary types of data on the fly, to which application processing logic can then be applied. Additionally, relational databases in universal servers have advanced capabilities for specifying application interfaces embedded inside SQL query expressions; these interfaces represent operations or methods to apply over the constructed data. Such important possibilities are also missing from RDF. The lack of such facilities limits the ability of RDF to address evolving electronic business needs completely.
Relational algebra incorporates algebraic operations such as join, select, project, union and intersection. Such operations are expressed in queries against a relational schema. By contrast, web entities are accessed by navigation through Uniform Resource Identifiers (URIs). An amalgamation of these two paradigms is the desired goal in electronic business. Relational operations over RDF definitions for resources and their attributes are possible by exploiting relationships over resources and structured property values and normalizing them in a back-end relational database. Queries involving join, select, project and other relational operations can then be used effectively to extract desired values and properties of resources. Without such a mechanism, web surfing in conjunction with complex automated business-to-business services and transactions is not possible.
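As an illustrative sketch only, RDF statements normalized into a back-end relational table can be queried with join, select and project operations as follows. The table layout, the example.org URIs, the property names and the use of SQLite through Python's standard sqlite3 module are assumptions made for this sketch; SQLite merely stands in for a back-end relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per RDF statement: (subject, predicate, object).
conn.execute("CREATE TABLE statement (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?)",
    [
        ("http://example.org/item/1", "seller", "http://example.org/seller/acme"),
        ("http://example.org/item/1", "price", "25"),
        ("http://example.org/item/2", "seller", "http://example.org/seller/acme"),
        ("http://example.org/item/2", "price", "40"),
        ("http://example.org/seller/acme", "name", "Acme Corp"),
    ],
)

# Join: follow the "seller" relationship from an item to the seller resource.
# Select: keep only items priced below 30.
# Project: return the item URI, the seller name and the price.
query = """
SELECT item.subject, seller.object, CAST(price.object AS INTEGER)
FROM statement AS item
JOIN statement AS seller
  ON seller.subject = item.object AND seller.predicate = 'name'
JOIN statement AS price
  ON price.subject = item.subject AND price.predicate = 'price'
WHERE item.predicate = 'seller'
  AND CAST(price.object AS INTEGER) < 30
ORDER BY item.subject
"""
cheap_items = conn.execute(query).fetchall()
print(cheap_items)
```

The join condition treats a URI appearing as an object in one statement and as a subject in another as the link between the two records, which is the relationship-following step that navigation alone would perform one page at a time.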
Electronic commerce and services have introduced many new ways of trading, allowing interaction between groups that previously could not economically afford to trade with one another. Whereas commercial data interchange previously involved mainly the movement of data fields from one computer to another, the new model for web-based commerce and services typically depends on intelligent processing and interactions for the transactions to take place. This involves understanding and specifying the business concepts represented in the interchanged data and subsequently applying business-specific rules or methods to that data. Transactional and query facilities with embedded method interfaces can lead to such a powerful scheme. Query specifications with embedded interfaces are currently present in object relational databases or universal servers. Object relational databases with business logic bound inside the server offer distinct directions for resolving similar complex issues over XML/RDF definitions and Java classes. Transactional and query facilities for an object relational database are possible through thin client windows that maintain persistent connectivity with the database. Persistent connectivity to a database system is not possible in a simple browser designed for stateless web navigation.
XML/RDF documents are interchanged over HTTP (Hypertext Transfer Protocol), which is different from IIOP (Internet Inter-ORB Protocol). HTTP is the main communication mechanism among web browsers and servers. It is a stateless protocol, implying that there is no way for the client and the server to know each other's state. Since the web is stateless, each transaction consumes time and resources in the setup and closing of network and database connections. For large transaction processing applications, this overhead will be significant. Internet Inter-ORB Protocol (IIOP) is a dynamic invocation interface for the web. This protocol maintains the session between the client and the server objects until either side disconnects. It provides persistent connectivity over the web for distributed objects. The OMG (Object Management Group, with web address http://www.omg.org) is an industry consortium formed to create a component-based software marketplace by establishing industry guidelines and detailed object management specifications that provide a common framework for application development. The Common Object Request Broker Architecture (CORBA) from OMG specifies the Object Request Broker (ORB), which allows applications and programs to communicate with one another no matter where they reside on the web. The IIOP specification defines a set of data formatting rules, called CDR (Common Data Representation), which is tailored to the data types supported in the CORBA interface definition language (IDL). Electronic business transactions and query servers implementing a structured query language (SQL) processing engine require internet protocols for document transfers as well as for object executions with persistent connectivity over the web. As a result, such an engine must build on top of both HTTP and IIOP.
Traditional browsers for navigation need to be augmented with additional capabilities for the occasional creation, maintenance and destruction of one or more client windows that interface with databases over the web for transactions and collaborations. These windows require IIOP for persistent connectivity.
A database schema can be partitioned over the web in such a way that disparate business logic and business objects can exist with relevant data and views over the web. Unifying the object paradigm and the relational model paradigm is the mainstream effort across the industry. A unified model for distributed relational databases integrated with an object model is the key to many storage and manipulation issues for electronic business. Universal relational database servers are available from different database vendors to offer general extensibility and features for electronic business. One can extend the types of attributes in tables and integrate routines defined by users and written in high-level programming languages. Such products offer the facilities of user-defined routines and packages. A user-defined routine (UDR) is a routine that a user creates and registers in the system catalog tables and that is invoked within a SQL statement or another routine. A function is a routine that optionally accepts a set of arguments and returns a set of values. A function can be used in SQL expressions. A procedure is a routine that optionally accepts a set of arguments and does not return any values. A procedure cannot be used in SQL expressions because it does not return a value. A UDR can be either a function or a procedure. The ability to integrate user-defined routines, packages and functions within SQL is the extensibility feature offered by universal servers, and such features are useful for electronic business.
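The notion of a user-defined function invoked within a SQL expression can be illustrated with the following minimal sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in; SQLite is not an object relational universal server, and its `create_function` call registers the routine on the connection rather than in system catalog tables. The function name, table and data are hypothetical.

```python
import sqlite3


def discounted_price(price, percent):
    """User-defined function: returns a value, so it can appear in SQL expressions."""
    return round(price * (100 - percent) / 100.0, 2)


conn = sqlite3.connect(":memory:")

# Register the Python callable under a SQL-visible name taking two arguments.
conn.create_function("discounted_price", 2, discounted_price)

conn.execute("CREATE TABLE product (name TEXT PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("widget", 20.0), ("gadget", 50.0)],
)

# The user-defined function is applied to each record constructed by the query.
rows = conn.execute(
    "SELECT name, discounted_price(price, 10) FROM product ORDER BY name"
).fetchall()
print(rows)
```

A procedure in the sense described above would have no return value and therefore could not be placed in the select list in this way.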
Uniform Resource Identifiers are frequently embedded in XML and HTML pages, where a browser can navigate through a resource identifier to find and manipulate web objects. A resource can also identify an object relational schema component over the web. RDF documents represent metadata that could be directly derived from one or more object relational databases. Information existing inside object relational databases and presented in XML/RDF definitions forms an information hierarchy over the web that should be seamlessly navigated and queried. This kind of seamless interoperability can prove to be very valuable in electronic business and commerce. However, these possibilities are not present in the current state of the art.
As described above, there is a clear need in the art for automated web functionality in electronic business over the information stored and exchanged across the internet. This functionality requires (a) support for a generic execution model for arbitrary type construction from XML/RDF definitions followed by business logic execution anywhere on the web, (b) support for uniform object SQL query facilities over a single virtual database unifying multiple object relational databases viewed by related XML/RDF documents and (c) support for uniform navigational as well as transactional facilities over data and metadata definitions, extending the capabilities of traditional browsers for occasional creation and maintenance of thin client windows for persistent connectivity with remote databases. There are further needs in the XML/RDF processing framework to unify and support various techniques for buyer/seller relationships in electronic commerce, (a) by allowing navigation through web pages (XML/RDF definitions) for inspecting and implicitly generating object SQL queries, (b) by executing such queries in one or more object relational engines over the internet to generate further web pages (XML/RDF definitions) with more accurate or elaborate information, (c) by seamlessly repeating the processes described in (a) and (b) if necessary and (d) by performing transactions on focused data items anywhere on the web.
The present invention solves the aforementioned deficiencies of the prior art and solves other problems that will be understood and appreciated by those skilled in the art upon reading and understanding the present specification. It is a primary objective of the present invention to provide a mechanism in Extensible Markup Language (XML) and Resource Description Framework (RDF) for representing and navigating higher level specifications for data/metadata and for constructing arbitrary types by SQL queries with embedded method interfaces. Specifications for SQL queries with embedded business application logic could be either represented in XML/RDF documents or such queries for transactions could be triggered through thin client windows communicating persistently with remote databases. These object SQL queries apply uniformly to one or more object relational databases over the web to manipulate data and to construct further XML/RDF documents for navigation and inspection. This way, a uniform paradigm for navigation through Uniform Resource Identifiers within XML/RDF documents and querying one or more object relational schema components identified seamlessly by Uniform Resource Identifiers over the web will evolve, thereby addressing various needs in electronic business and commerce.
In one embodiment of the invention, a virtual unified database over multiple object relational databases over the web is described. Application business logic, messaging services and object request brokers reside inside an object relational database server, eliminating the need for a middle tier. Uniform Resource Identifiers are stored in table columns of relational databases as locators of elements inside component relational schemas, with Java classes distributed over the net. Java classes encapsulate or package business logic to be applied to relational data or other multimedia data. Types defined by users to represent attributes in tables involve packaged object definitions that encapsulate methods or operations. These operations are applied through Object Request Brokers. In this invention, a user-defined package contains interface definitions (call specifications) in which an explicit mapping of data item types to tables and attributes is made to ensure a safe environment for method application. Such interface definitions inside packages are used to apply methods over records constructed by relational operations, and these interfaces are embedded within object SQL queries. The interfaces are implemented as methods in Java classes with appropriate mappings to Java argument types. Therefore, a uniform paradigm for multi-tier client/server computing without a middle-tier application server is presented.
In another embodiment, XML/RDF definitions representing data and metadata are mapped to relational descriptions for records and their relationships through foreign keys and primary keys. Uniform Resource Identifiers are unique identifiers for entities over the web, and in this invention these identifiers are used as primary key and foreign key values in a relational model of description. A relationship expressed in an RDF description of resources can be directly mapped to the primary key/foreign key relationship of records in a relational database. A collection of resources is defined by the RDF container types, namely Bag, Sequence and Alternative. These container types can be normalized into proper foreign key and primary key relationships so that SQL queries involving join and other operators can construct records with the defined container relationships. Multi-part primary keys for sequences and different normal forms in the relational model capture the various semantics possible in RDF data/metadata definitions. Thus multiple related XML/RDF documents distributed over the web are mapped to data and metadata in multiple related object relational databases, from which object SQL queries fetch and manipulate information.
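One possible normalization of an RDF Sequence into primary key/foreign key relationships is sketched below, for illustration only. The table names, column names and example.org URIs are hypothetical, and SQLite (through Python's standard sqlite3 module) stands in for an object relational database. URIs serve as primary and foreign key values, and the two-part primary key (container URI, position) captures the ordering of a Sequence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign key constraints
conn.executescript("""
CREATE TABLE resource (
    uri   TEXT PRIMARY KEY,           -- the URI is the primary key value
    title TEXT
);
CREATE TABLE container (
    uri  TEXT PRIMARY KEY,
    kind TEXT CHECK (kind IN ('Bag', 'Sequence', 'Alternative'))
);
CREATE TABLE container_member (
    container_uri TEXT NOT NULL REFERENCES container(uri),
    position      INTEGER NOT NULL,
    member_uri    TEXT NOT NULL REFERENCES resource(uri),
    PRIMARY KEY (container_uri, position)   -- multi-part key for ordering
);
""")

conn.executemany("INSERT INTO resource VALUES (?, ?)", [
    ("http://example.org/ch1", "Introduction"),
    ("http://example.org/ch2", "Methods"),
    ("http://example.org/ch3", "Results"),
])
conn.execute("INSERT INTO container VALUES (?, ?)",
             ("http://example.org/toc", "Sequence"))
# Rows are inserted out of order on purpose; the position column carries the order.
conn.executemany("INSERT INTO container_member VALUES (?, ?, ?)", [
    ("http://example.org/toc", 2, "http://example.org/ch2"),
    ("http://example.org/toc", 1, "http://example.org/ch1"),
    ("http://example.org/toc", 3, "http://example.org/ch3"),
])

# A join reconstructs the members of the Sequence in their defined order.
titles = [row[0] for row in conn.execute("""
    SELECT r.title
    FROM container_member AS m
    JOIN resource AS r ON r.uri = m.member_uri
    WHERE m.container_uri = ?
    ORDER BY m.position
""", ("http://example.org/toc",))]
print(titles)
```

For a Bag, the position column would carry no meaning beyond uniqueness; a different key design could be chosen for unordered containers.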
In another embodiment, method interfaces are represented as virtual attributes, with specifications for parameters given as resources and collections of resources. Definitions for methods are given in an XML namespace schema definition. This invention further extends the scope of representing SQL queries in XML/RDF specifications by defining an XML namespace for relational operator definitions. All the keywords in SQL are defined in that namespace with their respective meanings. SQL queries are implicitly constructed from sets of XML/RDF definitions in semantically related documents. This invention further represents object SQL queries by embedding method interfaces within queries in XML/RDF format. In this way a specific XML/RDF document can carry an object SQL query for execution within one or more object relational schema components over the web.
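A minimal sketch of how a SQL query might be constructed from an XML/RDF document that uses a namespace for relational operators follows. The namespace URI http://example.org/sql-ns#, the element and attribute names (Select, column, from, where, operator, value) and the operator codes are all hypothetical; the present text does not define a concrete vocabulary, and this sketch should not be read as the vocabulary of the invention.

```python
import xml.etree.ElementTree as ET

SQL_NS = "http://example.org/sql-ns#"  # hypothetical namespace for SQL keywords

DOCUMENT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sql="http://example.org/sql-ns#">
  <sql:Select rdf:about="http://example.org/queries/cheap-products">
    <sql:column>name</sql:column>
    <sql:column>price</sql:column>
    <sql:from>product</sql:from>
    <sql:where sql:column="price" sql:operator="lt" sql:value="30"/>
  </sql:Select>
</rdf:RDF>
"""

# Hypothetical operator codes mapped to SQL comparison operators.
OPERATORS = {"lt": "<", "gt": ">", "eq": "="}


def tag(name):
    """Qualified name in ElementTree's {namespace}local form."""
    return "{%s}%s" % (SQL_NS, name)


def identifier(text):
    """Accept only plain identifiers so document text cannot inject SQL."""
    if text is None or not text.isidentifier():
        raise ValueError("not a valid identifier: %r" % (text,))
    return text


def build_query(document):
    """Return (sql, params) for the first sql:Select element in the document."""
    root = ET.fromstring(document.strip())
    select = root.find(tag("Select"))
    columns = [identifier(c.text) for c in select.findall(tag("column"))]
    table = identifier(select.find(tag("from")).text)
    sql = "SELECT %s FROM %s" % (", ".join(columns), table)
    params = []
    where = select.find(tag("where"))
    if where is not None:
        operator = OPERATORS[where.get(tag("operator"))]
        sql += " WHERE %s %s ?" % (identifier(where.get(tag("column"))), operator)
        params.append(where.get(tag("value")))  # passed as a bound parameter
    return sql, params


sql, params = build_query(DOCUMENT)
print(sql, params)
```

Identifiers are validated and the comparison value is returned as a bound parameter because the document may originate anywhere on the web; concatenating its contents directly into SQL text would be unsafe.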
In yet another embodiment, XML/RDF documents with data/metadata definitions constitute views over databases. A part of a database, or a union over several databases, can be viewed through a document. All database updates are made visible by generating new documents that replace the old ones. A view, a union over views or a full database can be updated by creating client windows that maintain persistent links through object brokers. Such windows are maintained for as long as the transactions persist. Transactions over views or over a database cannot be performed without such a client window; only read-only implicit queries are allowed without one. Object SQL queries uniformly manipulate local and/or remote database tables, attributes and objects. Querying and viewing multiple object relational schema components over the web can also include legacy and other large existing database systems.