The Internet comprises mainly web servers and clients. A web server is a computer that delivers (serves up) hypertext markup language (HTML) pages. Every web server has an Internet protocol (IP) address and possibly a domain name. If a client requests a resource via a uniform resource locator/identifier (URL/URI), a request is sent to the server corresponding to the URL/URI.
A web document is a representation of information in a description language such as HTML or XHTML that is intended to be transferred (according to a certain protocol) from a server to a client, where the information can be rendered, i.e. presented.
Hypertext transfer protocol (HTTP) is the underlying protocol used by the server and the client. HTTP defines how messages are formatted and transmitted, and what actions a web server and client should take in response to various commands. For example, requesting a URL at a client actually sends an HTTP command to the web server, directing it to fetch and transmit the requested page.
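The message sent when a client requests a URL can be illustrated with a minimal sketch. The function name build_get_request and the host www.example.com are assumptions made for illustration only; the sketch merely formats an HTTP/1.1 request message and does not open a network connection.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.1 GET request message (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, Request-URI, protocol version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line terminates the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Hypothetical host and path, chosen only for the example.
request = build_get_request("www.example.com", "/index.html")
print(request.decode("ascii"))
```

A real client would write these bytes to a TCP connection to the server and then read the response message.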
The other main standard that controls how the world wide web works is hyper text markup language (HTML), which governs how web pages are formatted and displayed.
HTTP is called a stateless protocol because each command is executed independently, without any knowledge of the commands that came before it. This is the main reason that it is difficult to implement web sites that react intelligently.
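HTML is specified by the World Wide Web Consortium (W3C); HTTP is published by the Internet Engineering Task Force (IETF) and was developed with the participation of the W3C.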
The current set of common methods for HTTP is GET, HEAD, POST, PUT, DELETE, TRACE and CONNECT. The GET method retrieves whatever information (in the form of an entity) is identified by the Request-URI. If the Request-URI refers to a data-producing process, it is the produced data which shall be returned as the entity in the response and not the source text of the process, unless that text happens to be the output of the process.
The semantics of the GET method can change to a “conditional GET” if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field. A conditional GET method requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client.
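The server-side decision behind a conditional GET can be sketched as follows. This is a simplified illustration that handles only the If-Modified-Since header field; the function name respond_to_get and the representation of headers as a dictionary are assumptions, not part of any particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_to_get(headers: dict, body: bytes, last_modified: datetime):
    """Return (status, body) for a GET request, honouring If-Modified-Since."""
    since = headers.get("If-Modified-Since")
    if since is not None:
        try:
            threshold = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            threshold = None  # unparsable date: treat the request as unconditional
        # HTTP dates have one-second resolution, so drop microseconds.
        if threshold is not None and last_modified.replace(microsecond=0) <= threshold:
            return 304, b""  # Not Modified: the cached copy is still valid
    return 200, body         # otherwise transfer the entity


# Illustrative values only.
last_modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
cached_date = format_datetime(last_modified, usegmt=True)

status_cached, _ = respond_to_get({"If-Modified-Since": cached_date}, b"<html/>", last_modified)
status_plain, payload = respond_to_get({}, b"<html/>", last_modified)
```

In the first call the client already holds a current copy, so no entity needs to be transferred; in the second call the request is unconditional and the entity is returned.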
The semantics of the GET method change to a “partial GET” if the request message includes a Range header field. A partial GET requests that only part of the entity be transferred. The partial GET method is intended to reduce unnecessary network usage by allowing partially-retrieved entities to be completed without transferring data already held by the client.
All HTTP entities are represented in HTTP messages as sequences of bytes, and the concept of a byte range is meaningful for any HTTP entity subject to a partial GET. A byte range operation may specify a single range of bytes, or a set of ranges within a single entity.
In order to realize (dynamic) reactive or interactive web sites, other technologies have to be involved. The form technique for client-server interaction is, for example, enhanced by a simple object access protocol (SOAP) integration in XForms. There are certain web server enhancements, like application servers that generate dynamic web pages or web servers that are enabled to execute servlets, e.g. Tomcat, as well as active server pages and script languages like JavaScript or PHP.
Current Internet transfer protocols are suited neither to exchange nor to interaction on a semantic (content-based) level. They mainly support navigation by hyperlinking and the transfer of information based on addresses, like URIs, i.e. a referential interaction.
The process of retrieval is usually separated into several steps: a page provides means, such as forms, to formulate a request; the request triggers a query against some foreign representation, e.g. a database; a new page is generated from the query result; and the new page is delivered to the requesting party.
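How a set of byte ranges selects parts of an entity can be sketched as below. The function name parse_byte_ranges is an assumption, and the sketch covers only the common forms "first-last", "first-" and "-suffix" of the bytes range unit; it is not a complete implementation of the HTTP range rules.

```python
def parse_byte_ranges(header: str, length: int):
    """Resolve a value such as 'bytes=0-4,-3' into inclusive (start, end) pairs."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the 'bytes' range unit is handled in this sketch")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # Suffix form '-N': the last N bytes of the entity.
            n = int(last)
            if n == 0:
                continue
            start, end = max(length - n, 0), length - 1
        else:
            start = int(first)
            # Open-ended form 'N-' runs to the end; clamp overly large ends.
            end = min(int(last), length - 1) if last else length - 1
        if start <= end:
            ranges.append((start, end))
    return ranges


entity = b"abcdefghijklmnopqrstuvwxyz"
selected = parse_byte_ranges("bytes=0-4,-3", len(entity))
parts = [entity[start:end + 1] for start, end in selected]
```

Here the request names two ranges within a single entity: the first five bytes and the last three bytes.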
Current developments like Xcerpt, which is a declarative, rule-based query and transformation language for XML, inspired by logic programming, fail to embed or integrate similarly presented information. (Instead of the path-based navigational approach taken by languages like XSLT and XQuery, Xcerpt uses pattern-based, positional queries, where a pattern is an “example” of the database containing variables for binding content.)
When seeking special content, search engines have to aggregate published content and index it to make it accessible, e.g. via a pattern matching search or a Boolean query on a database. Internet search engines usually are powerful clusters of computers with huge databases storing indexed web content, continuously scanning and/or referring links per keyword or search expression.
The problem to be targeted by this invention is to enhance the retrieval capabilities of a network like the Internet and to enhance the cohesion of related web documents by replacing the straightforward (referential) approach that is currently followed, i.e. instead of providing a multitude of search engines assembling summary information, providing automatically summarizing, self-organizing web documents.
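The indexing approach of conventional search engines can be illustrated with a toy inverted index answering a Boolean AND query. The URLs, document texts and function names below are invented for the example; real search engines are far more elaborate.

```python
from collections import defaultdict


def build_index(documents: dict) -> dict:
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def boolean_and(index: dict, *terms: str) -> set:
    """Return the URLs that contain every one of the given terms."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical crawled content.
pages = {
    "http://a.example/": "unification of web documents",
    "http://b.example/": "web server and web client",
    "http://c.example/": "unification theory overview",
}
index = build_index(pages)
hits = boolean_and(index, "web", "unification")
```

The query is answered from the aggregated index alone, which is the referential, search-engine-centred approach described above.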
Currently it is not possible to ask a server whether the server provides a page satisfying a certain property. Such a property might be for instance a unification condition.
Unification is, informally, a generalization of pattern matching and the logic programming equivalent of instantiation in logic. When two terms, e.g. web documents, are to be unified, they are compared with each other in order to identify the information needed to make them similar. Variable parts, e.g. a placeholder or a wildcard, are bound or instantiated such that both documents become similar. This concept could for instance be realized with Robinson's algorithm. The result of unification is either failure or success with a set of variable bindings, known as a “unifier”. There may be many such unifiers for any pair of terms. An overview of the theory of unification is provided by Baader and Snyder in their chapter on Unification Theory in “The Handbook of Automated Reasoning”, Elsevier Science Publishers, 1999.
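Syntactic unification in the style of Robinson's algorithm can be sketched as follows. The term representation is an assumption made for this sketch: a variable is a string starting with an upper-case letter, a constant is any other string, and a compound term is a tuple whose first element is the functor name. The example terms loosely mimic document fragments and are purely illustrative.

```python
def is_var(term) -> bool:
    """Variables are strings starting with an upper-case letter (a convention of this sketch)."""
    return isinstance(term, str) and term[:1].isupper()


def walk(term, subst):
    """Follow variable bindings until an unbound variable or a non-variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def occurs(var, term, subst) -> bool:
    """Occurs check: does var appear inside term under the current bindings?"""
    term = walk(term, subst)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg, subst) for arg in term[1:])
    return False


def unify(a, b, subst=None):
    """Return a dict of variable bindings (a unifier), or None on failure."""
    subst = {} if subst is None else dict(subst)
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            if occurs(x, y, subst):
                return None
            subst[x] = y
        elif is_var(y):
            if occurs(y, x, subst):
                return None
            subst[y] = x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and x[0] == y[0] and len(x) == len(y)):
            stack.extend(zip(x[1:], y[1:]))  # same functor and arity: unify arguments
        else:
            return None                      # clash of constants or functors
    return subst


pattern = ("page", ("title", "T"), "B")
document = ("page", ("title", "news"), ("body", "X"))
unifier = unify(pattern, document)
```

In this example the binding for B still contains the variable X, which shows that a unifier need not be ground, i.e. the unified result may itself contain variables.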
The retrieval capabilities are enhanced, i.e. the problem is solved, by a method for retrieving a web document provided by a web server, the method comprising the following steps:
requesting a first web document from the web server,
requesting a content analysis of a second web document according to an inspection requirement that is comprised by the first web document,
analyzing the content of the second web document according to the inspection requirement,
integrating an analyzing result into the first web document, and finally
replying with this first web document.
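The sequence of steps above can be sketched in a few lines. Everything concrete in this sketch is a hypothetical choice made for illustration: the in-memory DOCUMENTS store, the "inspect" entry carrying the inspection requirement, the {{name}} placeholder syntax, and the use of a regular expression with named groups as the analysis. The source does not prescribe any of these, and a unification-based analysis could take the place of the regular expression.

```python
import re

# Hypothetical document store: the first document carries an inspection
# requirement that refers to a second document.
DOCUMENTS = {
    "/summary": {
        "body": "Latest headline: {{headline}}",
        "inspect": {"target": "/news", "pattern": r"<h1>(?P<headline>[^<]*)</h1>"},
    },
    "/news": {"body": "<html><h1>Server upgraded</h1><p>Details follow.</p></html>"},
}


def analyze(target_body: str, pattern: str) -> dict:
    """Analyze the second document; here simply a regex with named groups."""
    match = re.search(pattern, target_body)
    return match.groupdict() if match else {}


def serve(path: str, documents: dict = DOCUMENTS) -> str:
    first = documents[path]                        # the requested first document
    requirement = first.get("inspect")
    bindings = {}
    if requirement:
        second = documents[requirement["target"]]  # the second document to inspect
        bindings = analyze(second["body"], requirement["pattern"])

    def fill(match):
        # Integrate the analysis result; unbound placeholders are left as-is.
        value = bindings.get(match.group(1))
        return match.group(0) if value is None else value

    return re.sub(r"\{\{(\w+)\}\}", fill, first["body"])  # the reply


reply = serve("/summary")
```

The reply is the first document with the analysis result embedded in place of its placeholder.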
When using unification within the analysis, the resulting web document might contain variables. Furthermore, it should be noted that the information exchange is no longer directed as in the former client-server method, i.e. the bindings could be propagated from the requesting site to the serving site and vice versa.
The problem is solved inter alia by a web document description language comprising expression means for publishing information for distribution and for retrieving interactive information, further comprising expression means for actively requesting a web document analysis according to inspection requirements and expression means for integrating an analyzing result.
And the problem is solved by a web document transfer protocol comprising the steps of
transmitting a web document request from a client to a server and
replying according to the web document request,
where said web document request comprises inspection requirements on a web document, triggering an analysis at the server, and where the result of the analysis is returned in the reply.
The problem is solved inter alia by a web server comprising receiving means for receiving a web document request, retrieval means for retrieving a web document, and replying means for replying the web document, the web server further comprising inspection means for analyzing the content of the web document according to inspection requirements, and the replying means being adapted to reply an analyzing result.
In other words: an application programming interface for analyses with respect to syntactic terms is added to web documents, where the web documents comprise placeholders (variables) suited to embed analysis results. Preferably this analysis is unification based. That could be a syntactic unification of web documents with variables, a matching mechanism, i.e. a restrictive unification, or even a unification modulo a certain theory, a so-called semantic unification, a unification with or without constraints, an analysis with respect to any congruence relation instead of equality, etc.
Formally, a unification problem is treated abstractly as a set of related (hypermedia) objects O1, ..., On comprising variables X1, ..., Xm with respect to relations R1, ..., Rk, e.g.
O1(X1,X2) R1 O2(X2,X3)
O2(X2,X3) R2 O3(X4)
O1(X1,X2) R1 O3(X4)
A solution is a variable binding, i.e. a mapping X1->S1, X2->S2, X3->S3, and X4->S4 such that
O1(S1,S2) R1 O2(S2,S3)
O2(S2,S3) R2 O3(S4)
O1(S1,S2) R1 O3(S4)
applies. In other words, the placeholders are substituted (instantiated) by substitutes (real objects) that fulfill the constraints, i.e. such that the objects are in relation.
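The notion of a solution can be made concrete with a small brute-force sketch that enumerates candidate bindings and keeps those under which all constraints hold. The two relations used for R1 and R2, the candidate substitutes "a" and "b", and all function names are arbitrary assumptions chosen only to make the example executable; the source leaves the relations abstract.

```python
from itertools import product


def instantiate(template, binding):
    """Replace the variables of an object template by their substitutes."""
    name, variables = template
    return (name,) + tuple(binding[v] for v in variables)


def solve(constraints, variables, candidates):
    """Yield every binding under which all constraints 'left R right' hold."""
    for values in product(candidates, repeat=len(variables)):
        binding = dict(zip(variables, values))
        if all(rel(instantiate(left, binding), instantiate(right, binding))
               for left, rel, right in constraints):
            yield binding


# Illustrative relations (assumptions of this sketch):
def r1(a, b):
    """R1: the two objects share at least one substitute."""
    return bool(set(a[1:]) & set(b[1:]))


def r2(a, b):
    """R2: the two objects agree on their last substitute."""
    return a[-1] == b[-1]


O1 = ("O1", ("X1", "X2"))
O2 = ("O2", ("X2", "X3"))
O3 = ("O3", ("X4",))
constraints = [(O1, r1, O2), (O2, r2, O3), (O1, r1, O3)]

solutions = list(solve(constraints, ["X1", "X2", "X3", "X4"], ["a", "b"]))
```

Each element of solutions is a mapping of X1 to X4 onto substitutes for which all three relations hold; exhaustive enumeration is used here only for clarity and would not scale to realistic documents.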