1. Field of the Invention
The present invention relates to data processing, and more particularly, to a system and method for delivery of documents over a computer network.
2. Background of the Invention
The Internet is a well known global computer network through which information can be exchanged between two parties. The World Wide Web (WWW) is probably the most utilized component of the Internet. Each day, millions of users access the WWW to obtain information that can comprise one or more of text, images, sound, and other data. The information that can be exchanged over the Internet may be referred to herein interchangeably as a document. The information or document is typically stored in a database or memory that is associated with a server computer. The information can be presented as a web page. The web page is then made available on a web site of the server computer. Users can view the web page using a web browser associated with a client computer. Well known web browsers include the Navigator of Netscape Communications, Inc. and the Internet Explorer of Microsoft Corporation.
The dominant type of information transferred over the Internet is the Hypertext Markup Language (HTML) document. Since the introduction of the first widely adopted HTML browser, Mosaic, the need to speed up the delivery of HTML documents over the Internet has been a persistent problem. Unlike computer hardware components, Internet connectivity has been very slow to improve. For example, since early 1995, CPU computing power has increased by a factor of 20, disk capacity has increased by a factor of 100, and memory size has grown by a factor of 4. Sadly, however, average Internet connection speed has increased by a factor of only 2. This imbalance has placed great stress on the kind of content that is delivered over the Internet and on the manner in which it is delivered and viewed.
In response to this problem, content developers have adopted a model of creating minimized content. Yahoo.com is a premier adopter of this model, where the HTML page size is kept small and virtually all graphics are omitted. The result is that the general audience, which lacks high-speed broadband Internet connectivity, is forced either to view uninteresting content or to endure unbearable download wait periods.
With the growth in the number of Internet users, many companies have recently begun to offer solutions for scaling or speeding up access to Web information. For example, caching solutions were offered to scale the back-end HTML web servers, so that they can handle millions of hits per day. More recently, edge network caching solutions offered ways not only to speed access to content, but also to provide rapid scaling and redundancy. Today, the industry has recognized that the majority of Internet users are mostly limited by the “last mile” connection, namely their modem connection to an ISP (Internet Service Provider). Using novel page anticipation and data reduction techniques such as compression and delta updating, emerging solutions are making progress in minimizing the amount of information that needs to be passed from an origin web server to the end user. However, none of the existing solutions to the “last mile” problem scales as the amount of information grows or the network connection slows down.
Web pages are typically formatted using the well known HTML. Other formats can include XML, PDF, Flash, MS Word, and PostScript. A typical web page has about 40 to 240 kilobytes (KB) of information, which can take about eight to 48 seconds to download by “thick” clients having network connection speeds of 56 to 128 kilobits per second (kbps). In addition, thick clients usually have large displays, such as VGA or SVGA displays, with resolutions such as 1024 by 768 pixels. As a result, most web pages are not accessible for viewing by “thin” clients with lower bandwidths (9.6 kbps to 14.4 kbps network connection speed) and smaller screen sizes (160×160 to 320×240 pixels). Examples of thin clients include wireless devices such as personal digital assistants (PDAs) and pocket PCs. Even if these thin client computers can access the web pages, the pages often take an impracticably long time to download. For example, at a network connection speed of 14.4 kbps, it could take 240 seconds to download a web page, which is roughly five times as long as a thick client takes to download the same amount of information.
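The download times discussed above follow from dividing page size by link throughput. The following is a minimal sketch of that arithmetic, assuming 1 KB = 1000 bytes and an idealized link with no protocol overhead or latency; the function name download_seconds is illustrative and does not come from the text. The idealized results are lower bounds. The figures quoted in the text are higher and appear consistent with effective throughputs of roughly 5 KB per second for thick clients and 1 KB per second for thin clients.

```python
def download_seconds(size_kb: float, link_kbps: float) -> float:
    """Idealized transfer time for a page.

    size_kb   -- page size in kilobytes (assumed 1 KB = 1000 bytes)
    link_kbps -- nominal link speed in kilobits per second
    """
    size_kilobits = size_kb * 8  # 8 bits per byte
    return size_kilobits / link_kbps


if __name__ == "__main__":
    # Page sizes and link speeds taken from the ranges given in the text.
    for size_kb in (40, 240):
        for link_kbps in (14.4, 56, 128):
            seconds = download_seconds(size_kb, link_kbps)
            print(f"{size_kb:>3} KB at {link_kbps:>5} kbps: {seconds:6.1f} s")
```

Even under these idealized assumptions, the largest page takes several times longer over a 14.4 kbps link than over a 56 kbps link, which is the gap the thin-client discussion turns on.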
To accommodate thin clients, a number of partial solutions have been implemented. For example, in addition to their regular websites designed for thick clients, some content providers create and maintain a second website specifically for thin clients. The design and implementation of the second website can be very expensive and labor intensive. More importantly, the second website is often a “watered down” version of the first website. In other words, the second website does not contain as much information as the first website. The second website tends to contain mostly, if not exclusively, textual content. As a result, users accessing the second website are often dissatisfied with the content they receive. This in turn has deterred users from using thin clients to access the WWW for information. Ultimately, wireless devices are basically limited to obtaining text-based messaging, e-mails, and web clippings from the WWW.
The inability to efficiently deliver multimedia web pages to wireless devices has led to the development of new infrastructure. The new infrastructure, whether in hardware, software, or both, can be very expensive.
In response to this problem, an industry consortium, called the WAP Forum, produced an entirely new set of networking protocols called the Wireless Application Protocol (WAP), and an associated new web page language called the Wireless Markup Language (WML). WAP is an application environment and set of communication protocols for wireless devices designed to enable manufacturer-, vendor-, and technology-independent access to the Internet and advanced wireless telephony services. WAP provides wireless Internet access through digital cellular networks, giving network users a menu-driven method for downloading information, such as flight schedules and bank account balances, to wireless devices from the Internet.
A WAP device, such as a wireless telephone, will be configured to communicate with the WWW only through the intermediary of a particular WAP gateway. These gateways accept WAP communications from WAP devices, transmit requests to WML servers using the standard HTTP protocol, compile the retrieved data into WML bytecode, and transmit the bytecode back to the devices. That is, WAP devices never see the raw WML; they deal only with WML bytecode. This approach leads to bandwidth savings, since the WML bytecode is substantially more compact than the raw WML. It also allows for a less complex browser, since the wireless device need only deal with data that has been compiled, and which may therefore be presumed syntactically correct. To provide web pages to WAP devices under this scheme, content providers must maintain separate WML versions of their web pages, and configure their web-server software to serve them with the correct MIME type.
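The gateway flow described above (accept a request, fetch raw WML, compile it, and return only the compiled form) can be sketched as follows. This is a toy illustration under stated assumptions, not the actual WAP binary encoding: the token values in TAG_TOKENS, the TEXT and END markers, and the function names compile_wml and gateway_handle are all hypothetical, and the fetch callable merely stands in for the gateway's HTTP request to the WML server.

```python
import re
from typing import Callable

# Hypothetical one-byte tokens for a few WML tags. The real WAP binary
# encoding defines its own token tables; these values are made up.
TAG_TOKENS = {"wml": 0x01, "card": 0x02, "p": 0x03}
END = 0x00   # closes the most recently opened tag (also terminates text)
TEXT = 0x7F  # introduces inline text, which runs up to a 0x00 byte

# Matches either a bare tag such as <p> or </p>, or a run of text.
_PIECE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def compile_wml(source: str) -> bytes:
    """Toy compiler: swap known tags for one-byte tokens and reject bad syntax."""
    out = bytearray()
    open_tags = []
    pos = 0
    while pos < len(source):
        match = _PIECE.match(source, pos)
        if match is None:
            raise ValueError(f"cannot parse WML at offset {pos}")
        closing, tag, text = match.groups()
        pos = match.end()
        if text is not None:
            data = text.strip().encode("utf-8")
            if data:
                out += bytes([TEXT]) + data + bytes([END])
        elif closing:
            if not open_tags or open_tags.pop() != tag:
                raise ValueError(f"mismatched </{tag}>")
            out.append(END)
        else:
            if tag not in TAG_TOKENS:
                raise ValueError(f"unknown tag <{tag}>")
            open_tags.append(tag)
            out.append(TAG_TOKENS[tag])
    if open_tags:
        raise ValueError(f"unclosed <{open_tags[-1]}>")
    return bytes(out)


def gateway_handle(url: str, fetch: Callable[[str], str]) -> bytes:
    """Fetch raw WML from the origin server and return compiled bytes."""
    raw_wml = fetch(url)         # stands in for the gateway's HTTP request
    return compile_wml(raw_wml)  # the device only ever receives this form


if __name__ == "__main__":
    deck = "<wml><card><p>Flight 12 departs 09:40</p></card></wml>"
    compiled = gateway_handle("http://example.com/deck.wml", lambda _url: deck)
    print(len(deck.encode("utf-8")), "bytes of WML ->", len(compiled), "bytes compiled")
```

Because the compile step runs on the gateway, a syntax error in the deck surfaces there as a ValueError rather than on the content provider's server, which mirrors the lack of feedback discussed in this section.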
Stated another way, WAP's basic strategy is to shrink a web page in a static way so that the information on the web page can be viewed by the micro-browser of a WAP device. For example, when a web page is accessed by a WAP device, images on the web page are either removed completely or shrunk to a very small size. Text on the web page is lined up either in a very tall column or in a very long row for viewing by the micro-browser. As a result, WAP does not provide a complete solution for thin clients' access to the WWW. Furthermore, WAP involves manual processes, templates, preprocessing, etc., which can be expensive and labor intensive. More importantly, users of WAP devices do not get the web page as it was originally designed.
In spite of its industry support, the WAP/WML approach suffers from all of the usual difficulties associated with the imposition of new protocols that depart from tried-and-tested software engineering practices. For example, some server-side difficulties include:
(1) The server is required to create WML versions of its HTML pages, thus substantially increasing the labor and expense of maintaining web pages.
(2) WML allows only a very small set of layout formats, making it difficult for servers to customize content, and increasing the likelihood that vendors will introduce their own extensions to the standard to fix the problem.
(3) Unlike HTML, WML has a very unforgiving syntax. Even the slightest syntax error will prevent the WAP gateway from compiling the WML into bytecode. Moreover, since this compilation occurs on the WAP gateway, the server receives no feedback about such errors. It is possible for the server to use XML validators to check its WML (since WML is an application of XML), but this solution is less than ideal when the WML needs to be generated dynamically, for example from a database. Content providers are often forced to resort to the extreme measure of checking their pages' appearance on each of the millions of wireless devices, to be sure that the pages render as intended.
(4) Most egregiously, each WAP device has its own page-length limit on the amount of WML that it may receive. Worse, this limit is imposed not on the WML source, but rather on the compiled WML bytecode. For example, the Nokia 7110 imposes a limit of 1397 compiled bytes of WML.
(5) The protocol introduces an entirely new monochrome image format called WBMP (Wireless BitMaP) that is unsupported by major graphics software packages.
(6) The protocol does not support cookies, making it impossible for the server to access persistent (longer than one session) information on the client.
(7) There is still some resistance in the U.S. market to adopting the WAP/WML standards. Sprint PCS and many other U.S. carriers have a partnership with Phone.com to provide a WAP gateway and a browser for their customers. While Phone.com is a member of the WAP Forum, it has also encouraged developers to use its own proprietary language called Handheld Device Markup Language (HDML). Indeed, not all of Phone.com's products are completely WAP compliant.
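Difficulties (3) and (4) above suggest two checks that a content provider might run on its own server before a dynamically generated deck ever reaches a gateway. The following is a minimal sketch under stated assumptions: it tests only XML well-formedness using Python's standard xml.etree.ElementTree parser, which does not validate against the WML document type definition, and it assumes the compiled bytes are obtained elsewhere. The helper names render_deck, wml_syntax_error, and fits_device are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render_deck(rows):
    """Build a WML deck from (title, body) rows, e.g. fetched from a database.

    Escaping the values keeps characters such as '&' and '<' in the data
    from producing a syntax error in the generated markup.
    """
    cards = "".join(
        f"<card title={quoteattr(title)}><p>{escape(body)}</p></card>"
        for title, body in rows
    )
    return f"<wml>{cards}</wml>"


def wml_syntax_error(wml_source: str):
    """Return None if the source is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(wml_source)
    except ET.ParseError as exc:
        return str(exc)
    return None


def fits_device(compiled: bytes, limit_bytes: int) -> bool:
    """True if the compiled deck is within a device's byte limit."""
    return len(compiled) <= limit_bytes


if __name__ == "__main__":
    deck = render_deck([("Rates", "Fees & charges < 5%")])
    print("syntax error:", wml_syntax_error(deck))
    print("syntax error:", wml_syntax_error("<wml><card><p>oops</card></wml>"))
    # 1397 bytes is the Nokia 7110 limit cited in item (4); the placeholder
    # bytes below stand in for output from a real WML compiler.
    print("fits:", fits_device(b"\x00" * 1400, 1397))
```

Since the limit applies to the compiled bytecode and not to the WML source, the size check is meaningful only when the server has access to the same compilation step that the gateway performs.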
In addition, there are client-side difficulties, which include:
(8) Web pages viewed by clients are restricted to only a few limited formats (see disadvantage number (2) above), and monochrome images (see disadvantage number (5) above). The client's web-surfing experience is thus dramatically curtailed. At present, most users restrict their use of this medium to electronic mail and other text-based processing, for precisely this reason. In short, clients simply do not see the real WWW.
(9) The lack of support for cookies means that clients must retype their user name and password each time they open a new session with a secure web page (see disadvantage number (6) above). This is particularly unfortunate, given the relative difficulty of entering text on certain small wireless devices.
Another partial solution for providing web pages to thin clients involves the compression and caching of information on the web pages. The rationale behind this solution is that faster access can be had if enough information is compressed and cached. The reality, however, is that not all information on the WWW is compressible or cacheable. Furthermore, some information, even if compressible and cacheable, cannot be reduced by a satisfactory ratio. As a result, compression and caching of the information provide inconsistent results. Another disadvantage is that the compression modules sometimes have to be integrated into the documents. Still another disadvantage is that caching has to be integrated within a proxy server, a network server, routers, and other components.
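The inconsistency of compression noted above can be illustrated with a short experiment. The following is a minimal sketch using Python's standard zlib module; the sample inputs are assumptions chosen for illustration. Repetitive markup stands in for a text-heavy page, and pseudo-random bytes stand in for content that is already compressed, such as many image formats.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # Highly repetitive markup, as might appear in a large HTML table.
    markup = b"<tr><td>item</td><td>price</td></tr>\n" * 500

    # Pseudo-random bytes of the same length; the fixed seed makes the
    # run repeatable.
    rng = random.Random(0)
    noise = bytes(rng.getrandbits(8) for _ in range(len(markup)))

    print(f"markup ratio: {compression_ratio(markup):.3f}")
    print(f"noise ratio:  {compression_ratio(noise):.3f}")
```

The markup shrinks to a small fraction of its original size, while the pseudo-random data does not shrink at all, so a page dominated by the second kind of content gains little from a compression-based approach.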