The present invention relates generally to Web pages and, more particularly, to displaying Web pages.
The Internet is a worldwide decentralized network of computers having the ability to communicate with each other. The Internet has gained broad recognition as a viable medium for communicating and interacting across multiple networks. The World-Wide Web (Web) was created in the early 1990s, and is comprised of server-hosting computers (Web servers) connected to the Internet that have hypertext documents (referred to as Web pages) stored therewithin. Web pages are accessible by client programs (e.g., Web browsers) utilizing the Hypertext Transfer Protocol (HTTP) via a Transmission Control Protocol/Internet Protocol (TCP/IP) connection between a client-hosting device and a server-hosting device. While HTTP and hypertext documents are the prevalent forms for the Web, the Web itself refers to a wide range of protocols including Secure Hypertext Transfer Protocol (HTTPS), File Transfer Protocol (FTP), and Gopher, and content formats including plain text, Extensible Markup Language (XML), as well as image formats such as Graphics Interchange Format (GIF) and Joint Photographic Experts Group (JPEG).
An intranet is a private computer network conventionally contained within an enterprise and that conventionally includes one or more servers in communication with multiple user computers. An intranet may be comprised of interlinked local area networks and may also use leased-lines in a wide-area network. An intranet may or may not include connections to the outside Internet. Intranets conventionally utilize various Internet protocols and, in general, often look like private versions of the Internet. An intranet user conventionally accesses an intranet server via a Web browser running locally on his/her computer.
Exemplary Web browsers for both Internet and intranet use include Netscape Navigator® (Netscape Communications Corporation, Mountain View, Calif.) and Internet Explorer® (Microsoft Corporation, Redmond, Wash.). Web browsers typically provide a graphical user interface for retrieving and viewing Web pages, applications, and other resources hosted by Internet/intranet servers (hereinafter collectively referred to as “Web servers”).
As is known to those skilled in this art, a Web page is conventionally formatted via a standard page description language such as HyperText Markup Language (HTML), which typically contains text and can reference graphics, sound, animation, and video data. HTML provides for basic document formatting and allows a Web content provider to specify anchors or hypertext links (typically manifested as highlighted text) to other Web servers and files. When a user selects a particular hypertext link, a Web browser reads and interprets an address, called a Uniform Resource Locator (URL), associated with the link, connects to a Web server at that address, and makes a request (e.g., an HTTP request) for the file identified in the link. The Web server then sends the requested file to the Web client, where the Web browser interprets the file and displays it to the user.
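The link-following sequence described above can be illustrated with a minimal Python sketch. The function name build_get_request and the example URL are hypothetical and are used for illustration only; the sketch merely shows how a URL may be split into a server address and a file path, and how a simple HTTP GET request for that file might be composed. No network connection is made.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Split a URL and compose a plain HTTP GET request for it.

    Returns (host, port, request_text). Illustrative only: a real
    browser sends many more headers and handles redirects, caching, etc.
    """
    parts = urlsplit(url)

    # The file identified in the link; an empty path means the root document.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Default ports for the two schemes named in the text.
    port = parts.port or (443 if parts.scheme == "https" else 80)

    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return parts.hostname, port, request


if __name__ == "__main__":
    host, port, request = build_get_request("http://www.example.com/docs/index.html")
    print(host, port)
    print(request)
```

In this sketch the host and port identify the Web server to be contacted over TCP/IP, and the request text is what a client would transmit once connected.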
With the increasing mobility of today's society, the demand for mobile computing capabilities has also increased. Many workers and professionals are downsizing their laptop computers to smaller palm-top or hand-held devices, such as personal digital assistants (PDAs). In addition, many people are utilizing cellular telephones to access the Internet, to access intranets, and to perform various other computing functions. Computing devices including, but not limited to, PDAs, cellular telephones, and computing devices utilized within appliances and automobiles, are often collectively referred to as “pervasive” computing devices. Many pervasive computing devices utilize the Microsoft® Windows CE and 3Com Palm Computing® platforms.
Unfortunately, pervasive computing devices typically have displays that are small in size compared with desktop computer displays. As a result, content portions of a Web page, such as images and text that are otherwise displayable on a desktop computer display, may not be displayable on a pervasive computing device display unless some modifications to the images and/or text (i.e., the content) are made. For example, a desktop computer display having an array of 1024 pixels by 768 pixels may be able to display a large (e.g., 2 megabit), 24 bit per pixel color image. A pervasive computing device with a display having an array of 120 pixels by 120 pixels, and with the ability to display only about 3 bits per pixel, may ignore much of the image data. As a result the image may not be displayed properly, if at all, via the pervasive computing device display unless the size of the image is reduced.
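The gap between the two displays in the foregoing example can be made concrete with a short calculation. The following Python sketch is illustrative only; the function names frame_bits and scale_to_fit, and the 800 by 600 pixel image, are assumptions introduced here and do not appear in the description above. The sketch computes the number of bits each display can present at once and the dimensions to which an image might be reduced to fit the smaller display while preserving its aspect ratio.

```python
def frame_bits(width, height, bits_per_pixel):
    """Number of bits needed to fill a display of the given geometry."""
    return width * height * bits_per_pixel


def scale_to_fit(img_w, img_h, disp_w, disp_h):
    """Shrink an image to fit a display, preserving aspect ratio.

    Integer arithmetic is used throughout so results are exact.
    Images that already fit are returned unchanged.
    """
    if img_w <= disp_w and img_h <= disp_h:
        return img_w, img_h
    if img_w * disp_h >= img_h * disp_w:
        # Width is the limiting dimension.
        return disp_w, max(1, img_h * disp_w // img_w)
    # Height is the limiting dimension.
    return max(1, img_w * disp_h // img_h), disp_h


if __name__ == "__main__":
    desktop = frame_bits(1024, 768, 24)   # 18,874,368 bits
    handheld = frame_bits(120, 120, 3)    # 43,200 bits
    print(desktop, handheld, desktop // handheld)
    print(scale_to_fit(800, 600, 120, 120))  # (120, 90)
```

Under these assumptions the desktop display can present roughly 437 times as many bits as the pervasive computing device display, which suggests why image data sized for the former generally must be reduced before presentation on the latter.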
Text font and size within a Web page may also need to be changed to permit the display thereof within a pervasive computing device display. In addition, performance limitations of pervasive computing devices, such as memory size and connection bandwidth, may also require changes to Web page content for proper display thereof via a pervasive computing device.
Accordingly, it is desirable to have techniques that permit Web page content to be modified and presented in custom-tailored formats for various types of pervasive computing devices. As described above, this may include removing or shrinking of images. This may also include the creation of summary pages of headings, or in some cases, conversion of HTML to dialects such as Compressed Markup Language (CML) or Wireless Markup Language (WML).
A growing number of Web pages are being written in the Extensible Markup Language (XML). For example, dynamically generated Web pages, which intermix data retrieved at run-time with static page layout commands, are often generated using XML. Various XML tools have also been developed to perform “content tailoring” of Web pages to enable display thereof via pervasive computing devices. These content tailoring tools can work well with Web pages in XML format. Unfortunately, these content tailoring tools may not work well with some Web pages in HTML format. Content tailoring tools typically expect a well-formed, regular document that follows XML rules (i.e., with all start tags matched by end tags, all parameters in a standard format, etc.). HTML documents often break many of these rules, and may have many irregularities of format that are not specifically allowed by HTML, but are tolerated by browsers.
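The difficulty that XML tools have with irregular HTML can be demonstrated with a brief Python sketch. The helper name is_well_formed and the sample fragments are hypothetical and are provided for illustration only. The sketch passes several fragments to the XML parser in the Python standard library, which stands in here for any XML-based tool; fragments with unmatched start tags or unquoted attribute values are rejected, although a Web browser would typically tolerate them.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<p>Hello<br/>world</p>",            # every start tag is closed
        "<p>Hello<br>world</p>",             # <br> has no end tag
        "<ul><li>one<li>two</ul>",           # <li> end tags omitted
        "<img src=big.gif width=800/>",      # attribute values not quoted
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

Only the first fragment is accepted. The remaining three are common in HTML documents, which is why such documents generally cannot be handed to an XML tool without first being placed into a regular format.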
Because many in the business community see XML as a standard format for data transfer, XML Web pages are becoming more prevalent. However, it is considered unlikely that either existing Web pages written in HTML or popular HTML authoring tools will be ported to XML at the same rate that new XML Web pages are being created. Accordingly, there is expected to be a transition period wherein Web pages may contain a mixture of HTML-formatted content and XML-formatted content.
Accordingly, there is a need for content tailoring tools that can be used with Web pages written in XML and HTML, as well as Web pages written in a mixture of XML and HTML. Furthermore, some existing Web browsers may not be able to properly display Web pages written in both XML and HTML. Accordingly, there is currently a need for modifying Web pages having a mixture of XML and HTML formats so as to be displayable within current Web browsers.
In view of the above discussion, it is an object of the present invention to facilitate content tailoring of Web pages written in HTML using XML-based content tailoring tools.
It is another object of the present invention to facilitate the display of Web pages written in HTML and XML within current Web browsers.
It is another object of the present invention to facilitate the display of Web pages via pervasive computing devices that may have smaller displays and various performance limitations as compared with desktop computing devices.
These and other objects of the present invention are provided by systems, methods and computer program products for utilizing XML-based tools to tailor HTML-based Web page content for display within various client devices. According to the present invention, a client device, such as a pervasive computing device, requests a Web page that contains one or more portions that require tailoring for display within the requesting client device. These portions, which are typically in HTML format, but can be in other formats, are converted to an XML format. Other portions of the Web page are masked so as to be “hidden” and are, thus, not converted to XML format.
The portions of the Web page converted to an XML format are then modified, using an XML content-tailoring tool, so that the content can be properly displayed within the requesting client device. The modified XML portions are then converted back to HTML format or another originating format of the Web page. The masked portions of the Web page are then unmasked and the Web page with modified content is transmitted to the client device for display therewithin.
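The sequence of masking, conversion to XML, tailoring, conversion back to HTML, and unmasking may be sketched as follows. This Python sketch is offered for illustration only and is not the implementation of the present invention. Every name in it (mask, unmask, HtmlToXml, tailor_page, the @@MASK placeholder) is hypothetical, script elements stand in for the masked portions, the Python standard library XML parser stands in for an XML content-tailoring tool, and image shrinking stands in for the tailoring step. Comments and document type declarations are dropped by this simplified converter.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
MASK_PATTERN = re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL)


def mask(page):
    """Hide portions that must not be converted; return (page, hidden list)."""
    hidden = []

    def stash(match):
        hidden.append(match.group(0))
        return f"@@MASK{len(hidden) - 1}@@"

    return MASK_PATTERN.sub(stash, page), hidden


def unmask(page, hidden):
    """Restore each hidden portion in place of its placeholder."""
    for index, original in enumerate(hidden):
        page = page.replace(f"@@MASK{index}@@", original)
    return page


class HtmlToXml(HTMLParser):
    """Re-emit tolerated HTML as well-formed XML (simplified)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out, self.open_tags = [], []

    def handle_starttag(self, tag, attrs):
        # Quote every attribute value; valueless attributes repeat their name.
        text = "".join(f" {k}={quoteattr(k if v is None else v)}" for k, v in attrs)
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        while self.open_tags:  # close any elements left open inside this one
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def result(self):
        self.close()
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def tailor_page(page, max_width):
    """Mask, convert to XML, shrink wide images, convert back, unmask."""
    masked, hidden = mask(page)
    converter = HtmlToXml()
    converter.feed(masked)
    root = ET.fromstring(converter.result())
    for img in root.iter("img"):  # the tailoring step, done with an XML API
        width = int(img.get("width", "0"))
        if width > max_width:
            if img.get("height") is not None:
                img.set("height", str(int(img.get("height")) * max_width // width))
            img.set("width", str(max_width))
    html = ET.tostring(root, encoding="unicode", method="html")
    return unmask(html, hidden)


if __name__ == "__main__":
    sample = ("<html><body><script>if (a < b) go();</script>"
              "<p>Hello<br>world<img src=big.gif width=800 height=600></body></html>")
    print(tailor_page(sample, 120))
```

In the sample page, the script portion contains a bare less-than character that an XML parser would reject, so it is masked and passes through unchanged, while the unclosed paragraph, the unclosed line break, and the unquoted image attributes are regularized before the XML step shrinks the image to the assumed 120 pixel display width.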
The present invention can facilitate tailoring of Web page content in HTML format utilizing more sophisticated XML content tailoring tools. The present invention is advantageous because HTML irregularities can be compensated for and masked out. In essence, the present invention facilitates placing an HTML document into a regular format that can be processed by normal XML tools. Accordingly, the present invention can facilitate bridging the gap between the growing world of XML format content and the relatively mature collection of Web pages in HTML format. The present invention can also allow Web pages having a mixture of HTML and XML formats to be converted to a single format for display within a Web browser.