With the advent of the World Wide Web (hereinafter Web) and graphics-based Web browsers, the Web has grown exponentially to provide an information exchange of unprecedented proportion. The Web is an Internet facility that links documents both locally and remotely. A Web document, or Web page, is accessed and read via a Web browser. In the last half of the 1990s, the Web became the focus of Internet activity because Web pages containing both text and graphics were easily accessible via a Web browser. Today, those Web pages can also utilize new browser features and plug-in extensions that allow for audio, video, telephony, 3-D animations, and videoconferencing.
Hypertext Markup Language, or “HTML,” is the coding behind standard Web pages. Referring now to FIG. 2A, one of the key features of HTML is the ability to render a Web page 200 composed of separate resources such as images 205, sound files, cascading style sheets, and ActiveX objects, in-lined with the marked-up text 210. Referring now to FIG. 2B, the separate resources that make up the Web page 200 are typically stored in a multiple related-file storage format 215. In other words, a single Web page 200 containing text 210, sound files, and images 205 is stored as multiple related files comprising a separate file for each sound file, image, and text resource. For example, the main document, or Web page HTML source 220, may be stored as “Front_Page.htm” file 225. The Web page HTML source 220 may contain “links” or “pointers” to each individual sound file, image, text, etc. For example, link 230 may point to the star.gif file 235 stored in folder 240 and link 245 may point to text_box.txt file 250 stored in folder 240.
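By way of illustration only, the relationship between a main HTML source and the resource files it points to may be sketched in Python as follows. The sketch is not taken from any described system. The class name `ResourceLinkCollector`, the function name `list_resource_links`, the sample markup, and the folder name `Front_Page_files` are all assumptions introduced for the example; only the file names star.gif and text_box.txt come from the description above.

```python
from html.parser import HTMLParser

# Attributes that commonly carry a reference to an external resource.
# This set is an assumption for the sketch, not an exhaustive list.
LINK_ATTRIBUTES = {"src", "href", "data"}


class ResourceLinkCollector(HTMLParser):
    """Collects the values of link-bearing attributes in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRIBUTES and value:
                self.links.append(value)


def list_resource_links(html_source):
    """Return every resource link found in the given HTML source text."""
    collector = ResourceLinkCollector()
    collector.feed(html_source)
    collector.close()
    return collector.links


# Hypothetical contents of "Front_Page.htm"; the folder name is assumed.
FRONT_PAGE_SOURCE = """
<html><body>
<img src="Front_Page_files/star.gif">
<a href="Front_Page_files/text_box.txt">text box</a>
</body></html>
"""

if __name__ == "__main__":
    for link in list_resource_links(FRONT_PAGE_SOURCE):
        print(link)
```

In this sketch, each collected link names a separate file that must accompany the main document for the Web page to render completely.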
Storing a Web page in HTML format is unwieldy because it requires the storage of separate files for each resource. These separate files can be hard to manage and maintain. For example, Web site administrators or individuals may want to delete, copy, or move files around but may not know the name, location, or number of files referenced by the main HTML file. Moreover, they may rename the main HTML file but be unaware of the necessity for renaming the other supporting files or vice versa. Users have grown accustomed to having a single file per document and therefore generally have trouble managing all these files.
As may be understood from the description above, a typical Web page consists of a main HTML source file and a host of resource files, such as graphics files, sound files, etc. Often, resource files are maintained within a folder structure and the main HTML document includes links to the locations of the resource files within that folder structure. Because any given resource file may be in a folder that is different from the folder containing the main HTML document, the links in the HTML document will not be accurate unless the resource files are maintained in that same folder structure relative to the main HTML document.
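The dependence of such links on the folder structure may be sketched as follows, again for illustration only. The sketch assumes that the links are relative paths resolved against the folder containing the main HTML document. The function names `resolve_link` and `broken_links` and the folder names `site`, `resources`, and `archive` are hypothetical.

```python
import posixpath


def resolve_link(main_document_path, link):
    """Resolve a relative link against the folder holding the main document."""
    base_folder = posixpath.dirname(main_document_path)
    return posixpath.normpath(posixpath.join(base_folder, link))


def broken_links(main_document_path, links, existing_files):
    """Return the links whose resolved targets are not among existing_files."""
    return [
        link
        for link in links
        if resolve_link(main_document_path, link) not in existing_files
    ]


# Hypothetical layout: the main document and one resource in a subfolder.
EXISTING_FILES = {"site/Front_Page.htm", "site/resources/star.gif"}
LINKS = ["resources/star.gif"]

if __name__ == "__main__":
    # With the original layout intact, the link resolves correctly.
    print(broken_links("site/Front_Page.htm", LINKS, EXISTING_FILES))
    # If only the main document is moved, the same link no longer resolves.
    print(broken_links("archive/Front_Page.htm", LINKS, EXISTING_FILES))
```

Under these assumptions, moving the main document without its resource folder leaves the link pointing at a location where no file exists, which is the maintenance problem described above.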
Modern Internet users desire to integrate the components of a Web page into a single file. Such a file is easier to manage because it can be saved in a single location, can be viewed offline, and can be sent as a single attachment via e-mail. Unfortunately, the structure of an HTML Web page and its components is not conducive to such integration.
Various approaches exist for putting Web pages into a single file. One of these approaches involves storing all the different parts of a Web page inside a self-extracting executable (“.exe”) file. Initiating this executable file causes the different files of the Web page to be written to a temporary location and opened into a main page. Documents in executable file format, however, suffer from several drawbacks. First, they tend to be fairly large because they require additional code within the executable file. Second, users are often wary of opening executable files because there is a risk that the executable files may contain a hidden computer virus. In fact, some companies automatically remove any attached executable files from e-mail received over the Internet for fear of viruses. Additionally, not only are executable files incapable of being natively displayed in a Web browser, they are also not directly editable by any Web page authoring application.
Another approach is Hewlett-Packard's “PRINTSMART” application, which allows a user to define a list of Web pages and “bundle” them together into a single reference file for printing. However, the single reference file does not actually include the resource files of the Web pages. In other words, if this single reference file were mailed to another user, that user would not be able to view the resources of the Web pages unless the user could link to the locations of the resource files.
Previous versions of Microsoft's “INTERNET EXPLORER” Web browser included a “Save as Web Archive” feature. A user could navigate to a Web page, choose the “Save as” command, and choose “Web archive” as a file format. However, this feature had several drawbacks. First, the Web page needed to be loaded into the “INTERNET EXPLORER” Web browser before it could be saved. Second, this feature did not save all of the resource files associated with the Web page in such a way that the files could be returned to their original locations with respect to the main HTML document upon opening. For example, this feature did not capture all the slides in a slideshow presentation saved as HTML, just the first slide.
Microsoft's “INTERNET EXPLORER” Web browser also includes a “Send Page” feature. A user may load a Web page and choose “File”, “Send”, and “Page as E-mail.” This feature creates a new mail message with the contents of the Web browser as the contents of the message. This feature suffers from the limitations described for the “Save as Web Archive” feature as well as additional limitations. For example, framesets and scripts are not supported in the body of an e-mail message. Moreover, a MAPI-compliant mail client that understands HTML mail is required to view the e-mail message.
Therefore, there is a need for a process for packing a Web page into a single file, so that the Web page's resource file structure is maintained and the Web page can be displayed in its original form. There is also a need for a process that packs all of the Web page content so that the unpacked Web page may be immediately viewable without an expensive extraction process.